diff mbox

[v5,2/5] driver core: Functional dependencies tracking support

Message ID 2715729.9U1nlcpFb3@vostro.rjw.lan (mailing list archive)
State Superseded, archived
Delegated to: Rafael Wysocki
Headers show

Commit Message

Rafael J. Wysocki Oct. 10, 2016, 12:51 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Currently, there is a problem with taking functional dependencies
between devices into account.

What I mean by a "functional dependency" is when the driver of device
B needs device A to be functional and (generally) its driver to be
present in order to work properly.  This has certain consequences
for power management (suspend/resume and runtime PM ordering) and
shutdown ordering of these devices.  In general, it also implies that
the driver of A needs to be working for B to be probed successfully
and it cannot be unbound from the device before the B's driver.

Support for representing those functional dependencies between
devices is added here to allow the driver core to track them and act
on them in certain cases where applicable.

The argument for doing that in the driver core is that there are
quite a few distinct use cases involving device dependencies, they
are relatively hard to get right in a driver (if one wants to
address all of them properly) and it only gets worse if multiplied
by the number of drivers potentially needing to do it.  Morever, at
least one case (asynchronous system suspend/resume) cannot be handled
in a single driver at all, because it requires the driver of A to
wait for B to suspend (during system suspend) and the driver of B to
wait for A to resume (during system resume).

For this reason, represent dependencies between devices as "links",
with the help of struct device_link objects each containing pointers
to the "linked" devices, a list node for each of them, status
information, flags, and an RCU head for synchronization.

Also add two new list heads, representing the lists of links to the
devices that depend on the given one (consumers) and to the devices
depended on by it (suppliers), and a "driver presence status" field
(needed for figuring out initial states of device links) to struct
device.

The entire data structure consisting of all of the lists of link
objects for all devices is protected by a mutex (for link object
addition/removal and for list walks during device driver probing
and removal) and by SRCU (for list walking in other case that will
be introduced by subsequent change sets).  In addition, each link
object has an internal status field whose value reflects whether or
not drivers are bound to the devices pointed to by the link or
probing/removal of their drivers is in progress etc.  That field
is only modified under the device links mutex, but it may be read
outside of it in some cases (introduced by subsequent change sets),
so modifications of it are annotated with WRITE_ONCE().

New links are added by calling device_link_add() which takes three
arguments: pointers to the devices in question and flags.  In
particular, if DL_FLAG_STATELESS is set in the flags, the link status
is not to be taken into account for this link and the driver core
will not manage it.  In turn, if DL_FLAG_AUTOREMOVE is set in the
flags, the driver core will remove the link automatically when the
consumer device driver unbinds from it.

One of the actions carried out by device_link_add() is to reorder
the lists used for device shutdown and system suspend/resume to
put the consumer device along with all of its children and all of
its consumers (and so on, recursively) to the ends of those lists
in order to ensure the right ordering between all of the supplier
and consumer devices.

For this reason, it is not possible to create a link between two
devices if the would-be supplier device already depends on the
would-be consumer device as either a direct descendant of it or a
consumer of one of its direct descendants or one of its consumers
and so on.

There are two types of link objects, persistent and non-persistent.
The persistent ones stay around until one of the target devices is
deleted, while the non-persistent ones are removed automatically when
the consumer driver unbinds from its device (ie. they are assumed to
be valid only as long as the consumer device has a driver bound to
it).  Persistent links are created by default and non-persistent
links are created when the DL_FLAG_AUTOREMOVE flag is passed
to device_link_add().

Both persistent and non-persistent device links can be deleted
with an explicit call to device_link_del().

Links created without the DL_FLAG_STATELESS flag set are managed
by the driver core using a simple state machine.  There are 5 states
each link can be in: DORMANT (unused), AVAILABLE (the supplier driver
is present and functional), CONSUMER_PROBE (the consumer driver is
probing), ACTIVE (both supplier and consumer drivers are present and
functional), and SUPPLIER_UNBIND (the supplier driver is unbinding).
The driver core updates the link state automatically depending on
what happens to the linked devices and for each link state specific
actions are taken in addition to that.

For example, if the supplier driver unbinds from its device, the
driver core will also unbind the drivers of all of its consumers
automatically under the assumption that they cannot function
properly without the supplier.  Analogously, the driver core will
only allow the consumer driver to bind to its device if the
supplier driver is present and functional (ie. the link is in
the AVAILABLE state).  If that's not the case, it will rely on
the existing deferred probing mechanism to wait for the supplier
driver to become available.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v4 -> v5:

- Redefine device_link_add() to take three arguments, the supplier and consumer
  pointers and flags, and to figure out the initial link state automatically.

- Move the links-related fields in struct device to a separate sub-structure
  and add a "driver presence tracking" field to it (to help device_link_add()
  to do its job).

- Modify device_links_check_suppliers(), device_links_driver_bound(),
  device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
  and device_links_unbind_consumers() to walk link lists under device_links_lock
  (to make the new "driver presence tracking" mechanism work reliably).

- Drop the (not necessary any more) spinlock from struct device_link.

- Rename symbols to better reflect their purpose (flags vs link states etc).

v3 -> v4:

- Add the in_dpm_list field to struct dev_pm_info and use it for checking
  if the device has been added to dpm_list already, which is needed for
  handling the "not registered consumer" case in device_link_add().

- Rework device_link_add() to handle consumer devices that have not been
  registered before calling it.

- Drop the parent check from device_link_add().

- Rename device_links_driver_gone() to device_links_driver_cleanup()

- Rearrange device_links_unbind_consumers() to avoid one spin_unlock()
  instance (which wasn't necessary).

- Introduce device_links_purge() to delete links from a device going away
  and make device_del() call it (instead of running links-related code
  directly).

- Move the invocation of device_links_ckeck_suppliers() to really_probe().

- Change the description of the DEVICE_LINK_STATELESS flag.

---
 drivers/base/base.h        |   13 +
 drivers/base/core.c        |  491 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/base/dd.c          |   41 +++
 drivers/base/power/main.c  |    2 
 drivers/base/power/power.h |   10 
 include/linux/device.h     |   78 +++++++
 include/linux/pm.h         |    1 
 7 files changed, 631 insertions(+), 5 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Lukas Wunner Oct. 26, 2016, 11:19 a.m. UTC | #1
Hi Rafael,

sorry for not responding to v5 of your series earlier, just sending
this out now in the hope that it reaches you before your travels.

On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> - Modify device_links_check_suppliers(), device_links_driver_bound(),
>   device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
>   and device_links_unbind_consumers() to walk link lists under device_links_lock
>   (to make the new "driver presence tracking" mechanism work reliably).

This change might increase boot time if drivers return -EPROBE_DEFER.
It can easily happen that such drivers are probed dozens of times,
and each time the lock will be acquired, blocking other drivers from
probing.  Granted, it's hard to measure if and in how far boot time
really increases, but somehow I liked the previous approach better.
The combination of a mutex with an RCU plus a spinlock seemed complicated
when I reviewed the patch, but I never said anything because I came
to the conclusion that the effort seemed worthwhile to keep up the
performance.

(BTW, kbuild test robot has complained that the usage of RCUs isn't
enclosed in #ifdef CONFIG_SRCU.  Just in case you haven't seen that.)


> - Redefine device_link_add() to take three arguments, the supplier and consumer
>   pointers and flags, and to figure out the initial link state automatically.

That's a good idea, however it seems to me that there's some redundancy
between the dl_dev_state and device_link_state:  AFAICS, at any given
moment the device_link_state can be computed from the supplier's and
consumer's dl_dev_state.

One could argue that caching the device_link_state is cheaper than
recomputing it every time it's needed, but often the dl_dev_state of
one of the two devices is known, so the link can only be in a subset
of all possible states, and instead of computing the device_link_state
it's sufficient to just look at the other device's dl_dev_state.

E.g. in device_links_check_suppliers() we have this:

+		if (link->status != DL_STATE_AVAILABLE) {
+			device_links_missing_supplier(dev);
+			ret = -EPROBE_DEFER;
+			break;
+		}

When this function is called we know that the consumer's dl_dev_state is
DL_DEV_PROBING.  Of interest is only the status of the supplier. The above
if-condition would seem to be equivalent to:

		if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) {

So the device_link_state seems redundant, but if it's removed from
struct device_link, changes to dl_dev_state need to be synchronized
between supplier and consumers.  Perhaps this could be accomplished
by acquiring the device_lock for the other device.  E.g. when a
consumer wants to probe, before changing its own dl_dev_state it would
have to acquire the device_lock for all suppliers to prevent races
if one of them simultaneously decides to unbind (and changes its
dl_dev_state to DL_DEV_UNBINDING).  Hm, could this deadlock?


Also, I notice that the dl_dev_state is part of a device's "links"
structure, but being able to look up a device's driver state might be
useful in other cases, not just for device links.  Maybe it makes sense
to move the dl_dev_state one level up to struct device and name the
attribute "driver_status" or somesuch (like the existing "driver" and
"driver_data" attributes).


> +	/* Deterine the initial link state. */
                ^
                m

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Oct. 27, 2016, 3:25 p.m. UTC | #2
On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
> Hi Rafael,
> 
> sorry for not responding to v5 of your series earlier, just sending
> this out now in the hope that it reaches you before your travels.
> 
> On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > - Modify device_links_check_suppliers(), device_links_driver_bound(),
> >   device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
> >   and device_links_unbind_consumers() to walk link lists under device_links_lock
> >   (to make the new "driver presence tracking" mechanism work reliably).
> 
> This change might increase boot time if drivers return -EPROBE_DEFER.

"might"?  Please verify this before guessing....

And don't make this more complex than needed before actually determining
a real issue.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Oct. 27, 2016, 3:32 p.m. UTC | #3
On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> +/*
> + * Device link flags.
> + *
> + * STATELESS: The core won't track the presence of supplier/consumer drivers.
> + * AUTOREMOVE: Remove this link automatically on consumer driver unbind.
> + */
> +#define DL_FLAG_STATELESS	(1 << 0)
> +#define DL_FLAG_AUTOREMOVE	(1 << 1)

Minor nit, BIT()?  I'll leave this for someone to send a checkpatch
cleanup patch later on :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Oct. 28, 2016, 9:57 a.m. UTC | #4
On Thu, Oct 27, 2016 at 05:25:51PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
> > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > > - Modify device_links_check_suppliers(), device_links_driver_bound(),
> > >   device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
> > >   and device_links_unbind_consumers() to walk link lists under device_links_lock
> > >   (to make the new "driver presence tracking" mechanism work reliably).
> > 
> > This change might increase boot time if drivers return -EPROBE_DEFER.
> 
> "might"?  Please verify this before guessing....

I can't, my machine only uses device links for Thunderbolt hotplug ports,
and their driver (portdrv) doesn't use deferred probing.  I'm only aware
of a single device in my system whose driver causes others to defer
probing (apple-gmux), but they don't use device links, so the mutex is
only acquired very briefly because the supplier/consumer lists are empty,
hence the performance impact can't be measured on my system.  But the
situation may be different for Marek's or Hanjun's use cases. I'm not
familiar with their systems, hence the "might".


> And don't make this more complex than needed before actually determining
> a real issue.

As pointed out it currently *is* more complex than needed because the
device_link_state is dispensable.  The whole issue is moot with the
changes I suggested because the mutex would only have to be taken for
addition/deletion of device links, not when probing.

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 7, 2016, 9:22 p.m. UTC | #5
On Thu, Oct 27, 2016 at 05:25:51PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
> > Hi Rafael,
> > 
> > sorry for not responding to v5 of your series earlier, just sending
> > this out now in the hope that it reaches you before your travels.
> > 
> > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > > - Modify device_links_check_suppliers(), device_links_driver_bound(),
> > >   device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
> > >   and device_links_unbind_consumers() to walk link lists under device_links_lock
> > >   (to make the new "driver presence tracking" mechanism work reliably).
> > 
> > This change might increase boot time if drivers return -EPROBE_DEFER.
> 
> "might"?  Please verify this before guessing....
> 
> And don't make this more complex than needed before actually determining
> a real issue.

As clarified by Rafael at Plumbers, this functional dependencies
framework assumes your driver / subsystem supports deferred probe,
if it does not support its not clear what will happen....

We have no explicit semantics to check if a driver / subsystem
supports deferred probe.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 7, 2016, 9:39 p.m. UTC | #6
On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Currently, there is a problem with taking functional dependencies
> between devices into account.
> 
> What I mean by a "functional dependency" is when the driver of device
> B needs device A to be functional and (generally) its driver to be
> present in order to work properly.  This has certain consequences
> for power management (suspend/resume and runtime PM ordering) and
> shutdown ordering of these devices.  In general, it also implies that
> the driver of A needs to be working for B to be probed successfully
> and it cannot be unbound from the device before the B's driver.
> 
> Support for representing those functional dependencies between
> devices is added here to allow the driver core to track them and act
> on them in certain cases where applicable.
> 
> The argument for doing that in the driver core is that there are
> quite a few distinct use cases involving device dependencies, they
> are relatively hard to get right in a driver (if one wants to
> address all of them properly) and it only gets worse if multiplied
> by the number of drivers potentially needing to do it.

How many drivers actually *need this* today for suspend / resume ?

How many of these are because of ACPI firmware bugs rather than
some other reason ?

Can ACPI somehow be used instead by these devices to address quirks?

> Morever, at
> least one case (asynchronous system suspend/resume) cannot be handled
> in a single driver at all, because it requires the driver of A to
> wait for B to suspend (during system suspend) and the driver of B to
> wait for A to resume (during system resume).

Why is this all of a sudden a new issue? It seems there is quite a bit of
frameworks already out there that somehow deal with some sort of device
ordering / dependencies, and I am curious if they've been addressing some of
these problems for a while now on their own somehow with their own solutions,
is that the case? For instance can DAPM the / DRM / Audio component framework
v4l async solution be used or does it already address some of these concerns ?

> For this reason, represent dependencies between devices as "links",
> with the help of struct device_link objects each containing pointers
> to the "linked" devices, a list node for each of them, status
> information, flags, and an RCU head for synchronization.
> 
> Also add two new list heads, representing the lists of links to the
> devices that depend on the given one (consumers) and to the devices
> depended on by it (suppliers), and a "driver presence status" field
> (needed for figuring out initial states of device links) to struct
> device.
> 
> The entire data structure consisting of all of the lists of link
> objects for all devices is protected by a mutex (for link object
> addition/removal and for list walks during device driver probing
> and removal) and by SRCU (for list walking in other case that will
> be introduced by subsequent change sets).  In addition, each link
> object has an internal status field whose value reflects whether or
> not drivers are bound to the devices pointed to by the link or
> probing/removal of their drivers is in progress etc.  That field
> is only modified under the device links mutex, but it may be read
> outside of it in some cases (introduced by subsequent change sets),
> so modifications of it are annotated with WRITE_ONCE().
> 
> New links are added by calling device_link_add() which takes three
> arguments: pointers to the devices in question and flags.  In
> particular, if DL_FLAG_STATELESS is set in the flags, the link status
> is not to be taken into account for this link and the driver core
> will not manage it.  In turn, if DL_FLAG_AUTOREMOVE is set in the
> flags, the driver core will remove the link automatically when the
> consumer device driver unbinds from it.
> 
> One of the actions carried out by device_link_add() is to reorder
> the lists used for device shutdown and system suspend/resume to
> put the consumer device along with all of its children and all of
> its consumers (and so on, recursively) to the ends of those lists
> in order to ensure the right ordering between all of the supplier
> and consumer devices.

There's no explanation as to why this order is ensured to be
correct, I think its important to document this. From our discussions
at Plumbers it seems the order is ensured due to the fact that order
was already implicitly provided through platform firmware (ACPI
enumeration is one), adjusting order on the dpm list is just shuffling
order between consumer / provider, but nothing else. In theory it
works because in the simple case this should suffice however I
remain unconvinced that if we have more users of this framework this
simple algorithm will suffice. Having a test driver or series of
test drivers that shows this would be good. As it stands there is simply
an assumption that this is correct, how *strongly* do you feel that
the order will *always* be correct if this is done and no cycles
can be added, even if tons of drivers start using this ?

> For this reason, it is not possible to create a link between two
> devices if the would-be supplier device already depends on the
> would-be consumer device as either a direct descendant of it or a
> consumer of one of its direct descendants or one of its consumers
> and so on.
> 
> There are two types of link objects, persistent and non-persistent.
> The persistent ones stay around until one of the target devices is
> deleted, while the non-persistent ones are removed automatically when
> the consumer driver unbinds from its device (ie. they are assumed to
> be valid only as long as the consumer device has a driver bound to
> it).  Persistent links are created by default and non-persistent
> links are created when the DL_FLAG_AUTOREMOVE flag is passed
> to device_link_add().
> 
> Both persistent and non-persistent device links can be deleted
> with an explicit call to device_link_del().
> 
> Links created without the DL_FLAG_STATELESS flag set are managed
> by the driver core using a simple state machine.  There are 5 states
> each link can be in: DORMANT (unused), AVAILABLE (the supplier driver
> is present and functional), CONSUMER_PROBE (the consumer driver is
> probing), ACTIVE (both supplier and consumer drivers are present and
> functional), and SUPPLIER_UNBIND (the supplier driver is unbinding).
> The driver core updates the link state automatically depending on
> what happens to the linked devices and for each link state specific
> actions are taken in addition to that.
> 
> For example, if the supplier driver unbinds from its device, the
> driver core will also unbind the drivers of all of its consumers
> automatically under the assumption that they cannot function
> properly without the supplier.  Analogously, the driver core will
> only allow the consumer driver to bind to its device if the
> supplier driver is present and functional (ie. the link is in
> the AVAILABLE state).  If that's not the case, it will rely on
> the existing deferred probing mechanism to wait for the supplier
> driver to become available.

This assumes drivers and subsystems support deferred probe, is
there a way to ensure that drivers that use this framework support
it instead of possibly breaking them if they use this framework ?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Nov. 8, 2016, 6:45 a.m. UTC | #7
On Mon, Nov 07, 2016 at 10:22:50PM +0100, Luis R. Rodriguez wrote:
> We have no explicit semantics to check if a driver / subsystem
> supports deferred probe.

Why would we need such a thing?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 8, 2016, 7:21 p.m. UTC | #8
On Tue, Nov 08, 2016 at 07:45:41AM +0100, Greg Kroah-Hartman wrote:
> On Mon, Nov 07, 2016 at 10:22:50PM +0100, Luis R. Rodriguez wrote:
> > We have no explicit semantics to check if a driver / subsystem
> > supports deferred probe.
> 
> Why would we need such a thing?

It depends on the impact of a driver/subsystem not properly supporting
deffered probe, if this is no-op then such a need is not critical but
would be good to proactively inform developers / users so they avoid 
its use, if this will cause issues its perhaps best to make this a
no-op through a check. AFAICT reviewing implications of not supporting
deferred probe on drivers/subsytsems for this framework is not clearly
spelled out, if we start considering re-using this framework for probe
ordering I'd hate to see issues come up without this corner case being
concretely considered.

Furthermore -- how does this framework compare to Andrzej's resource tracking
solution? I confess I have not had a chance yet to review yet but in light of
this question it would be good to know if Andrzej's framework also requires
deferred probe as similar concerns would exist there as well.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Nov. 8, 2016, 7:43 p.m. UTC | #9
On Tue, Nov 08, 2016 at 08:21:04PM +0100, Luis R. Rodriguez wrote:
> On Tue, Nov 08, 2016 at 07:45:41AM +0100, Greg Kroah-Hartman wrote:
> > On Mon, Nov 07, 2016 at 10:22:50PM +0100, Luis R. Rodriguez wrote:
> > > We have no explicit semantics to check if a driver / subsystem
> > > supports deferred probe.
> > 
> > Why would we need such a thing?
> 
> It depends on the impact of a driver/subsystem not properly supporting
> deffered probe, if this is no-op then such a need is not critical but
> would be good to proactively inform developers / users so they avoid 
> its use, if this will cause issues its perhaps best to make this a
> no-op through a check. AFAICT reviewing implications of not supporting
> deferred probe on drivers/subsytsems for this framework is not clearly
> spelled out, if we start considering re-using this framework for probe
> ordering I'd hate to see issues come up without this corner case being
> concretely considered.

It should not matter to the driver core if a subsystem, or a driver,
supports or does not support deferred probe.  It's a quick and simple
solution to a complex problem that works well.  Yes, you can iterate a
lot of times, but that's fine, we have time at boot to do that (and
really, it is fast.)

> Furthermore -- how does this framework compare to Andrzej's resource tracking
> solution? I confess I have not had a chance yet to review yet but in light of
> this question it would be good to know if Andrzej's framework also requires
> deferred probe as similar concerns would exist there as well.

I have no idea what "framework" you are talking about here, do you have
a pointer to patches?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 8, 2016, 8:58 p.m. UTC | #10
On Tue, Nov 08, 2016 at 08:43:35PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Nov 08, 2016 at 08:21:04PM +0100, Luis R. Rodriguez wrote:
> > On Tue, Nov 08, 2016 at 07:45:41AM +0100, Greg Kroah-Hartman wrote:
> > > On Mon, Nov 07, 2016 at 10:22:50PM +0100, Luis R. Rodriguez wrote:
> > > > We have no explicit semantics to check if a driver / subsystem
> > > > supports deferred probe.
> > > 
> > > Why would we need such a thing?
> > 
> > It depends on the impact of a driver/subsystem not properly supporting
> > deffered probe, if this is no-op then such a need is not critical but
> > would be good to proactively inform developers / users so they avoid 
> > its use, if this will cause issues its perhaps best to make this a
> > no-op through a check. AFAICT reviewing implications of not supporting
> > deferred probe on drivers/subsytsems for this framework is not clearly
> > spelled out, if we start considering re-using this framework for probe
> > ordering I'd hate to see issues come up without this corner case being
> > concretely considered.
> 
> It should not matter to the driver core if a subsystem, or a driver,
> supports or does not support deferred probe.

That was my impression as well -- however Rafael noted this as an area
worth highlighting. So perhaps he can elaborate.

But at least as per my notes I do have here Geert Uytterhoeven reminding
us that:

  "Some drivers / subsystems don’t support deferred probe yet, such failures
   usually don’t blow up, but cause subtle malfunctioning. Example, an Ethernet
   phy could not get its interrupt as the primary IRQ chip had not been probed
   yet, it reverted to polling though. Sub-optimal." [0]

[0] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003425.html

This sounds more like the existing deffered probe solution has detrimental
unexpected effects on some drivers / subsystems -- its not clear *why*, but
its worth reviewing if there are other drivers and seeing if we need this
annotation.

> It's a quick and simple solution to a complex problem that works well.

We all agree. The recent discussions over probe ordering are simply
optimization considerations -- should the time incurred to use deferred
probe be X and the time incurred when using an alternative is Y and we
determine the time Y is < X we have a winning alternative. Part of my
own big issue with deferred probe is a) its extremely non-deterministic,
b) it seems some folks have thought about similar problems and we might
really be able to do better. I'm not convinced that the functional
dependencies patches are the panacea for probe ordering, and while it
was not intended for that, some folks are assuming it could be. To really
vet this prospect we must really consider what other subsystem have done
and review other alternatives efforts.

> Yes, you can iterate a
> lot of times, but that's fine, we have time at boot to do that (and
> really, it is fast.)

Deferred probe is left for late_initcall() [1] -- this *assumes* that the
driver/subsystem can be loaded so late, and as per Andrzej this solution
isunacceptable/undesirable. So it would be unfair and incorrect to categorize
all drivers in the same boat, in fact given this lone fact it may be
worth revisiting the idea I mentioned about a capability aspect to support
a late deferred probe later. We tend to assume its fine, however it does
not seem to be the case.

[1] drivers/base/dd.c:late_initcall(deferred_probe_initcall);

> > Furthermore -- how does this framework compare to Andrzej's resource tracking
> > solution? I confess I have not had a chance yet to review yet but in light of
> > this question it would be good to know if Andrzej's framework also requires
> > deferred probe as similar concerns would exist there as well.
> 
> I have no idea what "framework" you are talking about here, do you have
> a pointer to patches?

I'm surprised given Andrzej did both Cc you on his patches [2] *and* chimed
in on Rafael's patches to indicate that we likely can integrate PM concerns
into his own "framework" [3]. There was no resolution to this discussion, however
its not IMHO sufficient to brush off Andrzej's points in particular because
Andrzej *is* indicating that his framework:

- Eliminates deferred probe and resulting late_initcall(), consumer registers
callbacks informing when given resources (clock, regulator, etc) becomes
available
- Properly handle resource disappearance (driver unbind, hotplug)
- Track resources which are not vital to the device, but can influence behavior
- Offers simplified resource allocation
- Can be easily expanded to help with power management

Granted I have not reviewed this yet but it at least was on my radar, and
I do believe its worth reviewing this further given the generally expressed
interest to see if we can have a common framework to address both ordering
problems, suspend and probe. At a quick glance the "ghost provider" idea
seems like a rather crazy idea but hey, there may be some goods in there.

It was sad both Andrzej and yourself could not attend the complex dependencies
tracks -- I think it would have been useful. I don't expect us to address a
resolution to probe ordering immediately -- but I am in the hopes we at least
can keep an open mind about the similarity of the problems and see if we can
aim for a clean elegant solution that might help both.

[2] https://lwn.net/Articles/625454/
[3] http://thread.gmane.org/gmane.linux.kernel/2087152

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Nov. 9, 2016, 6:45 a.m. UTC | #11
On Tue, Nov 08, 2016 at 09:58:24PM +0100, Luis R. Rodriguez wrote:
> > > Furthermore -- how does this framework compare to Andrzej's resource tracking
> > > solution? I confess I have not had a chance yet to review yet but in light of
> > > this question it would be good to know if Andrzej's framework also requires
> > > deferred probe as similar concerns would exist there as well.
> > 
> > I have no idea what "framework" you are talking about here, do you have
> > a pointer to patches?
> 
> I'm surprised given Andrzej did both Cc you on his patches [2] *and* chimed
> in on Rafael's patches to indicate that we likely can integrate PM concerns
> into his own "framework" [3]. There was no resolution to this discussion, however
> its not IMHO sufficient to brush off Andrzej's points in particular because
> Andrzej *is* indicating that his framework:

Dude, those patches were from 2014!  I can't remember patches people
sent to me a month ago...

> - Eliminates deferred probe and resulting late_initcall(), consumer registers
> callbacks informing when given resources (clock, regulator, etc) becomes
> available
> - Properly handle resource disappearance (driver unbind, hotplug)
> - Track resources which are not vital to the device, but can influence behavior
> - Offers simplified resource allocation
> - Can be easily expanded to help with power management
> 
> Granted I have not reviewed this yet but it at least was on my radar, and
> I do believe its worth reviewing this further given the generally expressed
> interest to see if we can have a common framework to address both ordering
> problems, suspend and probe. At a quick glance the "ghost provider" idea
> seems like a rather crazy idea but hey, there may be some goods in there.

From what I remember, and I could be totally wrong, these patches were
way too complex and required that every subsystem change their
interfaces.  That's not going to work out well, but read the email
threads for the details...

> It was sad both Andrzej and yourself could not attend the complex dependencies
> tracks -- I think it would have been useful.

Sometimes real-life gets in the way of work, sorry :(

> I don't expect us to address a
> resolution to probe ordering immediately -- but I am in the hopes we at least
> can keep an open mind about the similarity of the problems and see if we can
> aim for a clean elegant solution that might help both.

I'll always review patches of what people come up with.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrzej Hajda Nov. 9, 2016, 9:36 a.m. UTC | #12
On 09.11.2016 07:45, Greg Kroah-Hartman wrote:
> On Tue, Nov 08, 2016 at 09:58:24PM +0100, Luis R. Rodriguez wrote:
>>>> Furthermore -- how does this framework compare to Andrzej's resource tracking
>>>> solution? I confess I have not had a chance yet to review yet but in light of
>>>> this question it would be good to know if Andrzej's framework also requires
>>>> deferred probe as similar concerns would exist there as well.
>>> I have no idea what "framework" you are talking about here, do you have
>>> a pointer to patches?
>> I'm surprised given Andrzej did both Cc you on his patches [2] *and* chimed
>> in on Rafael's patches to indicate that we likely can integrate PM concerns
>> into his own "framework" [3]. There was no resolution to this discussion, however
>> its not IMHO sufficient to brush off Andrzej's points in particular because
>> Andrzej *is* indicating that his framework:
> Dude, those patches were from 2014!  I can't remember patches people
> sent to me a month ago...
>
>> - Eliminates deferred probe and resulting late_initcall(), consumer registers
>> callbacks informing when given resources (clock, regulator, etc) becomes
>> available
>> - Properly handle resource disappearance (driver unbind, hotplug)
>> - Track resources which are not vital to the device, but can influence behavior
>> - Offers simplified resource allocation
>> - Can be easily expanded to help with power management
>>
>> Granted I have not reviewed this yet but it at least was on my radar, and
>> I do believe its worth reviewing this further given the generally expressed
>> interest to see if we can have a common framework to address both ordering
>> problems, suspend and probe. At a quick glance the "ghost provider" idea
>> seems like a rather crazy idea but hey, there may be some goods in there.
> >From what I remember, and I could be totally wrong, these patches were
> way too complex and required that every subsystem change their
> interfaces.  That's not going to work out well, but read the email
> threads for the details...

I haven't seen your comment on my patches, except few general questions
regarding one of earlier version of the framework.
So maybe you are talking about different framework.

Regarding complexity, if the subsystem have simple way of
'(un)publishing' resources it just adds single calls to restrack core:
restrack_up, restrack_down in proper places.
Additionally it adds quite simple stuff to encapsulate resource
description and allocation routines into generic *_restrack_desc
structure, see for example patch adding restrack to phy framework[1].

[1]:
https://lists.freedesktop.org/archives/dri-devel/2014-December/073759.html

Regards
Andrzej

>
>> It was sad both Andrzej and yourself could not attend the complex dependencies
>> tracks -- I think it would have been useful.
> Sometimes real-life gets in the way of work, sorry :(
>
>> I don't expect us to address a
>> resolution to probe ordering immediately -- but I am in the hopes we at least
>> can keep an open mind about the similarity of the problems and see if we can
>> aim for a clean elegant solution that might help both.
> I'll always review patches of what people come up with.
>
> thanks,
>
> greg k-h
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Nov. 9, 2016, 9:41 a.m. UTC | #13
On Wed, Nov 09, 2016 at 10:36:54AM +0100, Andrzej Hajda wrote:
> On 09.11.2016 07:45, Greg Kroah-Hartman wrote:
> > On Tue, Nov 08, 2016 at 09:58:24PM +0100, Luis R. Rodriguez wrote:
> >>>> Furthermore -- how does this framework compare to Andrzej's resource tracking
> >>>> solution? I confess I have not had a chance yet to review yet but in light of
> >>>> this question it would be good to know if Andrzej's framework also requires
> >>>> deferred probe as similar concerns would exist there as well.
> >>> I have no idea what "framework" you are talking about here, do you have
> >>> a pointer to patches?
> >> I'm surprised given Andrzej did both Cc you on his patches [2] *and* chimed
> >> in on Rafael's patches to indicate that we likely can integrate PM concerns
> >> into his own "framework" [3]. There was no resolution to this discussion, however
> >> its not IMHO sufficient to brush off Andrzej's points in particular because
> >> Andrzej *is* indicating that his framework:
> > Dude, those patches were from 2014!  I can't remember patches people
> > sent to me a month ago...
> >
> >> - Eliminates deferred probe and resulting late_initcall(), consumer registers
> >> callbacks informing when given resources (clock, regulator, etc) becomes
> >> available
> >> - Properly handle resource disappearance (driver unbind, hotplug)
> >> - Track resources which are not vital to the device, but can influence behavior
> >> - Offers simplified resource allocation
> >> - Can be easily expanded to help with power management
> >>
> >> Granted I have not reviewed this yet but it at least was on my radar, and
> >> I do believe its worth reviewing this further given the generally expressed
> >> interest to see if we can have a common framework to address both ordering
> >> problems, suspend and probe. At a quick glance the "ghost provider" idea
> >> seems like a rather crazy idea but hey, there may be some goods in there.
> > >From what I remember, and I could be totally wrong, these patches were
> > way too complex and required that every subsystem change their
> > interfaces.  That's not going to work out well, but read the email
> > threads for the details...
> 
> I haven't seen your comment on my patches, except few general questions
> regarding one of earlier version of the framework.
> So maybe you are talking about different framework.
> 
> Regarding complexity, if the subsystem have simple way of
> '(un)publishing' resources it just adds single calls to restrack core:
> restrack_up, restrack_down in proper places.
> Additionally it adds quite simple stuff to encapsulate resource
> description and allocation routines into generic *_restrack_desc
> structure, see for example patch adding restrack to phy framework[1].

Ok, again, I have no idea what my response was to a 2 year-old patchset,
again, I can't remember my response to a patchset that was sent just a
month ago...

update it, and repost and we can all go from there if you think it is a
viable solution.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 10, 2016, 12:43 a.m. UTC | #14
On Mon, Nov 7, 2016 at 10:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Thu, Oct 27, 2016 at 05:25:51PM +0200, Greg Kroah-Hartman wrote:
>> On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
>> > Hi Rafael,
>> >
>> > sorry for not responding to v5 of your series earlier, just sending
>> > this out now in the hope that it reaches you before your travels.
>> >
>> > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
>> > > - Modify device_links_check_suppliers(), device_links_driver_bound(),
>> > >   device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
>> > >   and device_links_unbind_consumers() to walk link lists under device_links_lock
>> > >   (to make the new "driver presence tracking" mechanism work reliably).
>> >
>> > This change might increase boot time if drivers return -EPROBE_DEFER.
>>
>> "might"?  Please verify this before guessing....
>>
>> And don't make this more complex than needed before actually determining
>> a real issue.
>
> As clarified by Rafael at Plumbers, this functional dependencies
> framework assumes your driver / subsystem supports deferred probe,

It isn't particularly clear what you mean by "support" here.

I guess that you mean that it will allow the ->probe callback to be
invoked for multiple times for the same device/driver combination
without issues.  If that's the case, the way the new code uses
-EPROBE_DEFER doesn't interfere with this, because it will not invoke
the ->probe callbacks for consumers at all until their (required)
suppliers are ready.

> if it does not support its not clear what will happen....

I don't see any problems here, but if you see any, please just say
what they are.

> We have no explicit semantics to check if a driver / subsystem
> supports deferred probe.

That's correct, but then do we need it?

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 10, 2016, 12:59 a.m. UTC | #15
On Wed, Nov 9, 2016 at 4:43 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Mon, Nov 7, 2016 at 10:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> On Thu, Oct 27, 2016 at 05:25:51PM +0200, Greg Kroah-Hartman wrote:
>>> On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
>>> > Hi Rafael,
>>> >
>>> > sorry for not responding to v5 of your series earlier, just sending
>>> > this out now in the hope that it reaches you before your travels.
>>> >
>>> > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
>>> > > - Modify device_links_check_suppliers(), device_links_driver_bound(),
>>> > >   device_links_no_driver(), device_links_driver_cleanup(), device_links_busy(),
>>> > >   and device_links_unbind_consumers() to walk link lists under device_links_lock
>>> > >   (to make the new "driver presence tracking" mechanism work reliably).
>>> >
>>> > This change might increase boot time if drivers return -EPROBE_DEFER.
>>>
>>> "might"?  Please verify this before guessing....
>>>
>>> And don't make this more complex than needed before actually determining
>>> a real issue.
>>
>> As clarified by Rafael at Plumbers, this functional dependencies
>> framework assumes your driver / subsystem supports deferred probe,
>
> It isn't particularly clear what you mean by "support" here.

I noted some folks had reported issues, and you acknowledged that if
deferred probe was used in some drivers and if this created an issue
the same issue would be seen with this framework. AFAICT there are two
possible issues to consider:

1) the one Geert Uytterhoeven noted. Again I'll note what he had mentioned [0].

  "Some drivers / subsystems don’t support deferred probe yet, such failures
   usually don’t blow up, but cause subtle malfunctioning. Example, an Ethernet
   phy could not get its interrupt as the primary IRQ chip had not been probed
   yet, it reverted to polling though. Sub-optimal." [0]

[0] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003425.html

Geert can you provide more details?

2) Since deferred probe relies on late_initcall() if your driver must
load earlier than this deferred probe can create an issue. Andrzej had
you identified a driver that ran into this and had issues ? If not
this seems like a semantics thing we should consider in extending the
documentation for drivers so that driver writers are aware of this
limitation. I would suppose candidates for this would be anything not
using module_init() or late_initcall() on their inits and have a
probe.

>> if it does not support its not clear what will happen....
>
> I don't see any problems here, but if you see any, please just say
> what they are.
>
>> We have no explicit semantics to check if a driver / subsystem
>> supports deferred probe.
>
> That's correct, but then do we need it?

We can determine this by reviewing the two items above.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 10, 2016, 1:07 a.m. UTC | #16
On Mon, Nov 7, 2016 at 10:39 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Currently, there is a problem with taking functional dependencies
>> between devices into account.
>>
>> What I mean by a "functional dependency" is when the driver of device
>> B needs device A to be functional and (generally) its driver to be
>> present in order to work properly.  This has certain consequences
>> for power management (suspend/resume and runtime PM ordering) and
>> shutdown ordering of these devices.  In general, it also implies that
>> the driver of A needs to be working for B to be probed successfully
>> and it cannot be unbound from the device before the B's driver.
>>
>> Support for representing those functional dependencies between
>> devices is added here to allow the driver core to track them and act
>> on them in certain cases where applicable.
>>
>> The argument for doing that in the driver core is that there are
>> quite a few distinct use cases involving device dependencies, they
>> are relatively hard to get right in a driver (if one wants to
>> address all of them properly) and it only gets worse if multiplied
>> by the number of drivers potentially needing to do it.
>
> How many drivers actually *need this* today for suspend / resume ?

Admittedly, I don't have a list, but from discussions I had in the
past it looked like at least a dozen.

Plus, there is this pesky ACPI _DEP thing that in some cases is
actually correct and (before this patchest) there is no way to follow
it (a classical example here is when ACPI power management methods on
one device are implemented in terms of I2C accesses and require a
specific I2C controller to be present and functional for the PM of the
device in question to work, but there are other cases like that).

> How many of these are because of ACPI firmware bugs rather than
> some other reason ?

Two things here (as I said elsewhere).  There is the order in which
operations (eg. async suspend resume of devices during system-wide
state transitions) are started and there is the order in which they
are run/completed.  Even if the former is right, the latter still may
not be correct and there needs to be a generic way to make that
happen.

The "firmware bugs" above affect the first item and even if it is
entirely correct (ie. all PM operations for all devices are always
started in the right order, consumers before suppliers or the other
way around depending on whether this is suspend or resume), there
still is the second item to address.

Apart from this, even if the information provided by the firmware does
not lead to a device registration ordering that will produce the
correct dpm_list ordering (for example), it may still not be clear
whether or not that's a bug in the firmware.  The device registration
ordering itself is just not sufficient in general.

> Can ACPI somehow be used instead by these devices to address quirks?
>
>> Morever, at
>> least one case (asynchronous system suspend/resume) cannot be handled
>> in a single driver at all, because it requires the driver of A to
>> wait for B to suspend (during system suspend) and the driver of B to
>> wait for A to resume (during system resume).
>
> Why is this all of a sudden a new issue? It seems there is quite a bit of
> frameworks already out there that somehow deal with some sort of device
> ordering / dependencies, and I am curious if they've been addressing some of
> these problems for a while now on their own somehow with their own solutions,
> is that the case? For instance can DAPM the / DRM / Audio component framework
> v4l async solution be used or does it already address some of these concerns ?

None of them don't address all of the problems involved in a
sufficiently generic way to my knowledge.

>> For this reason, represent dependencies between devices as "links",
>> with the help of struct device_link objects each containing pointers
>> to the "linked" devices, a list node for each of them, status
>> information, flags, and an RCU head for synchronization.
>>
>> Also add two new list heads, representing the lists of links to the
>> devices that depend on the given one (consumers) and to the devices
>> depended on by it (suppliers), and a "driver presence status" field
>> (needed for figuring out initial states of device links) to struct
>> device.
>>
>> The entire data structure consisting of all of the lists of link
>> objects for all devices is protected by a mutex (for link object
>> addition/removal and for list walks during device driver probing
>> and removal) and by SRCU (for list walking in other case that will
>> be introduced by subsequent change sets).  In addition, each link
>> object has an internal status field whose value reflects whether or
>> not drivers are bound to the devices pointed to by the link or
>> probing/removal of their drivers is in progress etc.  That field
>> is only modified under the device links mutex, but it may be read
>> outside of it in some cases (introduced by subsequent change sets),
>> so modifications of it are annotated with WRITE_ONCE().
>>
>> New links are added by calling device_link_add() which takes three
>> arguments: pointers to the devices in question and flags.  In
>> particular, if DL_FLAG_STATELESS is set in the flags, the link status
>> is not to be taken into account for this link and the driver core
>> will not manage it.  In turn, if DL_FLAG_AUTOREMOVE is set in the
>> flags, the driver core will remove the link automatically when the
>> consumer device driver unbinds from it.
>>
>> One of the actions carried out by device_link_add() is to reorder
>> the lists used for device shutdown and system suspend/resume to
>> put the consumer device along with all of its children and all of
>> its consumers (and so on, recursively) to the ends of those lists
>> in order to ensure the right ordering between all of the supplier
>> and consumer devices.
>
> There's no explanation as to why this order is ensured to be
> correct, I think its important to document this.

Agreed.  There is a documentation work in progress (started by Lukas).
We will get to that point.

> From our discussions
> at Plumbers it seems the order is ensured due to the fact that order
> was already implicitly provided through platform firmware (ACPI
> enumeration is one), adjusting order on the dpm list is just shuffling
> order between consumer / provider, but nothing else.

That basically is the case.

> In theory it
> works because in the simple case this should suffice however I
> remain unconvinced that if we have more users of this framework this
> simple algorithm will suffice. Having a test driver or series of
> test drivers that shows this would be good. As it stands there is simply
> an assumption that this is correct, how *strongly* do you feel that
> the order will *always* be correct if this is done and no cycles
> can be added, even if tons of drivers start using this ?

Cycles cannot be added because of the way the code works.  It will
just return NULL from device_link_add() if it detects a cycle (that it
cannot deal with).

>> For this reason, it is not possible to create a link between two
>> devices if the would-be supplier device already depends on the
>> would-be consumer device as either a direct descendant of it or a
>> consumer of one of its direct descendants or one of its consumers
>> and so on.
>>
>> There are two types of link objects, persistent and non-persistent.
>> The persistent ones stay around until one of the target devices is
>> deleted, while the non-persistent ones are removed automatically when
>> the consumer driver unbinds from its device (ie. they are assumed to
>> be valid only as long as the consumer device has a driver bound to
>> it).  Persistent links are created by default and non-persistent
>> links are created when the DL_FLAG_AUTOREMOVE flag is passed
>> to device_link_add().
>>
>> Both persistent and non-persistent device links can be deleted
>> with an explicit call to device_link_del().
>>
>> Links created without the DL_FLAG_STATELESS flag set are managed
>> by the driver core using a simple state machine.  There are 5 states
>> each link can be in: DORMANT (unused), AVAILABLE (the supplier driver
>> is present and functional), CONSUMER_PROBE (the consumer driver is
>> probing), ACTIVE (both supplier and consumer drivers are present and
>> functional), and SUPPLIER_UNBIND (the supplier driver is unbinding).
>> The driver core updates the link state automatically depending on
>> what happens to the linked devices and for each link state specific
>> actions are taken in addition to that.
>>
>> For example, if the supplier driver unbinds from its device, the
>> driver core will also unbind the drivers of all of its consumers
>> automatically under the assumption that they cannot function
>> properly without the supplier.  Analogously, the driver core will
>> only allow the consumer driver to bind to its device if the
>> supplier driver is present and functional (ie. the link is in
>> the AVAILABLE state).  If that's not the case, it will rely on
>> the existing deferred probing mechanism to wait for the supplier
>> driver to become available.
>
> This assumes drivers and subsystems support deferred probe,

No, I don't think so.

Where does that assumption matter, exactly?

> is there a way to ensure that drivers that use this framework support
> it instead of possibly breaking them if they use this framework ?

The whole driver/device binding tracking thing is not mandatory in the
first place.  You can just tell the framework to ignore that part and
just use the links for PM and shutdown.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurent Pinchart Nov. 10, 2016, 7:05 a.m. UTC | #17
Hi Luis,

On Monday 07 Nov 2016 22:39:54 Luis R. Rodriguez wrote:
> On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Currently, there is a problem with taking functional dependencies
> > between devices into account.
> > 
> > What I mean by a "functional dependency" is when the driver of device
> > B needs device A to be functional and (generally) its driver to be
> > present in order to work properly.  This has certain consequences
> > for power management (suspend/resume and runtime PM ordering) and
> > shutdown ordering of these devices.  In general, it also implies that
> > the driver of A needs to be working for B to be probed successfully
> > and it cannot be unbound from the device before the B's driver.
> > 
> > Support for representing those functional dependencies between
> > devices is added here to allow the driver core to track them and act
> > on them in certain cases where applicable.
> > 
> > The argument for doing that in the driver core is that there are
> > quite a few distinct use cases involving device dependencies, they
> > are relatively hard to get right in a driver (if one wants to
> > address all of them properly) and it only gets worse if multiplied
> > by the number of drivers potentially needing to do it.
> 
> How many drivers actually *need this* today for suspend / resume ?

I don't think there's a list, but just speaking for Renesas R-Car platforms 
such a dependency exists from all DMA bus masters to the system IOMMUs (we're 
talking about dozens of devices here), as well as from the display, video 
codec and video processing IP cores to transparent memory access IP cores that 
handle burst access and compression/decompression.

> How many of these are because of ACPI firmware bugs rather than
> some other reason ?

None from the list above, even with s/ACPI/DT/.

> Can ACPI somehow be used instead by these devices to address quirks?

Certainly not on ARM embedded platforms :-)

> > Morever, at least one case (asynchronous system suspend/resume) cannot be
> > handled in a single driver at all, because it requires the driver of A to
> > wait for B to suspend (during system suspend) and the driver of B to
> > wait for A to resume (during system resume).
> 
> Why is this all of a sudden a new issue?

It's not a new issue, we've just never addressed it properly so far. Various 
workarounds existed, with various level of success (usually more for runtime 
PM than system suspend/resume).

> It seems there is quite a bit of frameworks already out there that somehow
> deal with some sort of device ordering / dependencies, and I am curious if
> they've been addressing some of these problems for a while now on their own
> somehow with their own solutions, is that the case? For instance can DAPM
> the / DRM / Audio component framework v4l async solution be used or does it
> already address some of these concerns ?
>
> > For this reason, represent dependencies between devices as "links",
> > with the help of struct device_link objects each containing pointers
> > to the "linked" devices, a list node for each of them, status
> > information, flags, and an RCU head for synchronization.
> > 
> > Also add two new list heads, representing the lists of links to the
> > devices that depend on the given one (consumers) and to the devices
> > depended on by it (suppliers), and a "driver presence status" field
> > (needed for figuring out initial states of device links) to struct
> > device.
> > 
> > The entire data structure consisting of all of the lists of link
> > objects for all devices is protected by a mutex (for link object
> > addition/removal and for list walks during device driver probing
> > and removal) and by SRCU (for list walking in other case that will
> > be introduced by subsequent change sets).  In addition, each link
> > object has an internal status field whose value reflects whether or
> > not drivers are bound to the devices pointed to by the link or
> > probing/removal of their drivers is in progress etc.  That field
> > is only modified under the device links mutex, but it may be read
> > outside of it in some cases (introduced by subsequent change sets),
> > so modifications of it are annotated with WRITE_ONCE().
> > 
> > New links are added by calling device_link_add() which takes three
> > arguments: pointers to the devices in question and flags.  In
> > particular, if DL_FLAG_STATELESS is set in the flags, the link status
> > is not to be taken into account for this link and the driver core
> > will not manage it.  In turn, if DL_FLAG_AUTOREMOVE is set in the
> > flags, the driver core will remove the link automatically when the
> > consumer device driver unbinds from it.
> > 
> > One of the actions carried out by device_link_add() is to reorder
> > the lists used for device shutdown and system suspend/resume to
> > put the consumer device along with all of its children and all of
> > its consumers (and so on, recursively) to the ends of those lists
> > in order to ensure the right ordering between all of the supplier
> > and consumer devices.
> 
> There's no explanation as to why this order is ensured to be
> correct, I think its important to document this. From our discussions
> at Plumbers it seems the order is ensured due to the fact that order
> was already implicitly provided through platform firmware (ACPI
> enumeration is one), adjusting order on the dpm list is just shuffling
> order between consumer / provider, but nothing else. In theory it
> works because in the simple case this should suffice however I
> remain unconvinced that if we have more users of this framework this
> simple algorithm will suffice. Having a test driver or series of
> test drivers that shows this would be good. As it stands there is simply
> an assumption that this is correct, how *strongly* do you feel that
> the order will *always* be correct if this is done and no cycles
> can be added, even if tons of drivers start using this ?
> 
> > For this reason, it is not possible to create a link between two
> > devices if the would-be supplier device already depends on the
> > would-be consumer device as either a direct descendant of it or a
> > consumer of one of its direct descendants or one of its consumers
> > and so on.
> > 
> > There are two types of link objects, persistent and non-persistent.
> > The persistent ones stay around until one of the target devices is
> > deleted, while the non-persistent ones are removed automatically when
> > the consumer driver unbinds from its device (ie. they are assumed to
> > be valid only as long as the consumer device has a driver bound to
> > it).  Persistent links are created by default and non-persistent
> > links are created when the DL_FLAG_AUTOREMOVE flag is passed
> > to device_link_add().
> > 
> > Both persistent and non-persistent device links can be deleted
> > with an explicit call to device_link_del().
> > 
> > Links created without the DL_FLAG_STATELESS flag set are managed
> > by the driver core using a simple state machine.  There are 5 states
> > each link can be in: DORMANT (unused), AVAILABLE (the supplier driver
> > is present and functional), CONSUMER_PROBE (the consumer driver is
> > probing), ACTIVE (both supplier and consumer drivers are present and
> > functional), and SUPPLIER_UNBIND (the supplier driver is unbinding).
> > The driver core updates the link state automatically depending on
> > what happens to the linked devices and for each link state specific
> > actions are taken in addition to that.
> > 
> > For example, if the supplier driver unbinds from its device, the
> > driver core will also unbind the drivers of all of its consumers
> > automatically under the assumption that they cannot function
> > properly without the supplier.  Analogously, the driver core will
> > only allow the consumer driver to bind to its device if the
> > supplier driver is present and functional (ie. the link is in
> > the AVAILABLE state).  If that's not the case, it will rely on
> > the existing deferred probing mechanism to wait for the supplier
> > driver to become available.
> 
> This assumes drivers and subsystems support deferred probe, is
> there a way to ensure that drivers that use this framework support
> it instead of possibly breaking them if they use this framework ?
Laurent Pinchart Nov. 10, 2016, 7:14 a.m. UTC | #18
Hi Luis,

On Wednesday 09 Nov 2016 16:59:30 Luis R. Rodriguez wrote:
> On Wed, Nov 9, 2016 at 4:43 PM, Rafael J. Wysocki wrote:
> > On Mon, Nov 7, 2016 at 10:22 PM, Luis R. Rodriguez wrote:
> >> On Thu, Oct 27, 2016 at 05:25:51PM +0200, Greg Kroah-Hartman wrote:
> >>> On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
> >>>> Hi Rafael,
> >>>> 
> >>>> sorry for not responding to v5 of your series earlier, just sending
> >>>> this out now in the hope that it reaches you before your travels.
> >>>> 
> >>>> On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> >>>>> - Modify device_links_check_suppliers(),
> >>>>> device_links_driver_bound(),
> >>>>> 
> >>>>>   device_links_no_driver(), device_links_driver_cleanup(),
> >>>>>   device_links_busy(), and device_links_unbind_consumers() to walk
> >>>>>   link lists under device_links_lock (to make the new "driver
> >>>>>   presence tracking" mechanism work reliably).
> >>>>
> >>>> This change might increase boot time if drivers return -EPROBE_DEFER.
> >>> 
> >>> "might"?  Please verify this before guessing....
> >>> 
> >>> And don't make this more complex than needed before actually determining
> >>> a real issue.
> >> 
> >> As clarified by Rafael at Plumbers, this functional dependencies
> >> framework assumes your driver / subsystem supports deferred probe,
> > 
> > It isn't particularly clear what you mean by "support" here.
> 
> I noted some folks had reported issues, and you acknowledged that if
> deferred probe was used in some drivers and if this created an issue
> the same issue would be seen with this framework. AFAICT there are two
> possible issues to consider:
> 
> 1) the one Geert Uytterhoeven noted. Again I'll note what he had mentioned
> [0].
> 
>   "Some drivers / subsystems don’t support deferred probe yet, such failures
> usually don’t blow up, but cause subtle malfunctioning. Example, an
> Ethernet phy could not get its interrupt as the primary IRQ chip had not
> been probed yet, it reverted to polling though. Sub-optimal." [0]
> 
> [0]
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003
> 425.html
> 
> Geert can you provide more details?

This is a more global issue. In many cases drivers depend on optional 
resources. They are able to operate in a degraded mode (reduced feature set, 
reduced performances, ...) when those resources are not present. They can 
easily determine at probe time whether those resources are present, but have 
no way to know, in case they're absent, whether they will be present at some 
point in the near future (due to another driver probing the device providing 
the resource for instance) or if they will never be present (for instance 
because the required driver is missing). In the first case it would make sense 
to defer probe, in the latter case deferring probe forever for missing 
optional resources would prevent the device from being probed successfully at 
all.

The functional dependencies tracking patch series isn't meant to address this 
issue. I can imagine a framework that would notify drivers of optional 
resource availability after probe time, but it would come at a high cost for 
drivers as switching between modes of operation at runtime based on the 
availability of such resources would be way more complex than a mechanism 
based on probe deferral.

> 2) Since deferred probe relies on late_initcall() if your driver must
> load earlier than this deferred probe can create an issue. Andrzej had
> you identified a driver that ran into this and had issues ? If not
> this seems like a semantics thing we should consider in extending the
> documentation for drivers so that driver writers are aware of this
> limitation. I would suppose candidates for this would be anything not
> using module_init() or late_initcall() on their inits and have a
> probe.
> 
> >> if it does not support its not clear what will happen....
> > 
> > I don't see any problems here, but if you see any, please just say
> > what they are.
> > 
> >> We have no explicit semantics to check if a driver / subsystem
> >> supports deferred probe.
> > 
> > That's correct, but then do we need it?
> 
> We can determine this by reviewing the two items above.
Geert Uytterhoeven Nov. 10, 2016, 8:46 a.m. UTC | #19
On Thu, Nov 10, 2016 at 1:59 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Wed, Nov 9, 2016 at 4:43 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Mon, Nov 7, 2016 at 10:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> As clarified by Rafael at Plumbers, this functional dependencies
>>> framework assumes your driver / subsystem supports deferred probe,
>>
>> It isn't particularly clear what you mean by "support" here.
>
> I noted some folks had reported issues, and you acknowledged that if
> deferred probe was used in some drivers and if this created an issue
> the same issue would be seen with this framework. AFAICT there are two
> possible issues to consider:
>
> 1) the one Geert Uytterhoeven noted. Again I'll note what he had mentioned [0].
>
>   "Some drivers / subsystems don’t support deferred probe yet, such failures
>    usually don’t blow up, but cause subtle malfunctioning. Example, an Ethernet
>    phy could not get its interrupt as the primary IRQ chip had not been probed
>    yet, it reverted to polling though. Sub-optimal." [0]
>
> [0] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003425.html
>
> Geert can you provide more details?

Issue reported in "of_mdiobus_register_phy() and deferred probe"
(http://lkml.iu.edu/hypermail/linux/kernel/1510.2/05770.html)

Key point is:
"However, of_mdiobus_register_phy() uses irq_of_parse_and_map(), which plainly
 ignores EPROBE_DEFER, and it just continues."

At that time, the PHY driver fell back to polling, but as of commit d5c3d8465
("net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS") that's no longer the
case, and now the PHY fails to work completely.

Workaround is "[PATCH v2] irqchip/renesas-irqc: Postpone driver initialization"
(https://www.spinics.net/lists/netdev/msg403325.html), which seems to have
sparked some interest in fixing the issue for good ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 10, 2016, 10:04 p.m. UTC | #20
On Thu, Nov 10, 2016 at 09:14:32AM +0200, Laurent Pinchart wrote:
> Hi Luis,
> 
> On Wednesday 09 Nov 2016 16:59:30 Luis R. Rodriguez wrote:
> > On Wed, Nov 9, 2016 at 4:43 PM, Rafael J. Wysocki wrote:
> > > On Mon, Nov 7, 2016 at 10:22 PM, Luis R. Rodriguez wrote:
> > >> On Thu, Oct 27, 2016 at 05:25:51PM +0200, Greg Kroah-Hartman wrote:
> > >>> On Wed, Oct 26, 2016 at 01:19:02PM +0200, Lukas Wunner wrote:
> > >>>> Hi Rafael,
> > >>>> 
> > >>>> sorry for not responding to v5 of your series earlier, just sending
> > >>>> this out now in the hope that it reaches you before your travels.
> > >>>> 
> > >>>> On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > >>>>> - Modify device_links_check_suppliers(),
> > >>>>> device_links_driver_bound(),
> > >>>>> 
> > >>>>>   device_links_no_driver(), device_links_driver_cleanup(),
> > >>>>>   device_links_busy(), and device_links_unbind_consumers() to walk
> > >>>>>   link lists under device_links_lock (to make the new "driver
> > >>>>>   presence tracking" mechanism work reliably).
> > >>>>
> > >>>> This change might increase boot time if drivers return -EPROBE_DEFER.
> > >>> 
> > >>> "might"?  Please verify this before guessing....
> > >>> 
> > >>> And don't make this more complex than needed before actually determining
> > >>> a real issue.
> > >> 
> > >> As clarified by Rafael at Plumbers, this functional dependencies
> > >> framework assumes your driver / subsystem supports deferred probe,
> > > 
> > > It isn't particularly clear what you mean by "support" here.
> > 
> > I noted some folks had reported issues, and you acknowledged that if
> > deferred probe was used in some drivers and if this created an issue
> > the same issue would be seen with this framework. AFAICT there are two
> > possible issues to consider:
> > 
> > 1) the one Geert Uytterhoeven noted. Again I'll note what he had mentioned
> > [0].
> > 
> >   "Some drivers / subsystems don’t support deferred probe yet, such failures
> > usually don’t blow up, but cause subtle malfunctioning. Example, an
> > Ethernet phy could not get its interrupt as the primary IRQ chip had not
> > been probed yet, it reverted to polling though. Sub-optimal." [0]
> > 
> > [0]
> > https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003
> > 425.html
> > 
> > Geert can you provide more details?
> 
> This is a more global issue. In many cases drivers depend on optional 
> resources. They are able to operate in a degraded mode (reduced feature set, 
> reduced performances, ...) when those resources are not present. They can 
> easily determine at probe time whether those resources are present, but have 
> no way to know, in case they're absent, whether they will be present at some 
> point in the near future (due to another driver probing the device providing 
> the resource for instance) or if they will never be present (for instance 
> because the required driver is missing).

I see thanks, so -EPROBE_DEFER assumes a late_initcall() would suffice to load
all necessary requirements.

> In the first case it would make sense  to defer probe,

So if the assumption is correct then it -EPROBE_DEFER should work.

> in the latter case deferring probe forever for missing 
> optional resources would prevent the device from being probed successfully at 
> all.

Right I see. And the driver core has no way to know what things *may* be
needed.

> The functional dependencies tracking patch series isn't meant to address this 
> issue.

Right, however it does track functional dependencies for suspend/run time PM
and we were certainly in hope this could help with probe ordering *later* in
the future. This is a separate topic. But more on point, the issue here is that
this framework relies on -EPROBE_DEFER -- so the issues discussed with it still
exist and should be properly documented.

> I can imagine a framework that would notify drivers of optional 
> resource availability after probe time, but it would come at a high cost for 
> drivers as switching between modes of operation at runtime based on the 
> availability of such resources would be way more complex than a mechanism 
> based on probe deferral.

Right, I see.

This is more forward looking, but -- if we had an annotation in Kconfig/turned
to a mod info section, or to start off with just a driver MODULE_SUGGESTS() macro
to start off with it might suffice for the driver core to request_module()
annotated dependencies, such requests could be explicitly suggested as
synchronous so init + probe do run together (as-is today), after which it
could know that all possible drivers that needed to be loaded should now be
loaded. If this sounds plausible to help, do we have drivers where we can
test this on? For instance, since the functional dependency framework
annotates functional dependencies for consumers/providers for suspend/resume
and un time PM could such MODULE_SUGGESTS() annotations be considered on the
consumers to suggest the provider drivers so their own probe yields to their
providers to try first ?

  Luis

> > 2) Since deferred probe relies on late_initcall() if your driver must
> > load earlier than this deferred probe can create an issue. Andrzej had
> > you identified a driver that ran into this and had issues ? If not
> > this seems like a semantics thing we should consider in extending the
> > documentation for drivers so that driver writers are aware of this
> > limitation. I would suppose candidates for this would be anything not
> > using module_init() or late_initcall() on their inits and have a
> > probe.
> > 
> > >> if it does not support its not clear what will happen....
> > > 
> > > I don't see any problems here, but if you see any, please just say
> > > what they are.
> > > 
> > >> We have no explicit semantics to check if a driver / subsystem
> > >> supports deferred probe.
> > > 
> > > That's correct, but then do we need it?
> > 
> > We can determine this by reviewing the two items above.
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 
>
Luis Chamberlain Nov. 10, 2016, 10:12 p.m. UTC | #21
On Thu, Nov 10, 2016 at 09:46:42AM +0100, Geert Uytterhoeven wrote:
> On Thu, Nov 10, 2016 at 1:59 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Wed, Nov 9, 2016 at 4:43 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> On Mon, Nov 7, 2016 at 10:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >>> As clarified by Rafael at Plumbers, this functional dependencies
> >>> framework assumes your driver / subsystem supports deferred probe,
> >>
> >> It isn't particularly clear what you mean by "support" here.
> >
> > I noted some folks had reported issues, and you acknowledged that if
> > deferred probe was used in some drivers and if this created an issue
> > the same issue would be seen with this framework. AFAICT there are two
> > possible issues to consider:
> >
> > 1) the one Geert Uytterhoeven noted. Again I'll note what he had mentioned [0].
> >
> >   "Some drivers / subsystems don’t support deferred probe yet, such failures
> >    usually don’t blow up, but cause subtle malfunctioning. Example, an Ethernet
> >    phy could not get its interrupt as the primary IRQ chip had not been probed
> >    yet, it reverted to polling though. Sub-optimal." [0]
> >
> > [0] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003425.html
> >
> > Geert can you provide more details?
> 
> Issue reported in "of_mdiobus_register_phy() and deferred probe"
> (http://lkml.iu.edu/hypermail/linux/kernel/1510.2/05770.html)
> 
> Key point is:
> "However, of_mdiobus_register_phy() uses irq_of_parse_and_map(), which plainly
>  ignores EPROBE_DEFER, and it just continues."
> 
> At that time, the PHY driver fell back to polling, but as of commit d5c3d8465
> ("net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS") that's no longer the
> case, and now the PHY fails to work completely.
> 
> Workaround is "[PATCH v2] irqchip/renesas-irqc: Postpone driver initialization"
> (https://www.spinics.net/lists/netdev/msg403325.html), which seems to have
> sparked some interest in fixing the issue for good ;-)

Ah playing with init levels. You are lucky here that this suffices, the
IOMMU folks already ran out with enough levels to play with so they cannot
resolve their issue this way, for instance.

 Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Nov. 10, 2016, 10:40 p.m. UTC | #22
On Thu, Nov 10, 2016 at 11:04:07PM +0100, Luis R. Rodriguez wrote:
> This is more forward looking, but -- if we had an annotation in Kconfig/turned
> to a mod info section, or to start off with just a driver MODULE_SUGGESTS() macro
> to start off with it might suffice for the driver core to request_module()
> annotated dependencies, such requests could be explicitly suggested as
> synchronous so init + probe do run together (as-is today), after which it
> could know that all possible drivers that needed to be loaded should now be
> loaded. If this sounds plausible to help, do we have drivers where we can
> test this on? For instance, since the functional dependency framework
> annotates functional dependencies for consumers/providers for suspend/resume
> and un time PM could such MODULE_SUGGESTS() annotations be considered on the
> consumers to suggest the provider drivers so their own probe yields to their
> providers to try first ?

No.

Stop.

First off, the "driver core" NEVER can "know" if "all possible drivers
that should be loaded, are loaded.  That way lies madness and
impossibility.

Secondly, yet-another-section isn't going to help anything here, we
alredy "suggest" to userspace a bunch of stuff, so we get the needed
modules loaded, at sometime in the future, if they are around, and
userspace feels like it.  That's the best we can ever do.

Don't try to make this more difficult than it is please. DEFER works
today really really well, and it's really really simple.
Inter-dependancy of modules and devices connected to each other are two
different things, be careful about this.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 10, 2016, 11:09 p.m. UTC | #23
On Thu, Nov 10, 2016 at 09:05:30AM +0200, Laurent Pinchart wrote:
> Hi Luis,
> 
> On Monday 07 Nov 2016 22:39:54 Luis R. Rodriguez wrote:
> > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > Currently, there is a problem with taking functional dependencies
> > > between devices into account.
> > > 
> > > What I mean by a "functional dependency" is when the driver of device
> > > B needs device A to be functional and (generally) its driver to be
> > > present in order to work properly.  This has certain consequences
> > > for power management (suspend/resume and runtime PM ordering) and
> > > shutdown ordering of these devices.  In general, it also implies that
> > > the driver of A needs to be working for B to be probed successfully
> > > and it cannot be unbound from the device before the B's driver.
> > > 
> > > Support for representing those functional dependencies between
> > > devices is added here to allow the driver core to track them and act
> > > on them in certain cases where applicable.
> > > 
> > > The argument for doing that in the driver core is that there are
> > > quite a few distinct use cases involving device dependencies, they
> > > are relatively hard to get right in a driver (if one wants to
> > > address all of them properly) and it only gets worse if multiplied
> > > by the number of drivers potentially needing to do it.
> > 
> > How many drivers actually *need this* today for suspend / resume ?
> 
> I don't think there's a list, but just speaking for Renesas R-Car platforms 
> such a dependency exists from all DMA bus masters to the system IOMMUs (we're 
> talking about dozens of devices here), as well as from the display, video 
> codec and video processing IP cores to transparent memory access IP cores that 
> handle burst access and compression/decompression.

This is very significant, thanks.

> > How many of these are because of ACPI firmware bugs rather than
> > some other reason ?
> 
> None from the list above, even with s/ACPI/DT/.
> 
> > Can ACPI somehow be used instead by these devices to address quirks?
> 
> Certainly not on ARM embedded platforms :-)

So this framework adds a generic way.

> > > Morever, at least one case (asynchronous system suspend/resume) cannot be
> > > handled in a single driver at all, because it requires the driver of A to
> > > wait for B to suspend (during system suspend) and the driver of B to
> > > wait for A to resume (during system resume).
> > 
> > Why is this all of a sudden a new issue?
> 
> It's not a new issue, we've just never addressed it properly so far. Various 
> workarounds existed, with various level of success (usually more for runtime 
> PM than system suspend/resume).

This helps a lot. I think there has at this point been enough effort to
try to inform as many folks who might have these workaround or their own
solution to voice their concerns, I tried to review some of the existing
solutions myself but its hard to get a quick grasp of them -- provided no one
else yells out I welcome then this change as a proper evolution.

I suppose the only last remaining concerns I had were aggregating on top of
deferred probe and the lack of a test driver to test complex topologies
using this framework to test order is never broken. For the first concern
it would seem that folks using this framework would actually test and ensure
device links works for their drivers, even if deferred probe is used. For
the later concern, a separate test driver could be added later.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurent Pinchart Nov. 11, 2016, 12:08 a.m. UTC | #24
Hi Greg,

On Thursday 10 Nov 2016 23:40:54 Greg Kroah-Hartman wrote:
> On Thu, Nov 10, 2016 at 11:04:07PM +0100, Luis R. Rodriguez wrote:
> > This is more forward looking, but -- if we had an annotation in
> > Kconfig/turned to a mod info section, or to start off with just a driver
> > MODULE_SUGGESTS() macro to start off with it might suffice for the driver
> > core to request_module() annotated dependencies, such requests could be
> > explicitly suggested as synchronous so init + probe do run together
> > (as-is today), after which it could know that all possible drivers that
> > needed to be loaded should now be loaded. If this sounds plausible to
> > help, do we have drivers where we can test this on? For instance, since
> > the functional dependency framework annotates functional dependencies for
> > consumers/providers for suspend/resume and un time PM could such
> > MODULE_SUGGESTS() annotations be considered on the consumers to suggest
> > the provider drivers so their own probe yields to their providers to try
> > first ?
> 
> No.
> 
> Stop.
> 
> First off, the "driver core" NEVER can "know" if "all possible drivers
> that should be loaded, are loaded.  That way lies madness and
> impossibility.
> 
> Secondly, yet-another-section isn't going to help anything here, we
> alredy "suggest" to userspace a bunch of stuff, so we get the needed
> modules loaded, at sometime in the future, if they are around, and
> userspace feels like it.  That's the best we can ever do.
> 
> Don't try to make this more difficult than it is please. DEFER works
> today really really well, and it's really really simple.
> Inter-dependancy of modules and devices connected to each other are two
> different things, be careful about this.

One issue we don't address today is handling of optional dependencies. A 
simple example is an SPI controller that can use a DMA engine or work in PIO 
mode. At probe time the driver will request a DMA channel if the platform 
(ACPI, DT, platform data) specifies that DMA is available. This can fail for 
various reasons, one of them being that the DMA engine driver hasn't probed 
the DMA device yet. In that case the SPI controller driver will continue in 
PIO mode, ignoring the DMA engine that will later be probed. We can't defer 
probing of the SPI controller as the DMA engine driver might never get loaded, 
which would result in the SPI controller probe being deferred forever.

One solution for this type of dependency issue would be to notify the SPI 
controller driver after probe that the DMA channel is now available. I'd like 
to avoid that though, as it would drastically increase the complexity of lots 
of drivers and create lots of race conditions.

There are certain configurations that we could possibly consider as invalid. 
For instance if the SPI controller driver is built-in and the DMA engine 
driver built as a module, the user clearly shot themselves in the foot and the 
kernel can't be blamed.

For resources that can't be built as a module (IOMMUs for instance) we thus 
only have to consider the case where both drivers are built-in, as the 
resource built-in and consumer as a module should work properly from an 
ordering point of view (at least as long as we don't allow asynchronous 
probing of built-in drivers to be delayed enough for modules to be loaded...). 
In this case, if the resource driver isn't available when the consumer is 
probed, if will never be available at the consumer can safely proceed in a 
degraded mode. We would thus only need to solve the probe ordering issue.

I'm not sure how far these simple(r) solutions that consider certain cases as 
invalid would scale though, and whether we won't need a more generic solution 
at some point anyway.
Greg Kroah-Hartman Nov. 13, 2016, 10:59 a.m. UTC | #25
On Fri, Nov 11, 2016 at 02:08:35AM +0200, Laurent Pinchart wrote:
> Hi Greg,
> 
> On Thursday 10 Nov 2016 23:40:54 Greg Kroah-Hartman wrote:
> > On Thu, Nov 10, 2016 at 11:04:07PM +0100, Luis R. Rodriguez wrote:
> > > This is more forward looking, but -- if we had an annotation in
> > > Kconfig/turned to a mod info section, or to start off with just a driver
> > > MODULE_SUGGESTS() macro to start off with it might suffice for the driver
> > > core to request_module() annotated dependencies, such requests could be
> > > explicitly suggested as synchronous so init + probe do run together
> > > (as-is today), after which it could know that all possible drivers that
> > > needed to be loaded should now be loaded. If this sounds plausible to
> > > help, do we have drivers where we can test this on? For instance, since
> > > the functional dependency framework annotates functional dependencies for
> > > consumers/providers for suspend/resume and un time PM could such
> > > MODULE_SUGGESTS() annotations be considered on the consumers to suggest
> > > the provider drivers so their own probe yields to their providers to try
> > > first ?
> > 
> > No.
> > 
> > Stop.
> > 
> > First off, the "driver core" NEVER can "know" if "all possible drivers
> > that should be loaded, are loaded.  That way lies madness and
> > impossibility.
> > 
> > Secondly, yet-another-section isn't going to help anything here, we
> > alredy "suggest" to userspace a bunch of stuff, so we get the needed
> > modules loaded, at sometime in the future, if they are around, and
> > userspace feels like it.  That's the best we can ever do.
> > 
> > Don't try to make this more difficult than it is please. DEFER works
> > today really really well, and it's really really simple.
> > Inter-dependancy of modules and devices connected to each other are two
> > different things, be careful about this.
> 
> One issue we don't address today is handling of optional dependencies. A 
> simple example is an SPI controller that can use a DMA engine or work in PIO 
> mode. At probe time the driver will request a DMA channel if the platform 
> (ACPI, DT, platform data) specifies that DMA is available. This can fail for 
> various reasons, one of them being that the DMA engine driver hasn't probed 
> the DMA device yet. In that case the SPI controller driver will continue in 
> PIO mode, ignoring the DMA engine that will later be probed. We can't defer 
> probing of the SPI controller as the DMA engine driver might never get loaded, 
> which would result in the SPI controller probe being deferred forever.
> 
> One solution for this type of dependency issue would be to notify the SPI 
> controller driver after probe that the DMA channel is now available. I'd like 
> to avoid that though, as it would drastically increase the complexity of lots 
> of drivers and create lots of race conditions.
> 
> There are certain configurations that we could possibly consider as invalid. 
> For instance if the SPI controller driver is built-in and the DMA engine 
> driver built as a module, the user clearly shot themselves in the foot and the 
> kernel can't be blamed.
> 
> For resources that can't be built as a module (IOMMUs for instance) we thus 
> only have to consider the case where both drivers are built-in, as the 
> resource built-in and consumer as a module should work properly from an 
> ordering point of view (at least as long as we don't allow asynchronous 
> probing of built-in drivers to be delayed enough for modules to be loaded...). 
> In this case, if the resource driver isn't available when the consumer is 
> probed, if will never be available at the consumer can safely proceed in a 
> degraded mode. We would thus only need to solve the probe ordering issue.
> 
> I'm not sure how far these simple(r) solutions that consider certain cases as 
> invalid would scale though, and whether we won't need a more generic solution 
> at some point anyway.

I would love to see a generic solution that works for all of these
complex cases, as I agree with you, it's complex :)

But I have yet to see any such patches that implement this.  As always,
I am very glad to review anything that people create, but I don't have
the time to work on such a solution myself at the moment.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 13, 2016, 4:58 p.m. UTC | #26
On Tue, Nov 08, 2016 at 09:58:24PM +0100, Luis R. Rodriguez wrote:
> On Tue, Nov 08, 2016 at 08:43:35PM +0100, Greg Kroah-Hartman wrote:
> > Yes, you can iterate a
> > lot of times, but that's fine, we have time at boot to do that (and
> > really, it is fast.)
> 
> Deferred probe is left for late_initcall() [1] -- this *assumes* that the
> driver/subsystem can be loaded so late, and as per Andrzej this solution
> isunacceptable/undesirable. So it would be unfair and incorrect to categorize
> all drivers in the same boat, in fact given this lone fact it may be
> worth revisiting the idea I mentioned about a capability aspect to support
> a late deferred probe later. We tend to assume its fine, however it does
> not seem to be the case.
> 
> [1] drivers/base/dd.c:late_initcall(deferred_probe_initcall);

Note that driver_deferred_probe_trigger() is called not only from this
late_initcall but also from driver_bound().  In other words, whenever
a driver binds successfully, deferred devices will reprobe, and devices
which had to defer will not necessarily have to wait until late_initcall
until they're reprobed.  The late_initcall is just a way to tell devices
"this is last call".

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 13, 2016, 5:34 p.m. UTC | #27
On Mon, Nov 07, 2016 at 10:39:54PM +0100, Luis R. Rodriguez wrote:
> On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > One of the actions carried out by device_link_add() is to reorder
> > the lists used for device shutdown and system suspend/resume to
> > put the consumer device along with all of its children and all of
> > its consumers (and so on, recursively) to the ends of those lists
> > in order to ensure the right ordering between all of the supplier
> > and consumer devices.
> 
> There's no explanation as to why this order is ensured to be
> correct, I think its important to document this. From our discussions
> at Plumbers it seems the order is ensured due to the fact that order
> was already implicitly provided through platform firmware (ACPI
> enumeration is one), adjusting order on the dpm list is just shuffling
> order between consumer / provider, but nothing else.

ACPI specifies a hierarchy and the order on the dpm_list and
devices_kset is such that children are behind their parent.

A device link specifies a dependency that exists in addition
to the hierarchy, hence consumers need to be moved behind
their supplier.  And not only the consumers themselves but
also recursively their children and consumers. Essentially
the entire subtree is moved to the back.  That happens in
device_reorder_to_tail() in patch 2.

If another device is enumerated which acts as a supplier to
an existing other supplier, that other supplier and all its
dependents are moved behind the newly enumerated device,
and so on.

That is provably correct so long as no loops are introduced
in the dependency graph.  That is checked by device_is_dependent(),
which is called from device_link_add(), and the addition of the
link is aborted if a loop is detected.

Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven Nov. 14, 2016, 8:15 a.m. UTC | #28
Hi Laurent,

On Fri, Nov 11, 2016 at 1:08 AM, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Thursday 10 Nov 2016 23:40:54 Greg Kroah-Hartman wrote:
>> On Thu, Nov 10, 2016 at 11:04:07PM +0100, Luis R. Rodriguez wrote:
>> Don't try to make this more difficult than it is please. DEFER works
>> today really really well, and it's really really simple.
>> Inter-dependancy of modules and devices connected to each other are two
>> different things, be careful about this.
>
> One issue we don't address today is handling of optional dependencies. A
> simple example is an SPI controller that can use a DMA engine or work in PIO
> mode. At probe time the driver will request a DMA channel if the platform
> (ACPI, DT, platform data) specifies that DMA is available. This can fail for
> various reasons, one of them being that the DMA engine driver hasn't probed
> the DMA device yet. In that case the SPI controller driver will continue in
> PIO mode, ignoring the DMA engine that will later be probed. We can't defer
> probing of the SPI controller as the DMA engine driver might never get loaded,
> which would result in the SPI controller probe being deferred forever.
>
> One solution for this type of dependency issue would be to notify the SPI
> controller driver after probe that the DMA channel is now available. I'd like
> to avoid that though, as it would drastically increase the complexity of lots
> of drivers and create lots of race conditions.

Alternatively, the driver can request optional DMA resources at open time
instead of at probe time.
E.g. sh_sci does that in its uart_ops.startup() callback.

Regardless of that, DMA can fail temporarily for other reasons
(e.g. running out of channels).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 14, 2016, 1:48 p.m. UTC | #29
On Sun, Nov 13, 2016 at 06:34:13PM +0100, Lukas Wunner wrote:
> On Mon, Nov 07, 2016 at 10:39:54PM +0100, Luis R. Rodriguez wrote:
> > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > > One of the actions carried out by device_link_add() is to reorder
> > > the lists used for device shutdown and system suspend/resume to
> > > put the consumer device along with all of its children and all of
> > > its consumers (and so on, recursively) to the ends of those lists
> > > in order to ensure the right ordering between all of the supplier
> > > and consumer devices.
> > 
> > There's no explanation as to why this order is ensured to be
> > correct, I think its important to document this. From our discussions
> > at Plumbers it seems the order is ensured due to the fact that order
> > was already implicitly provided through platform firmware (ACPI
> > enumeration is one), adjusting order on the dpm list is just shuffling
> > order between consumer / provider, but nothing else.
> 
> ACPI specifies a hierarchy and the order on the dpm_list and
> devices_kset is such that children are behind their parent.
> 
> A device link specifies a dependency that exists in addition
> to the hierarchy, hence consumers need to be moved behind
> their supplier.  And not only the consumers themselves but
> also recursively their children and consumers. Essentially
> the entire subtree is moved to the back.  That happens in
> device_reorder_to_tail() in patch 2.

Ah neat, I failed to notice this full subtree tree move, its
rather important.

> If another device is enumerated which acts as a supplier to
> an existing other supplier, that other supplier and all its
> dependents are moved behind the newly enumerated device,
> and so on.
> 
> That is probably correct so long as no loops are introduced
> in the dependency graph.

"Probably" is what concerns me, there is no formality about
the correctness of this.

> That is checked by device_is_dependent(),
> which is called from device_link_add(), and the addition of the
> link is aborted if a loop is detected.

And that is sufficient ?

 Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 14, 2016, 2:50 p.m. UTC | #30
On Sun, Nov 13, 2016 at 11:59:42AM +0100, Greg Kroah-Hartman wrote:
> On Fri, Nov 11, 2016 at 02:08:35AM +0200, Laurent Pinchart wrote:
> > Hi Greg,
> > 
> > On Thursday 10 Nov 2016 23:40:54 Greg Kroah-Hartman wrote:
> > > On Thu, Nov 10, 2016 at 11:04:07PM +0100, Luis R. Rodriguez wrote:
> > > > This is more forward looking, but -- if we had an annotation in
> > > > Kconfig/turned to a mod info section, or to start off with just a driver
> > > > MODULE_SUGGESTS() macro to start off with it might suffice for the driver
> > > > core to request_module() annotated dependencies, such requests could be
> > > > explicitly suggested as synchronous so init + probe do run together
> > > > (as-is today), after which it could know that all possible drivers that
> > > > needed to be loaded should now be loaded. If this sounds plausible to
> > > > help, do we have drivers where we can test this on? For instance, since
> > > > the functional dependency framework annotates functional dependencies for
> > > > consumers/providers for suspend/resume and un time PM could such
> > > > MODULE_SUGGESTS() annotations be considered on the consumers to suggest
> > > > the provider drivers so their own probe yields to their providers to try
> > > > first ?
> > > 
> > > No.
> > > 
> > > Stop.
> > > 
> > > First off, the "driver core" NEVER can "know" if "all possible drivers
> > > that should be loaded, are loaded.  That way lies madness and
> > > impossibility.

At first I had discarded the generic driver problem as a slightly unrelated
topic however its clear now its not. In terms of functional dependencies I
agree that not providing strict order is *sometimes* desirable to help with
simplicity. The generic driver problem can be described graph-wise: on a DAG,
we're considering a topology with where nodes have optional superior
alternatives, and what you seem to be advocating is a transition to an
alternative is better and much simpler than forcing order from the start. That
can only work so long as some intermediary dependencies are (for lack of a
better term) soft-nodes -- where transitions to better alternatives are
possible and can such transitions can be handled in software. You may have
(again for lack of a better term) hard-nodes though where an entry in the DAG
is required as a hard requirement immediately prior to letting another entry
proceed. An example here is the x86 IOMMU drivers and dependent GPU DRM
drivers, currently only link order asserts proper order given we have run out
of semantics in between to ensure proper order is correct.

> > > Secondly, yet-another-section isn't going to help anything here, we
> > > alredy "suggest" to userspace a bunch of stuff, so we get the needed
> > > modules loaded, at sometime in the future, if they are around, and
> > > userspace feels like it.  That's the best we can ever do.

For some cases this is sufficient, for hard-nodes (term used above) though
if you get the incorrect order you may in the worst case oops.

> > > Don't try to make this more difficult than it is please. DEFER works
> > > today really really well, and it's really really simple.

It seems many disagree. What is clear is its simplicity outweighs the
complexity by alternatives which have been historically considered. This is
reasonable. Part of the reason probe ordering, as an optimization
consideration, came up while function dependencies for runtime PM and suspend
are being discussed is as we've determined this is a related problem and
at least for hard-nodes this is critical to resolve. For now we have enough
tools to work around problems for hard-nodes, but it would be silly for us
not to start thinking about ways to improve upon this for the future.

> > > Inter-dependancy of modules and devices connected to each other are two
> > > different things, be careful about this.

This is a *very* fair warning :)

> > One issue we don't address today is handling of optional dependencies. A 
> > simple example is an SPI controller that can use a DMA engine or work in PIO 
> > mode. At probe time the driver will request a DMA channel if the platform 
> > (ACPI, DT, platform data) specifies that DMA is available. This can fail for 
> > various reasons, one of them being that the DMA engine driver hasn't probed 
> > the DMA device yet. In that case the SPI controller driver will continue in 
> > PIO mode, ignoring the DMA engine that will later be probed. We can't defer 
> > probing of the SPI controller as the DMA engine driver might never get loaded, 
> > which would result in the SPI controller probe being deferred forever.
> > 
> > One solution for this type of dependency issue would be to notify the SPI 
> > controller driver after probe that the DMA channel is now available. I'd like 
> > to avoid that though, as it would drastically increase the complexity of lots 
> > of drivers and create lots of race conditions.
> > 
> > There are certain configurations that we could possibly consider as invalid. 
> > For instance if the SPI controller driver is built-in and the DMA engine 
> > driver built as a module, the user clearly shot themselves in the foot and the 
> > kernel can't be blamed.
> > 
> > For resources that can't be built as a module (IOMMUs for instance) we thus 
> > only have to consider the case where both drivers are built-in, as the 
> > resource built-in and consumer as a module should work properly from an 
> > ordering point of view (at least as long as we don't allow asynchronous 
> > probing of built-in drivers to be delayed enough for modules to be loaded...). 
> > In this case, if the resource driver isn't available when the consumer is 
> > probed, if will never be available at the consumer can safely proceed in a 
> > degraded mode. We would thus only need to solve the probe ordering issue.
> > 
> > I'm not sure how far these simple(r) solutions that consider certain cases as 
> > invalid would scale though, and whether we won't need a more generic solution 
> > at some point anyway.
> 
> I would love to see a generic solution that works for all of these
> complex cases, as I agree with you, it's complex :)
> 
> But I have yet to see any such patches that implement this.

The generic driver topic is related but it certainly only part of the
picture. It seems there were enough folks interested in that topic though
so perhaps patches will be eventually produced for it.

> As always,
> I am very glad to review anything that people create, but I don't have
> the time to work on such a solution myself at the moment.

Part of what we tried to discuss during the complex dependencies topics at
Plumbers was evaluating if some of the existing solutions for run time PM and
suspend could help with probe ordering, it seems we had agreement on it, what
we found though was that for many cases the use of struct device for link
association is too late. Alternatives mechanisms will be considered in the
future, and it seems that one path forward will be to consider expanding upon
this simple functional device dependency framework.

So let's see more patches!

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 14, 2016, 3:48 p.m. UTC | #31
On Mon, Nov 14, 2016 at 02:48:32PM +0100, Luis R. Rodriguez wrote:
> On Sun, Nov 13, 2016 at 06:34:13PM +0100, Lukas Wunner wrote:
> > On Mon, Nov 07, 2016 at 10:39:54PM +0100, Luis R. Rodriguez wrote:
> > > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
> > > > One of the actions carried out by device_link_add() is to reorder
> > > > the lists used for device shutdown and system suspend/resume to
> > > > put the consumer device along with all of its children and all of
> > > > its consumers (and so on, recursively) to the ends of those lists
> > > > in order to ensure the right ordering between all of the supplier
> > > > and consumer devices.
> > > 
> > > There's no explanation as to why this order is ensured to be
> > > correct, I think its important to document this. From our discussions
> > > at Plumbers it seems the order is ensured due to the fact that order
> > > was already implicitly provided through platform firmware (ACPI
> > > enumeration is one), adjusting order on the dpm list is just shuffling
> > > order between consumer / provider, but nothing else.
> > 
> > ACPI specifies a hierarchy and the order on the dpm_list and
> > devices_kset is such that children are behind their parent.
> > 
> > A device link specifies a dependency that exists in addition
> > to the hierarchy, hence consumers need to be moved behind
> > their supplier.  And not only the consumers themselves but
> > also recursively their children and consumers. Essentially
> > the entire subtree is moved to the back.  That happens in
> > device_reorder_to_tail() in patch 2.
> 
> Ah neat, I failed to notice this full subtree tree move, its
> rather important.
> 
> > If another device is enumerated which acts as a supplier to
> > an existing other supplier, that other supplier and all its
> > dependents are moved behind the newly enumerated device,
> > and so on.
> > 
> > That is probably correct so long as no loops are introduced
> > in the dependency graph.
> 
> "Probably" is what concerns me, there is no formality about
> the correctness of this.

It's a typo, I meant to say "provably correct". Sorry.

Quite a difference in meaning. :-)


> > That is checked by device_is_dependent(),
> > which is called from device_link_add(), and the addition of the
> > link is aborted if a loop is detected.
> 
> And that is sufficient ?

The device links turn the device tree into a directed acyclic graph.
For the dpm_list and devices_kset, that graph is flattened into a
one-dimensional form such that all ancestors and suppliers of a
device appear in front of that device in the lists.  I'm not a
graph theorist and can't provide a formal proof.  I think Rafael
is a Dr., maybe he can do it. :-)  I merely looked at this from a
practical point of view, i.e. I tried to come up with corner cases
where dependencies are added that would result in incorrect ordering,
and concluded that I couldn't find any.

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Nov. 14, 2016, 4 p.m. UTC | #32
On Mon, Nov 14, 2016 at 7:48 AM, Lukas Wunner <lukas@wunner.de> wrote:
> On Mon, Nov 14, 2016 at 02:48:32PM +0100, Luis R. Rodriguez wrote:
>> On Sun, Nov 13, 2016 at 06:34:13PM +0100, Lukas Wunner wrote:
>> > On Mon, Nov 07, 2016 at 10:39:54PM +0100, Luis R. Rodriguez wrote:
>> > > On Mon, Oct 10, 2016 at 02:51:04PM +0200, Rafael J. Wysocki wrote:
>> > > > One of the actions carried out by device_link_add() is to reorder
>> > > > the lists used for device shutdown and system suspend/resume to
>> > > > put the consumer device along with all of its children and all of
>> > > > its consumers (and so on, recursively) to the ends of those lists
>> > > > in order to ensure the right ordering between all of the supplier
>> > > > and consumer devices.
>> > >
>> > > There's no explanation as to why this order is ensured to be
>> > > correct, I think its important to document this. From our discussions
>> > > at Plumbers it seems the order is ensured due to the fact that order
>> > > was already implicitly provided through platform firmware (ACPI
>> > > enumeration is one), adjusting order on the dpm list is just shuffling
>> > > order between consumer / provider, but nothing else.
>> >
>> > ACPI specifies a hierarchy and the order on the dpm_list and
>> > devices_kset is such that children are behind their parent.
>> >
>> > A device link specifies a dependency that exists in addition
>> > to the hierarchy, hence consumers need to be moved behind
>> > their supplier.  And not only the consumers themselves but
>> > also recursively their children and consumers. Essentially
>> > the entire subtree is moved to the back.  That happens in
>> > device_reorder_to_tail() in patch 2.
>>
>> Ah neat, I failed to notice this full subtree tree move, its
>> rather important.
>>
>> > If another device is enumerated which acts as a supplier to
>> > an existing other supplier, that other supplier and all its
>> > dependents are moved behind the newly enumerated device,
>> > and so on.
>> >
>> > That is probably correct so long as no loops are introduced
>> > in the dependency graph.
>>
>> "Probably" is what concerns me, there is no formality about
>> the correctness of this.
>
> It's a typo, I meant to say "provably correct". Sorry.
>
> Quite a difference in meaning. :-)

No sorry my own mistake -- you had written provably but I thought you
had a typo, clearly you meant as you typed :)

If the trees are independent then yes, I can see this working.

>> > That is checked by device_is_dependent(),
>> > which is called from device_link_add(), and the addition of the
>> > link is aborted if a loop is detected.
>>
>> And that is sufficient ?
>
> The device links turn the device tree into a directed acyclic graph.
> For the dpm_list and devices_kset, that graph is flattened into a
> one-dimensional form such that all ancestors and suppliers of a
> device appear in front of that device in the lists.  I'm not a
> graph theorist and can't provide a formal proof.  I think Rafael
> is a Dr., maybe he can do it. :-)

If you traverse the DAG you can get a linear one-dimensional
representation of the dependencies -- I'm no expert on this either,
however I'm buying what you described above then, provided we do
indeed avoid cycles.

> I merely looked at this from a
> practical point of view, i.e. I tried to come up with corner cases
> where dependencies are added that would result in incorrect ordering,
> and concluded that I couldn't find any.

Fair enough, likewise. I think the takeaway from this discussion is a
test driver trying to break this might be good to have, but also
documenting how this works in practice and how we avoid the cycles is
important. This was not very clear from the patches.

 Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-pm/drivers/base/base.h
===================================================================
--- linux-pm.orig/drivers/base/base.h
+++ linux-pm/drivers/base/base.h
@@ -107,6 +107,9 @@  extern void bus_remove_device(struct dev
 
 extern int bus_add_driver(struct device_driver *drv);
 extern void bus_remove_driver(struct device_driver *drv);
+extern void device_release_driver_internal(struct device *dev,
+					   struct device_driver *drv,
+					   struct device *parent);
 
 extern void driver_detach(struct device_driver *drv);
 extern int driver_probe_device(struct device_driver *drv, struct device *dev);
@@ -152,3 +155,13 @@  extern int devtmpfs_init(void);
 #else
 static inline int devtmpfs_init(void) { return 0; }
 #endif
+
+/* Device links support */
+extern int device_links_read_lock(void);
+extern void device_links_read_unlock(int idx);
+extern int device_links_check_suppliers(struct device *dev);
+extern void device_links_driver_bound(struct device *dev);
+extern void device_links_driver_cleanup(struct device *dev);
+extern void device_links_no_driver(struct device *dev);
+extern bool device_links_busy(struct device *dev);
+extern void device_links_unbind_consumers(struct device *dev);
Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -44,6 +44,492 @@  static int __init sysfs_deprecated_setup
 early_param("sysfs.deprecated", sysfs_deprecated_setup);
 #endif
 
+/* Device links support. */
+
+static DEFINE_MUTEX(device_links_lock);
+DEFINE_STATIC_SRCU(device_links_srcu);
+
+int device_links_read_lock(void)
+{
+	return srcu_read_lock(&device_links_srcu);
+}
+
+void device_links_read_unlock(int idx)
+{
+	return srcu_read_unlock(&device_links_srcu, idx);
+}
+
+/**
+ * device_is_dependent - Check if one device depends on another one
+ * @dev: Device to check dependencies for.
+ * @target: Device to check against.
+ *
+ * Check if @target depends on @dev or any device dependent on it (its child or
+ * its consumer etc).  Return 1 if that is the case or 0 otherwise.
+ */
+static int device_is_dependent(struct device *dev, void *target)
+{
+	struct device_link *link;
+	int ret;
+
+	if (WARN_ON(dev == target))
+		return 1;
+
+	ret = device_for_each_child(dev, target, device_is_dependent);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		if (WARN_ON(link->consumer == target))
+			return 1;
+
+		ret = device_is_dependent(link->consumer, target);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+static int device_reorder_to_tail(struct device *dev, void *not_used)
+{
+	struct device_link *link;
+
+	/*
+	 * Devices that have not been registered yet will be put to the ends
+	 * of the lists during the registration, so skip them here.
+	 */
+	if (device_is_registered(dev))
+		devices_kset_move_last(dev);
+
+	if (device_pm_initialized(dev))
+		device_pm_move_last(dev);
+
+	device_for_each_child(dev, NULL, device_reorder_to_tail);
+	list_for_each_entry(link, &dev->links.consumers, s_node)
+		device_reorder_to_tail(link->consumer, NULL);
+
+	return 0;
+}
+
+/**
+ * device_link_add - Create a link between two devices.
+ * @consumer: Consumer end of the link.
+ * @supplier: Supplier end of the link.
+ * @flags: Link flags.
+ *
+ * If the DL_FLAG_AUTOREMOVE is set, the link will be removed automatically
+ * when the consumer device driver unbinds from it.  The combination of both
+ * DL_FLAG_AUTOREMOVE and DL_FLAG_STATELESS set is invalid and will cause NULL
+ * to be returned.
+ *
+ * A side effect of the link creation is re-ordering of dpm_list and the
+ * devices_kset list by moving the consumer device and all devices depending
+ * on it to the ends of these lists (that does not happen to devices that have
+ * not been registered when this function is called).
+ *
+ * The supplier device is required to be registered when this function is called
+ * and NULL will be returned if that is not the case.  The consumer device need
+ * not be registerd, however.
+ */
+struct device_link *device_link_add(struct device *consumer,
+				    struct device *supplier, u32 flags)
+{
+	struct device_link *link;
+
+	if (!consumer || !supplier ||
+	    ((flags & DL_FLAG_STATELESS) && (flags & DL_FLAG_AUTOREMOVE)))
+		return NULL;
+
+	mutex_lock(&device_links_lock);
+	device_pm_lock();
+
+	/*
+	 * If the supplier has not been fully registered yet or there is a
+	 * reverse dependency between the consumer and the supplier already in
+	 * the graph, return NULL.
+	 */
+	if (!device_pm_initialized(supplier)
+	    || device_is_dependent(consumer, supplier)) {
+		link = NULL;
+		goto out;
+	}
+
+	list_for_each_entry(link, &supplier->links.consumers, s_node)
+		if (link->consumer == consumer)
+			goto out;
+
+	link = kmalloc(sizeof(*link), GFP_KERNEL);
+	if (!link)
+		goto out;
+
+	get_device(supplier);
+	link->supplier = supplier;
+	INIT_LIST_HEAD(&link->s_node);
+	get_device(consumer);
+	link->consumer = consumer;
+	INIT_LIST_HEAD(&link->c_node);
+	link->flags = flags;
+
+	/* Deterine the initial link state. */
+	if (flags & DL_FLAG_STATELESS) {
+		link->status = DL_STATE_NONE;
+	} else {
+		switch (supplier->links.status) {
+		case DL_DEV_DRIVER_BOUND:
+			switch (consumer->links.status) {
+			case DL_DEV_PROBING:
+				link->status = DL_STATE_CONSUMER_PROBE;
+				break;
+			case DL_DEV_DRIVER_BOUND:
+				link->status = DL_STATE_ACTIVE;
+				break;
+			default:
+				link->status = DL_STATE_AVAILABLE;
+				break;
+			}
+			break;
+		case DL_DEV_UNBINDING:
+			link->status = DL_STATE_SUPPLIER_UNBIND;
+			break;
+		default:
+			link->status = DL_STATE_DORMANT;
+			break;
+		}
+	}
+
+	/*
+	 * Move the consumer and all of the devices depending on it to the end
+	 * of dpm_list and the devices_kset list.
+	 *
+	 * It is necessary to hold dpm_list locked throughout all that or else
+	 * we may end up suspending with a wrong ordering of it.
+	 */
+	device_reorder_to_tail(consumer, NULL);
+
+	list_add_tail_rcu(&link->s_node, &supplier->links.consumers);
+	list_add_tail_rcu(&link->c_node, &consumer->links.suppliers);
+
+	dev_info(consumer, "Linked as a consumer to %s\n", dev_name(supplier));
+
+ out:
+	device_pm_unlock();
+	mutex_unlock(&device_links_lock);
+	return link;
+}
+EXPORT_SYMBOL_GPL(device_link_add);
+
+static void __device_link_free_srcu(struct rcu_head *rhead)
+{
+	struct device_link *link;
+
+	link = container_of(rhead, struct device_link, rcu_head);
+	put_device(link->consumer);
+	put_device(link->supplier);
+	kfree(link);
+}
+
+static void __device_link_del(struct device_link *link)
+{
+	dev_info(link->consumer, "Dropping the link to %s\n",
+		 dev_name(link->supplier));
+
+	list_del_rcu(&link->s_node);
+	list_del_rcu(&link->c_node);
+	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
+}
+
+/**
+ * device_link_del - Delete a link between two devices.
+ * @link: Device link to delete.
+ *
+ * The caller must ensure proper synchronization of this function with runtime
+ * PM.
+ */
+void device_link_del(struct device_link *link)
+{
+	mutex_lock(&device_links_lock);
+	device_pm_lock();
+	__device_link_del(link);
+	device_pm_unlock();
+	mutex_unlock(&device_links_lock);
+}
+EXPORT_SYMBOL_GPL(device_link_del);
+
+static void device_links_missing_supplier(struct device *dev)
+{
+	struct device_link *link;
+
+	list_for_each_entry(link, &dev->links.suppliers, c_node)
+		if (link->status == DL_STATE_CONSUMER_PROBE)
+			WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
+}
+
+/**
+ * device_links_check_suppliers - Check presence of supplier drivers.
+ * @dev: Consumer device.
+ *
+ * Check links from this device to any suppliers.  Walk the list of the device's
+ * links to suppliers and see if all of them are available.  If not, simply
+ * return -EPROBE_DEFER.
+ *
+ * We need to guarantee that the supplier will not go away after the check has
+ * been positive here.  It only can go away in __device_release_driver() and
+ * that function  checks the device's links to consumers.  This means we need to
+ * mark the link as "consumer probe in progress" to make the supplier removal
+ * wait for us to complete (or bad things may happen).
+ *
+ * Links with the DL_FLAG_STATELESS flag set are ignored.
+ */
+int device_links_check_suppliers(struct device *dev)
+{
+	struct device_link *link;
+	int ret = 0;
+
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry(link, &dev->links.suppliers, c_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		if (link->status != DL_STATE_AVAILABLE) {
+			device_links_missing_supplier(dev);
+			ret = -EPROBE_DEFER;
+			break;
+		}
+		WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
+	}
+	dev->links.status = DL_DEV_PROBING;
+
+	mutex_unlock(&device_links_lock);
+	return ret;
+}
+
+/**
+ * device_links_driver_bound - Update device links after probing its driver.
+ * @dev: Device to update the links for.
+ *
+ * The probe has been successful, so update links from this device to any
+ * consumers by changing their status to "available".
+ *
+ * Also change the status of @dev's links to suppliers to "active".
+ *
+ * Links with the DL_FLAG_STATELESS flag set are ignored.
+ */
+void device_links_driver_bound(struct device *dev)
+{
+	struct device_link *link;
+
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		WARN_ON(link->status != DL_STATE_DORMANT);
+		WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
+	}
+
+	list_for_each_entry(link, &dev->links.suppliers, c_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
+		WRITE_ONCE(link->status, DL_STATE_ACTIVE);
+	}
+
+	dev->links.status = DL_DEV_DRIVER_BOUND;
+
+	mutex_unlock(&device_links_lock);
+}
+
+/**
+ * __device_links_no_driver - Update links of a device without a driver.
+ * @dev: Device without a drvier.
+ *
+ * Delete all non-persistent links from this device to any suppliers.
+ *
+ * Persistent links stay around, but their status is changed to "available",
+ * unless they already are in the "supplier unbind in progress" state in which
+ * case they need not be updated.
+ *
+ * Links with the DL_FLAG_STATELESS flag set are ignored.
+ */
+static void __device_links_no_driver(struct device *dev)
+{
+	struct device_link *link, *ln;
+
+	list_for_each_entry_safe_reverse(link, ln, &dev->links.suppliers, c_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		if (link->flags & DL_FLAG_AUTOREMOVE)
+			__device_link_del(link);
+		else if (link->status != DL_STATE_SUPPLIER_UNBIND)
+			WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
+	}
+
+	dev->links.status = DL_DEV_NO_DRIVER;
+}
+
+void device_links_no_driver(struct device *dev)
+{
+	mutex_lock(&device_links_lock);
+	__device_links_no_driver(dev);
+	mutex_unlock(&device_links_lock);
+}
+
+/**
+ * device_links_driver_cleanup - Update links after driver removal.
+ * @dev: Device whose driver has just gone away.
+ *
+ * Update links to consumers for @dev by changing their status to "dormant" and
+ * invoke %__device_links_no_driver() to update links to suppliers for it as
+ * appropriate.
+ *
+ * Links with the DL_FLAG_STATELESS flag set are ignored.
+ */
+void device_links_driver_cleanup(struct device *dev)
+{
+	struct device_link *link;
+
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		WARN_ON(link->flags & DL_FLAG_AUTOREMOVE);
+		WARN_ON(link->status != DL_STATE_SUPPLIER_UNBIND);
+		WRITE_ONCE(link->status, DL_STATE_DORMANT);
+	}
+
+	__device_links_no_driver(dev);
+
+	mutex_unlock(&device_links_lock);
+}
+
+/**
+ * device_links_busy - Check if there are any busy links to consumers.
+ * @dev: Device to check.
+ *
+ * Check each consumer of the device and return 'true' if its link's status
+ * is one of "consumer probe" or "active" (meaning that the given consumer is
+ * probing right now or its driver is present).  Otherwise, change the link
+ * state to "supplier unbind" to prevent the consumer from being probed
+ * successfully going forward.
+ *
+ * Return 'false' if there are no probing or active consumers.
+ *
+ * Links with the DL_FLAG_STATELESS flag set are ignored.
+ */
+bool device_links_busy(struct device *dev)
+{
+	struct device_link *link;
+	bool ret = false;
+
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		if (link->status == DL_STATE_CONSUMER_PROBE
+		    || link->status == DL_STATE_ACTIVE) {
+			ret = true;
+			break;
+		}
+		WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND);
+	}
+
+	dev->links.status = DL_DEV_UNBINDING;
+
+	mutex_unlock(&device_links_lock);
+	return ret;
+}
+
+/**
+ * device_links_unbind_consumers - Force unbind consumers of the given device.
+ * @dev: Device to unbind the consumers of.
+ *
+ * Walk the list of links to consumers for @dev and if any of them is in the
+ * "consumer probe" state, wait for all device probes in progress to complete
+ * and start over.
+ *
+ * If that's not the case, change the status of the link to "supplier unbind"
+ * and check if the link was in the "active" state.  If so, force the consumer
+ * driver to unbind and start over (the consumer will not re-probe as we have
+ * changed the state of the link already).
+ *
+ * Links with the DL_FLAG_STATELESS flag set are ignored.
+ */
+void device_links_unbind_consumers(struct device *dev)
+{
+	struct device_link *link;
+
+ start:
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		enum device_link_state status;
+
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+
+		status = link->status;
+		if (status == DL_STATE_CONSUMER_PROBE) {
+			mutex_unlock(&device_links_lock);
+
+			wait_for_device_probe();
+			goto start;
+		}
+		WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND);
+		if (status == DL_STATE_ACTIVE) {
+			struct device *consumer = link->consumer;
+
+			get_device(consumer);
+
+			mutex_unlock(&device_links_lock);
+
+			device_release_driver_internal(consumer, NULL,
+						       consumer->parent);
+			put_device(consumer);
+			goto start;
+		}
+	}
+
+	mutex_unlock(&device_links_lock);
+}
+
+/**
+ * device_links_purge - Delete existing links to other devices.
+ * @dev: Target device.
+ */
+static void device_links_purge(struct device *dev)
+{
+	struct device_link *link, *ln;
+
+	/*
+	 * Delete all of the remaining links from this device to any other
+	 * devices (either consumers or suppliers).
+	 */
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry_safe_reverse(link, ln, &dev->links.suppliers, c_node) {
+		WARN_ON(link->status == DL_STATE_ACTIVE);
+		__device_link_del(link);
+	}
+
+	list_for_each_entry_safe_reverse(link, ln, &dev->links.consumers, s_node) {
+		WARN_ON(link->status != DL_STATE_DORMANT &&
+			link->status != DL_STATE_NONE);
+		__device_link_del(link);
+	}
+
+	mutex_unlock(&device_links_lock);
+}
+
+/* Device links support end. */
+
 int (*platform_notify)(struct device *dev) = NULL;
 int (*platform_notify_remove)(struct device *dev) = NULL;
 static struct kobject *dev_kobj;
@@ -711,6 +1197,9 @@  void device_initialize(struct device *de
 #ifdef CONFIG_GENERIC_MSI_IRQ
 	INIT_LIST_HEAD(&dev->msi_list);
 #endif
+	INIT_LIST_HEAD(&dev->links.consumers);
+	INIT_LIST_HEAD(&dev->links.suppliers);
+	dev->links.status = DL_DEV_NO_DRIVER;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
@@ -1240,6 +1729,8 @@  void device_del(struct device *dev)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DEL_DEVICE, dev);
+
+	device_links_purge(dev);
 	dpm_sysfs_remove(dev);
 	if (parent)
 		klist_del(&dev->p->knode_parent);
Index: linux-pm/drivers/base/dd.c
===================================================================
--- linux-pm.orig/drivers/base/dd.c
+++ linux-pm/drivers/base/dd.c
@@ -249,6 +249,7 @@  static void driver_bound(struct device *
 		 __func__, dev_name(dev));
 
 	klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
+	device_links_driver_bound(dev);
 
 	device_pm_check_callbacks(dev);
 
@@ -341,6 +342,10 @@  static int really_probe(struct device *d
 		return ret;
 	}
 
+	ret = device_links_check_suppliers(dev);
+	if (ret)
+		return ret;
+
 	atomic_inc(&probe_count);
 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
 		 drv->bus->name, __func__, drv->name, dev_name(dev));
@@ -399,6 +404,7 @@  probe_failed:
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
 pinctrl_bind_failed:
+	device_links_no_driver(dev);
 	devres_release_all(dev);
 	driver_sysfs_remove(dev);
 	dev->driver = NULL;
@@ -756,7 +762,7 @@  EXPORT_SYMBOL_GPL(driver_attach);
  * __device_release_driver() must be called with @dev lock held.
  * When called for a USB interface, @dev->parent lock must be held as well.
  */
-static void __device_release_driver(struct device *dev)
+static void __device_release_driver(struct device *dev, struct device *parent)
 {
 	struct device_driver *drv;
 
@@ -765,6 +771,25 @@  static void __device_release_driver(stru
 		if (driver_allows_async_probing(drv))
 			async_synchronize_full();
 
+		while (device_links_busy(dev)) {
+			device_unlock(dev);
+			if (parent)
+				device_unlock(parent);
+
+			device_links_unbind_consumers(dev);
+			if (parent)
+				device_lock(parent);
+
+			device_lock(dev);
+			/*
+			 * A concurrent invocation of the same function might
+			 * have released the driver successfully while this one
+			 * was waiting, so check for that.
+			 */
+			if (dev->driver != drv)
+				return;
+		}
+
 		pm_runtime_get_sync(dev);
 
 		driver_sysfs_remove(dev);
@@ -780,6 +805,8 @@  static void __device_release_driver(stru
 			dev->bus->remove(dev);
 		else if (drv->remove)
 			drv->remove(dev);
+
+		device_links_driver_cleanup(dev);
 		devres_release_all(dev);
 		dev->driver = NULL;
 		dev_set_drvdata(dev, NULL);
@@ -796,16 +823,16 @@  static void __device_release_driver(stru
 	}
 }
 
-static void device_release_driver_internal(struct device *dev,
-					   struct device_driver *drv,
-					   struct device *parent)
+void device_release_driver_internal(struct device *dev,
+				    struct device_driver *drv,
+				    struct device *parent)
 {
 	if (parent)
 		device_lock(parent);
 
 	device_lock(dev);
 	if (!drv || drv == dev->driver)
-		__device_release_driver(dev);
+		__device_release_driver(dev, parent);
 
 	device_unlock(dev);
 	if (parent)
@@ -818,6 +845,10 @@  static void device_release_driver_intern
  *
  * Manually detach device from driver.
  * When called for a USB interface, @dev->parent lock must be held.
+ *
+ * If this function is to be called with @dev->parent lock held, ensure that
+ * the device's consumers are unbound in advance or that their locks can be
+ * acquired under the @dev->parent lock.
  */
 void device_release_driver(struct device *dev)
 {
Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -707,6 +707,79 @@  struct device_dma_parameters {
 };
 
 /**
+ * enum device_link_state - Device link states.
+ * @NONE: The presence of the drivers is not being tracked.
+ * @DORMANT: None of the supplier/consumer drivers is present.
+ * @AVAILABLE: The supplier driver is present, but the consumer is not.
+ * @CONSUMER_PROBE: The supplier driver is present and the consumer is probing.
+ * @ACTIVE: Active link; both the supplier and consumer drivers are present.
+ * @SUPPLIER_UNBIND: The supplier driver is unbinding.
+ */
+enum device_link_state {
+	DL_STATE_NONE = -1,
+	DL_STATE_DORMANT = 0,
+	DL_STATE_AVAILABLE,
+	DL_STATE_CONSUMER_PROBE,
+	DL_STATE_ACTIVE,
+	DL_STATE_SUPPLIER_UNBIND,
+};
+
+/*
+ * Device link flags.
+ *
+ * STATELESS: The core won't track the presence of supplier/consumer drivers.
+ * AUTOREMOVE: Remove this link automatically on consumer driver unbind.
+ */
+#define DL_FLAG_STATELESS	(1 << 0)
+#define DL_FLAG_AUTOREMOVE	(1 << 1)
+
+/**
+ * struct device_link - Device link representation.
+ * @supplier: The device on the supplier end of the link.
+ * @s_node: Hook to the supplier device's list of links to consumers.
+ * @consumer: The device on the consumer end of the link.
+ * @c_node: Hook to the consumer device's list of links to suppliers.
+ * @status: The state of the link (with respect to the presence of drivers).
+ * @flags: Link flags.
+ * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
+ */
+struct device_link {
+	struct device *supplier;
+	struct list_head s_node;
+	struct device *consumer;
+	struct list_head c_node;
+	enum device_link_state status;
+	u32 flags;
+	struct rcu_head rcu_head;
+};
+
+/**
+ * enum dl_dev_state - Device driver presence tracking information.
+ * @NO_DRIVER: There is no driver attached to the device.
+ * @PROBING: A driver is probing.
+ * @DRIVER_BOUND: The driver has been bound to the device.
+ * @UNBINDING: The driver is unbinding from the device.
+ */
+enum dl_dev_state {
+	DL_DEV_NO_DRIVER = 0,
+	DL_DEV_PROBING,
+	DL_DEV_DRIVER_BOUND,
+	DL_DEV_UNBINDING,
+};
+
+/**
+ * struct dev_links_info - Device data related to device links.
+ * @suppliers: List of links to supplier devices.
+ * @consumers: List of links to consumer devices.
+ * @status: Driver status information.
+ */
+struct dev_links_info {
+	struct list_head suppliers;
+	struct list_head consumers;
+	enum dl_dev_state status;
+};
+
+/**
  * struct device - The basic device structure
  * @parent:	The device's "parent" device, the device to which it is attached.
  * 		In most cases, a parent device is some sort of bus or host
@@ -797,6 +870,7 @@  struct device {
 					   core doesn't touch it */
 	void		*driver_data;	/* Driver data, set and get with
 					   dev_set/get_drvdata */
+	struct dev_links_info	links;
 	struct dev_pm_info	power;
 	struct dev_pm_domain	*pm_domain;
 
@@ -1113,6 +1187,10 @@  extern void device_shutdown(void);
 /* debugging and troubleshooting/diagnostic helpers. */
 extern const char *dev_driver_string(const struct device *dev);
 
+/* Device links interface. */
+struct device_link *device_link_add(struct device *consumer,
+				    struct device *supplier, u32 flags);
+void device_link_del(struct device_link *link);
 
 #ifdef CONFIG_PRINTK
 
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -131,6 +131,7 @@  void device_pm_add(struct device *dev)
 		dev_warn(dev, "parent %s should not be sleeping\n",
 			dev_name(dev->parent));
 	list_add_tail(&dev->power.entry, &dpm_list);
+	dev->power.in_dpm_list = true;
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -145,6 +146,7 @@  void device_pm_remove(struct device *dev
 	complete_all(&dev->power.completion);
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	dev->power.in_dpm_list = false;
 	mutex_unlock(&dpm_list_mtx);
 	device_wakeup_disable(dev);
 	pm_runtime_remove(dev);
Index: linux-pm/drivers/base/power/power.h
===================================================================
--- linux-pm.orig/drivers/base/power/power.h
+++ linux-pm/drivers/base/power/power.h
@@ -127,6 +127,11 @@  extern void device_pm_move_after(struct
 extern void device_pm_move_last(struct device *);
 extern void device_pm_check_callbacks(struct device *dev);
 
+static inline bool device_pm_initialized(struct device *dev)
+{
+	return dev->power.in_dpm_list;
+}
+
 #else /* !CONFIG_PM_SLEEP */
 
 static inline void device_pm_sleep_init(struct device *dev) {}
@@ -146,6 +151,11 @@  static inline void device_pm_move_last(s
 
 static inline void device_pm_check_callbacks(struct device *dev) {}
 
+static inline bool device_pm_initialized(struct device *dev)
+{
+	return device_is_registered(dev);
+}
+
 #endif /* !CONFIG_PM_SLEEP */
 
 static inline void device_pm_init(struct device *dev)
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -559,6 +559,7 @@  struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		async_suspend:1;
+	bool			in_dpm_list:1;	/* Owned by the PM core */
 	bool			is_prepared:1;	/* Owned by the PM core */
 	bool			is_suspended:1;	/* Ditto */
 	bool			is_noirq_suspended:1;