[v2,02/10] driver core: Functional dependencies tracking support

Message ID 1466144820-6286-3-git-send-email-m.szyprowski@samsung.com (mailing list archive)
State New, archived

Commit Message

Marek Szyprowski June 17, 2016, 6:26 a.m. UTC
From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Currently, there is a problem with handling cases where functional
dependencies between devices are involved.

What I mean by a "functional dependency" is when the driver of device
B needs both device A and its driver to be present and functional to
be able to work.  This implies that the driver of A needs to be
working for B to be probed successfully and it cannot be unbound from
the device before B's driver.  This also has certain consequences
for power management of these devices (suspend/resume and runtime PM
ordering).

Add support for representing those functional dependencies between
devices to allow the driver core to track them and act on them in
certain cases where they matter.

The argument for doing that in the driver core is that there are
quite a few distinct use cases related to that, they are relatively
hard to get right in a driver (if one wants to address all of them
properly) and it only gets worse if multiplied by the number of
drivers potentially needing to do it.  Moreover, at least one case
(asynchronous system suspend/resume) cannot be handled in a single
driver at all, because it requires the driver of A to wait for B to
suspend (during system suspend) and the driver of B to wait for
A to resume (during system resume).

To that end, represent links between devices (or more precisely
between device+driver combos) as a struct devlink object containing
pointers to the devices in question, a list node for each of them,
status information, flags, a lock and an RCU head for synchronization.

Also add two new list heads, supplier_links and consumer_links, to
struct device to represent the lists of links to the devices that
depend on the given one (consumers) and to the devices depended on
by it (suppliers), respectively.
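
As a rough illustration of the layout described above, here is a userspace
sketch of a link object that sits on two lists at once.  It is not the kernel
code: struct list_node, struct model_device, struct model_link and
model_link_add() are hypothetical stand-ins, and the status field, locking,
RCU and reference counting are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, standing in for struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Count the entries hanging off a list head. */
static int list_count(const struct list_node *head)
{
	const struct list_node *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Hypothetical model of the two list heads added to struct device. */
struct model_device {
	const char *name;
	struct list_node supplier_links; /* links to consumers of this device */
	struct list_node consumer_links; /* links to suppliers of this device */
};

/* Hypothetical model of the link object (status, lock, RCU head omitted). */
struct model_link {
	struct model_device *supplier;
	struct list_node s_node; /* entry in supplier->supplier_links */
	struct model_device *consumer;
	struct list_node c_node; /* entry in consumer->consumer_links */
};

static void model_device_init(struct model_device *dev, const char *name)
{
	dev->name = name;
	list_init(&dev->supplier_links);
	list_init(&dev->consumer_links);
}

/* Allocate a link and hook it into both devices' lists. */
static struct model_link *model_link_add(struct model_device *consumer,
					 struct model_device *supplier)
{
	struct model_link *link;

	assert(consumer && supplier);
	link = malloc(sizeof(*link));
	if (!link)
		return NULL;

	link->supplier = supplier;
	link->consumer = consumer;
	list_add_tail(&link->s_node, &supplier->supplier_links);
	list_add_tail(&link->c_node, &consumer->consumer_links);
	return link;
}
```

With devices A (supplier) and B (consumer), model_link_add(&B, &A) leaves one
entry on A's supplier_links and one on B's consumer_links, both embedded in
the same link object.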

The entire data structure consisting of all of the lists of link
objects for all devices is protected by SRCU (for list walking)
and by a mutex (for link object addition/removal).  In addition
to that, each link object has an internal status field whose
value reflects what's happening to the devices pointed to by
the link.  That status field is protected by an internal spinlock.
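
The status field can be read as a small state machine.  The sketch below
models, in userspace and without any locking, the transitions that the patch
performs on a single link during probe and unbind.  The enum values are taken
from the patch; the model_*() helpers and MODEL_DEFER are hypothetical names
used only for illustration, and assert() stands in for the WARN_ON() checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Status values as defined by the patch in include/linux/device.h. */
enum devlink_status {
	DEVICE_LINK_DORMANT = 0,	/* Link not in use. */
	DEVICE_LINK_AVAILABLE,		/* Supplier driver is present. */
	DEVICE_LINK_ACTIVE,		/* Consumer driver is present too. */
	DEVICE_LINK_CONSUMER_PROBE,	/* Consumer is probing. */
	DEVICE_LINK_SUPPLIER_UNBIND,	/* Supplier is unbinding. */
};

#define MODEL_DEFER (-1) /* hypothetical stand-in for -EPROBE_DEFER */

/* Supplier probe succeeded: the link becomes usable by the consumer. */
static void model_supplier_bound(enum devlink_status *status)
{
	assert(*status == DEVICE_LINK_DORMANT);
	*status = DEVICE_LINK_AVAILABLE;
}

/* Consumer is about to probe: only an "available" link lets it proceed. */
static int model_consumer_probe_start(enum devlink_status *status)
{
	if (*status != DEVICE_LINK_AVAILABLE)
		return MODEL_DEFER;

	*status = DEVICE_LINK_CONSUMER_PROBE;
	return 0;
}

/* Consumer probe succeeded. */
static void model_consumer_bound(enum devlink_status *status)
{
	assert(*status == DEVICE_LINK_CONSUMER_PROBE);
	*status = DEVICE_LINK_ACTIVE;
}

/*
 * Supplier wants to unbind.  The link is busy while the consumer is probing
 * or bound; otherwise mark it so that the consumer cannot start probing.
 */
static bool model_supplier_unbind_busy(enum devlink_status *status)
{
	if (*status == DEVICE_LINK_CONSUMER_PROBE ||
	    *status == DEVICE_LINK_ACTIVE)
		return true;

	*status = DEVICE_LINK_SUPPLIER_UNBIND;
	return false;
}
```

In this model a consumer probe attempted on a dormant link is deferred, and a
supplier unbind attempted on an active link reports the link as busy.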

New links are added by calling device_link_add() which may happen
either before the consumer device is probed or when probing it, in
which case the caller needs to ensure that the driver of the
supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
flag should be passed to device_link_add() to reflect that.
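
How the flags select the initial link state can be sketched as follows.  This
is a userspace model of the flag handling in device_link_add() as posted in
this patch, not the function itself: model_initial_status() is a hypothetical
helper, and the real function additionally allocates the link, takes device
references and reorders the device lists.

```c
#include <assert.h>
#include <stdint.h>

/* Flag and status values as defined by the patch. */
#define DEVICE_LINK_PERSISTENT	(1 << 0)
#define DEVICE_LINK_PROBE_TIME	(1 << 1)

enum devlink_status {
	DEVICE_LINK_DORMANT = 0,
	DEVICE_LINK_AVAILABLE,
	DEVICE_LINK_ACTIVE,
	DEVICE_LINK_CONSUMER_PROBE,
	DEVICE_LINK_SUPPLIER_UNBIND,
};

/*
 * Store the initial status for a new link in *status and return 0, or
 * return -1 if no flag is set (the patch returns NULL in that case).
 */
static int model_initial_status(uint32_t flags, enum devlink_status *status)
{
	assert(status);
	if (!flags)
		return -1;

	/* PROBE_TIME: the caller vouches for the supplier being functional. */
	*status = (flags & DEVICE_LINK_PROBE_TIME) ?
			DEVICE_LINK_CONSUMER_PROBE : DEVICE_LINK_DORMANT;
	return 0;
}
```

A link created ahead of the consumer probe (DEVICE_LINK_PERSISTENT only)
starts out dormant, while one created from the consumer's probe routine with
DEVICE_LINK_PROBE_TIME starts in the "consumer probe" state.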

Link objects are deleted either explicitly, by calling
device_link_del() on the link object in question, or automatically,
when the consumer device is unbound from its driver or when one
of the target devices is deleted, depending on the link type.

There are two types of link objects, persistent and non-persistent.
The difference between them is that the persistent links stay around
until one of the target devices is deleted, while the non-persistent
ones are deleted when the consumer driver is unbound from its device
(i.e. they are assumed to be valid only as long as the consumer device
has a driver bound to it).  The DEVICE_LINK_PERSISTENT flag has to
be passed to device_link_add() so as to create a persistent link.
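
The difference between the two link types shows up when the consumer loses
its driver.  The sketch below models that step on a plain array instead of
the RCU-protected lists; struct model_link and model_consumer_unbound() are
hypothetical names, and the logic follows device_links_no_driver() from this
patch only loosely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag and status values as defined by the patch. */
#define DEVICE_LINK_PERSISTENT	(1 << 0)
#define DEVICE_LINK_PROBE_TIME	(1 << 1)

enum devlink_status {
	DEVICE_LINK_DORMANT = 0,
	DEVICE_LINK_AVAILABLE,
	DEVICE_LINK_ACTIVE,
	DEVICE_LINK_CONSUMER_PROBE,
	DEVICE_LINK_SUPPLIER_UNBIND,
};

/* Hypothetical, simplified link: only the fields this step looks at. */
struct model_link {
	uint32_t flags;
	enum devlink_status status;
};

/*
 * The consumer's driver is gone.  Non-persistent links are dropped;
 * persistent ones are kept and fall back to "available" unless the
 * supplier is unbinding.  Kept links are compacted to the front of the
 * array and their number is returned.
 */
static size_t model_consumer_unbound(struct model_link *links, size_t n)
{
	size_t i, kept = 0;

	assert(links || n == 0);
	for (i = 0; i < n; i++) {
		if (!(links[i].flags & DEVICE_LINK_PERSISTENT))
			continue;	/* non-persistent: deleted */

		if (links[i].status != DEVICE_LINK_SUPPLIER_UNBIND)
			links[i].status = DEVICE_LINK_AVAILABLE;

		links[kept++] = links[i];
	}
	return kept;
}
```

Given one non-persistent and two persistent links, only the two persistent
ones survive the call.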

One of the actions carried out by device_link_add() is to reorder
the lists used for device shutdown and system suspend/resume to
put the consumer device along with all of its children and all of
its consumers (and so on, recursively) to the ends of those lists
in order to ensure the right ordering between all of the supplier
and consumer devices.
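
The reordering can be sketched in userspace as follows.  Devices are modelled
as small integers and the ordered list as an array; struct model_topology,
move_last(), reorder_to_tail() and position() are hypothetical names, and the
sketch assumes that the dependency graph has no cycles (it would recurse
forever otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical model: devices are the indices 0..ndevs-1. */
struct model_topology {
	int ndevs;
	int order[MAX_DEVS];			/* models dpm_list ordering */
	int parent[MAX_DEVS];			/* parent index, or -1 */
	int consumer_of[MAX_DEVS][MAX_DEVS];	/* [s][c] != 0: c consumes s */
};

/* Return the position of @dev in the ordered list, or -1 if absent. */
static int position(const struct model_topology *t, int dev)
{
	int i;

	for (i = 0; i < t->ndevs; i++)
		if (t->order[i] == dev)
			return i;
	return -1;
}

/* Move @dev to the end of the ordered list. */
static void move_last(struct model_topology *t, int dev)
{
	int pos = position(t, dev);

	assert(pos >= 0);
	memmove(&t->order[pos], &t->order[pos + 1],
		(size_t)(t->ndevs - pos - 1) * sizeof(t->order[0]));
	t->order[t->ndevs - 1] = dev;
}

/* Move @dev, then its children, then its consumers, recursively. */
static void reorder_to_tail(struct model_topology *t, int dev)
{
	int i;

	move_last(t, dev);

	for (i = 0; i < t->ndevs; i++)
		if (t->parent[i] == dev)
			reorder_to_tail(t, i);

	for (i = 0; i < t->ndevs; i++)
		if (t->consumer_of[dev][i])
			reorder_to_tail(t, i);
}
```

For example, with supplier 0 registered last and the initial order 1, 2, 3, 0
(where 2 is a child of 1 and 3 is a consumer of 1), calling
reorder_to_tail(&t, 1) after linking 1 to 0 leaves every device behind the
devices it depends on.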

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 drivers/base/base.h    |  11 ++
 drivers/base/core.c    | 386 +++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/base/dd.c      |  42 +++++-
 include/linux/device.h |  36 +++++
 4 files changed, 470 insertions(+), 5 deletions(-)

Comments

Lukas Wunner June 17, 2016, 10:36 a.m. UTC | #1
Hi Marek,

On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> 
> Currently, there is a problem with handling cases where functional
> dependencies between devices are involved.
> 
> What I mean by a "functional dependency" is when the driver of device
> B needs both device A and its driver to be present and functional to
> be able to work.  This implies that the driver of A needs to be
> working for B to be probed successfully and it cannot be unbound from
> the device before B's driver.  This also has certain consequences
> for power management of these devices (suspend/resume and runtime PM
> ordering).
> 
> Add support for representing those functional dependencies between
> devices to allow the driver core to track them and act on them in
> certain cases where they matter.

Rafael has indicated that he intends to respin this series:
https://lkml.org/lkml/2016/6/8/1061

We also have such a functional dependency for Thunderbolt on Macs:
On resume from system sleep, the PCIe hotplug ports may not resume
before the thunderbolt driver has reestablished the PCI tunnels.
Currently this is enforced by quirk_apple_wait_for_thunderbolt()
in drivers/pci/quirks.c. It would be good if we could represent
this dependency using something like Rafael's approach instead of
open coding it, however one detail in Rafael's patches is problematic:

> New links are added by calling device_link_add() which may happen
> either before the consumer device is probed or when probing it, in
> which case the caller needs to ensure that the driver of the
> supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> flag should be passed to device_link_add() to reflect that.

The thunderbolt driver cannot call device_link_add() before the
PCIe hotplug ports are bound to a driver unless we amend portdrv
to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
if the thunderbolt driver isn't loaded.

It would therefore be beneficial if device_link_add() can be
called even *after* the consumer is bound.

Thanks,

Lukas

> 
> Link objects are deleted either explicitly, by calling
> device_link_del() on the link object in question, or automatically,
> when the consumer device is unbound from its driver or when one
> of the target devices is deleted, depending on the link type.
> 
> There are two types of link objects, persistent and non-persistent.
> The difference between them is that the persistent links stay around
> until one of the target devices is deleted, while the non-persistent
> ones are deleted when the consumer driver is unbound from its device
> (i.e. they are assumed to be valid only as long as the consumer device
> has a driver bound to it).  The DEVICE_LINK_PERSISTENT flag has to
> be passed to device_link_add() so as to create a persistent link.
> 
> One of the actions carried out by device_link_add() is to reorder
> the lists used for device shutdown and system suspend/resume to
> put the consumer device along with all of its children and all of
> its consumers (and so on, recursively) to the ends of those lists
> in order to ensure the right ordering between all of the supplier
> and consumer devices.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>  drivers/base/base.h    |  11 ++
>  drivers/base/core.c    | 386 +++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/base/dd.c      |  42 +++++-
>  include/linux/device.h |  36 +++++
>  4 files changed, 470 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index e05db388bd1c..cccb1d211541 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -107,6 +107,9 @@ extern void bus_remove_device(struct device *dev);
>  
>  extern int bus_add_driver(struct device_driver *drv);
>  extern void bus_remove_driver(struct device_driver *drv);
> +extern void device_release_driver_internal(struct device *dev,
> +					   struct device_driver *drv,
> +					   struct device *parent);
>  
>  extern void driver_detach(struct device_driver *drv);
>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
> @@ -152,3 +155,11 @@ extern int devtmpfs_init(void);
>  #else
>  static inline int devtmpfs_init(void) { return 0; }
>  #endif
> +
> +/* Device links */
> +extern int device_links_check_suppliers(struct device *dev);
> +extern void device_links_driver_bound(struct device *dev);
> +extern void device_links_driver_gone(struct device *dev);
> +extern void device_links_no_driver(struct device *dev);
> +extern bool device_links_busy(struct device *dev);
> +extern void device_links_unbind_consumers(struct device *dev);
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 0a8bdade53f2..416341df3268 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -44,6 +44,367 @@ static int __init sysfs_deprecated_setup(char *arg)
>  early_param("sysfs.deprecated", sysfs_deprecated_setup);
>  #endif
>  
> +/* Device links support. */
> +
> +DEFINE_STATIC_SRCU(device_links_srcu);
> +static DEFINE_MUTEX(device_links_lock);
> +
> +static int device_reorder_to_tail(struct device *dev, void *not_used)
> +{
> +	struct devlink *link;
> +
> +	devices_kset_move_last(dev);
> +	device_pm_move_last(dev);
> +	device_for_each_child(dev, NULL, device_reorder_to_tail);
> +	list_for_each_entry(link, &dev->supplier_links, s_node)
> +		device_reorder_to_tail(link->consumer, NULL);
> +
> +	return 0;
> +}
> +
> +/**
> + * device_link_add - Create a link between two devices.
> + * @consumer: Consumer end of the link.
> + * @supplier: Supplier end of the link.
> + * @flags: Link flags.
> + *
> + * At least one of the flags must be set.  If DEVICE_LINK_PROBE_TIME is set, the
> + * caller is expected to know that (a) the supplier device is present and active
> + * (ie. its driver is functional) and (b) the consumer device is probing at the
> + * moment and therefore the initial state of the link will be "consumer probe"
> + * in that case.  If DEVICE_LINK_PROBE_TIME is not set, DEVICE_LINK_PERSISTENT
> + * must be set (meaning that the link will not go away when the consumer driver
> + * goes away).
> + *
> + * A side effect of the link creation is re-ordering of dpm_list and the
> + * devices_kset list by moving the consumer device and all devices depending
> + * on it to the ends of those lists.
> + */
> +struct devlink *device_link_add(struct device *consumer,
> +				struct device *supplier, u32 flags)
> +{
> +	struct devlink *link;
> +
> +	if (!consumer || !supplier || !flags)
> +		return NULL;
> +
> +	mutex_lock(&device_links_lock);
> +
> +	list_for_each_entry(link, &supplier->supplier_links, s_node)
> +		if (link->consumer == consumer)
> +			goto out;
> +
> +	link = kmalloc(sizeof(*link), GFP_KERNEL);
> +	if (!link)
> +		goto out;
> +
> +	get_device(supplier);
> +	link->supplier = supplier;
> +	INIT_LIST_HEAD(&link->s_node);
> +	get_device(consumer);
> +	link->consumer = consumer;
> +	INIT_LIST_HEAD(&link->c_node);
> +	link->flags = flags;
> +	link->status = (flags & DEVICE_LINK_PROBE_TIME) ?
> +			DEVICE_LINK_CONSUMER_PROBE : DEVICE_LINK_DORMANT;
> +	spin_lock_init(&link->lock);
> +
> +	/*
> +	 * Move the consumer and all of the devices depending on it to the end
> +	 * of dpm_list and the devices_kset list.
> +	 *
> +	 * We have to hold dpm_list locked throughout all that or else we may
> +	 * end up suspending with a wrong ordering of it.
> +	 */
> +	device_pm_lock();
> +	device_reorder_to_tail(consumer, NULL);
> +	device_pm_unlock();
> +
> +	list_add_tail_rcu(&link->s_node, &supplier->supplier_links);
> +	list_add_tail_rcu(&link->c_node, &consumer->consumer_links);
> +
> +	dev_info(consumer, "Linked as a consumer to %s\n", dev_name(supplier));
> +
> + out:
> +	mutex_unlock(&device_links_lock);
> +	return link;
> +}
> +EXPORT_SYMBOL_GPL(device_link_add);
> +
> +static void __devlink_free_srcu(struct rcu_head *rhead)
> +{
> +	struct devlink *link;
> +
> +	link = container_of(rhead, struct devlink, rcu_head);
> +	put_device(link->consumer);
> +	put_device(link->supplier);
> +	kfree(link);
> +}
> +
> +static void devlink_del(struct devlink *link)
> +{
> +	dev_info(link->consumer, "Dropping the link to %s\n",
> +		 dev_name(link->supplier));
> +
> +	list_del_rcu(&link->s_node);
> +	list_del_rcu(&link->c_node);
> +	call_srcu(&device_links_srcu, &link->rcu_head, __devlink_free_srcu);
> +}
> +
> +/**
> + * device_link_del - Delete a link between two devices.
> + * @link: Device link to delete.
> + */
> +void device_link_del(struct devlink *link)
> +{
> +	mutex_lock(&device_links_lock);
> +	devlink_del(link);
> +	mutex_unlock(&device_links_lock);
> +}
> +EXPORT_SYMBOL_GPL(device_link_del);
> +
> +static int device_links_read_lock(void)
> +{
> +	return srcu_read_lock(&device_links_srcu);
> +}
> +
> +static void device_links_read_unlock(int idx)
> +{
> +	srcu_read_unlock(&device_links_srcu, idx);
> +}
> +
> +static void device_links_missing_supplier(struct device *dev)
> +{
> +	struct devlink *link;
> +
> +	list_for_each_entry_rcu(link, &dev->consumer_links, c_node) {
> +		spin_lock(&link->lock);
> +
> +		if (link->status == DEVICE_LINK_CONSUMER_PROBE)
> +			link->status = DEVICE_LINK_AVAILABLE;
> +
> +		spin_unlock(&link->lock);
> +	}
> +}
> +
> +/**
> + * device_links_check_suppliers - Check supplier devices for this one.
> + * @dev: Consumer device.
> + *
> + * Check links from this device to any suppliers.  Walk the list of the device's
> + * consumer links and see if all of the suppliers are available.  If not, simply
> + * return -EPROBE_DEFER.
> + *
> + * Walk the list under SRCU and check each link's status field under its lock.
> + *
> + * We need to guarantee that the supplier will not go away after the check has
> + * been positive here.  It only can go away in __device_release_driver() and
> + * that function checks the device's links to consumers.  This means we need to
> + * mark the link as "consumer probe in progress" to make the supplier removal
> + * wait for us to complete (or bad things may happen).
> + */
> +int device_links_check_suppliers(struct device *dev)
> +{
> +	struct devlink *link;
> +	int idx, ret = 0;
> +
> +	idx = device_links_read_lock();
> +
> +	list_for_each_entry_rcu(link, &dev->consumer_links, c_node) {
> +		spin_lock(&link->lock);
> +		if (link->status != DEVICE_LINK_AVAILABLE) {
> +			spin_unlock(&link->lock);
> +			device_links_missing_supplier(dev);
> +			ret = -EPROBE_DEFER;
> +			break;
> +		}
> +		link->status = DEVICE_LINK_CONSUMER_PROBE;
> +		spin_unlock(&link->lock);
> +	}
> +
> +	device_links_read_unlock(idx);
> +	return ret;
> +}
> +
> +/**
> + * device_links_driver_bound - Update device links after probing its driver.
> + * @dev: Device to update the links for.
> + *
> + * The probe has been successful, so update links from this device to any
> + * consumers by changing their status to "available".
> + *
> + * Also change the status of @dev's links to suppliers to "active".
> + */
> +void device_links_driver_bound(struct device *dev)
> +{
> +	struct devlink *link;
> +	int idx;
> +
> +	idx = device_links_read_lock();
> +
> +	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
> +		spin_lock(&link->lock);
> +		WARN_ON(link->status != DEVICE_LINK_DORMANT);
> +		link->status = DEVICE_LINK_AVAILABLE;
> +		spin_unlock(&link->lock);
> +	}
> +
> +	list_for_each_entry_rcu(link, &dev->consumer_links, c_node) {
> +		spin_lock(&link->lock);
> +		WARN_ON(link->status != DEVICE_LINK_CONSUMER_PROBE);
> +		link->status = DEVICE_LINK_ACTIVE;
> +		spin_unlock(&link->lock);
> +	}
> +
> +	device_links_read_unlock(idx);
> +}
> +
> +/**
> + * device_links_driver_gone - Update links after driver removal.
> + * @dev: Device whose driver has gone away.
> + *
> + * Update links to consumers for @dev by changing their status to "dormant".
> + */
> +void device_links_driver_gone(struct device *dev)
> +{
> +	struct devlink *link;
> +	int idx;
> +
> +	idx = device_links_read_lock();
> +
> +	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
> +		WARN_ON(!(link->flags & DEVICE_LINK_PERSISTENT));
> +		spin_lock(&link->lock);
> +		WARN_ON(link->status != DEVICE_LINK_SUPPLIER_UNBIND);
> +		link->status = DEVICE_LINK_DORMANT;
> +		spin_unlock(&link->lock);
> +	}
> +
> +	device_links_read_unlock(idx);
> +}
> +
> +/**
> + * device_links_no_driver - Update links of a device without a driver.
> + * @dev: Device without a driver.
> + *
> + * Delete all non-persistent links from this device to any suppliers.
> + * Persistent links stay around, but their status is changed to "available",
> + * unless they already are in the "supplier unbind in progress" state in which
> + * case they need not be updated.
> + */
> +void device_links_no_driver(struct device *dev)
> +{
> +	struct devlink *link, *ln;
> +
> +	mutex_lock(&device_links_lock);
> +
> +	list_for_each_entry_safe_reverse(link, ln, &dev->consumer_links, c_node)
> +		if (link->flags & DEVICE_LINK_PERSISTENT) {
> +			spin_lock(&link->lock);
> +
> +			if (link->status != DEVICE_LINK_SUPPLIER_UNBIND)
> +				link->status = DEVICE_LINK_AVAILABLE;
> +
> +			spin_unlock(&link->lock);
> +		} else {
> +			devlink_del(link);
> +		}
> +
> +	mutex_unlock(&device_links_lock);
> +}
> +
> +/**
> + * device_links_busy - Check if there are any busy links to consumers.
> + * @dev: Device to check.
> + *
> + * Check each consumer of the device and return 'true' if its link's status
> + * is one of "consumer probe" or "active" (meaning that the given consumer is
> + * probing right now or its driver is present).  Otherwise, change the link
> + * state to "supplier unbind" to prevent the consumer from being probed
> + * successfully going forward.
> + *
> + * Return 'false' if there are no probing or active consumers.
> + */
> +bool device_links_busy(struct device *dev)
> +{
> +	struct devlink *link;
> +	int idx;
> +	bool ret = false;
> +
> +	idx = device_links_read_lock();
> +
> +	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
> +		spin_lock(&link->lock);
> +		if (link->status == DEVICE_LINK_CONSUMER_PROBE
> +		    || link->status == DEVICE_LINK_ACTIVE) {
> +			spin_unlock(&link->lock);
> +			ret = true;
> +			break;
> +		}
> +		link->status = DEVICE_LINK_SUPPLIER_UNBIND;
> +		spin_unlock(&link->lock);
> +	}
> +
> +	device_links_read_unlock(idx);
> +	return ret;
> +}
> +
> +/**
> + * device_links_unbind_consumers - Force unbind consumers of the given device.
> + * @dev: Device to unbind the consumers of.
> + *
> + * Walk the list of links to consumers for @dev and if any of them is in the
> + * "consumer probe" state, wait for all device probes in progress to complete
> + * and start over.
> + *
> + * If that's not the case, change the status of the link to "supplier unbind"
> + * and check if the link was in the "active" state.  If so, force the consumer
> + * driver to unbind and start over (the consumer will not re-probe as we have
> + * changed the state of the link already).
> + */
> +void device_links_unbind_consumers(struct device *dev)
> +{
> +	struct devlink *link;
> +	int idx;
> +
> + start:
> +	idx = device_links_read_lock();
> +
> +	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
> +		enum devlink_status status;
> +
> +		spin_lock(&link->lock);
> +		status = link->status;
> +		if (status == DEVICE_LINK_CONSUMER_PROBE) {
> +			spin_unlock(&link->lock);
> +
> +			device_links_read_unlock(idx);
> +
> +			wait_for_device_probe();
> +			goto start;
> +		}
> +		link->status = DEVICE_LINK_SUPPLIER_UNBIND;
> +		if (status == DEVICE_LINK_ACTIVE) {
> +			struct device *consumer = link->consumer;
> +
> +			get_device(consumer);
> +			spin_unlock(&link->lock);
> +
> +			device_links_read_unlock(idx);
> +
> +			device_release_driver_internal(consumer, NULL,
> +						       consumer->parent);
> +			put_device(consumer);
> +			goto start;
> +		}
> +		spin_unlock(&link->lock);
> +	}
> +
> +	device_links_read_unlock(idx);
> +}
> +
> +/* Device links support end. */
> +
>  int (*platform_notify)(struct device *dev) = NULL;
>  int (*platform_notify_remove)(struct device *dev) = NULL;
>  static struct kobject *dev_kobj;
> @@ -711,6 +1072,8 @@ void device_initialize(struct device *dev)
>  #ifdef CONFIG_GENERIC_MSI_IRQ
>  	INIT_LIST_HEAD(&dev->msi_list);
>  #endif
> +	INIT_LIST_HEAD(&dev->supplier_links);
> +	INIT_LIST_HEAD(&dev->consumer_links);
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> @@ -1233,6 +1596,7 @@ void device_del(struct device *dev)
>  {
>  	struct device *parent = dev->parent;
>  	struct class_interface *class_intf;
> +	struct devlink *link, *ln;
>  
>  	/* Notify clients of device removal.  This call must come
>  	 * before dpm_sysfs_remove().
> @@ -1240,6 +1604,28 @@ void device_del(struct device *dev)
>  	if (dev->bus)
>  		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>  					     BUS_NOTIFY_DEL_DEVICE, dev);
> +
> +	/*
> +	 * Delete all of the remaining links from this device to any other
> +	 * devices (either consumers or suppliers).
> +	 *
> +	 * This requires that all links be dormant, so warn if that's not the
> +	 * case.
> +	 */
> +	mutex_lock(&device_links_lock);
> +
> +	list_for_each_entry_safe_reverse(link, ln, &dev->consumer_links, c_node) {
> +		WARN_ON(link->status != DEVICE_LINK_DORMANT);
> +		devlink_del(link);
> +	}
> +
> +	list_for_each_entry_safe_reverse(link, ln, &dev->supplier_links, s_node) {
> +		WARN_ON(link->status != DEVICE_LINK_DORMANT);
> +		devlink_del(link);
> +	}
> +
> +	mutex_unlock(&device_links_lock);
> +
>  	dpm_sysfs_remove(dev);
>  	if (parent)
>  		klist_del(&dev->p->knode_parent);
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index d9e76e9205c7..7c0abeba89e9 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -249,6 +249,7 @@ static void driver_bound(struct device *dev)
>  		 __func__, dev_name(dev));
>  
>  	klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> +	device_links_driver_bound(dev);
>  
>  	device_pm_check_callbacks(dev);
>  
> @@ -399,6 +400,7 @@ probe_failed:
>  		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>  					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
>  pinctrl_bind_failed:
> +	device_links_no_driver(dev);
>  	devres_release_all(dev);
>  	driver_sysfs_remove(dev);
>  	dev->driver = NULL;
> @@ -489,6 +491,10 @@ int driver_probe_device(struct device_driver *drv, struct device *dev)
>  	if (!device_is_registered(dev))
>  		return -ENODEV;
>  
> +	ret = device_links_check_suppliers(dev);
> +	if (ret)
> +		return ret;
> +
>  	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
>  		 drv->bus->name, __func__, dev_name(dev), drv->name);
>  
> @@ -756,7 +762,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
>   * __device_release_driver() must be called with @dev lock held.
>   * When called for a USB interface, @dev->parent lock must be held as well.
>   */
> -static void __device_release_driver(struct device *dev)
> +static void __device_release_driver(struct device *dev, struct device *parent)
>  {
>  	struct device_driver *drv;
>  
> @@ -765,6 +771,25 @@ static void __device_release_driver(struct device *dev)
>  		if (driver_allows_async_probing(drv))
>  			async_synchronize_full();
>  
> +		while (device_links_busy(dev)) {
> +			device_unlock(dev);
> +			if (parent)
> +				device_unlock(parent);
> +
> +			device_links_unbind_consumers(dev);
> +			if (parent)
> +				device_lock(parent);
> +
> +			device_lock(dev);
> +			/*
> +			 * A concurrent invocation of the same function might
> +			 * have released the driver successfully while this one
> +			 * was waiting, so check for that.
> +			 */
> +			if (dev->driver != drv)
> +				return;
> +		}
> +
>  		pm_runtime_get_sync(dev);
>  
>  		driver_sysfs_remove(dev);
> @@ -780,6 +805,9 @@ static void __device_release_driver(struct device *dev)
>  			dev->bus->remove(dev);
>  		else if (drv->remove)
>  			drv->remove(dev);
> +
> +		device_links_driver_gone(dev);
> +		device_links_no_driver(dev);
>  		devres_release_all(dev);
>  		dev->driver = NULL;
>  		dev_set_drvdata(dev, NULL);
> @@ -796,16 +824,16 @@ static void __device_release_driver(struct device *dev)
>  	}
>  }
>  
> -static void device_release_driver_internal(struct device *dev,
> -					   struct device_driver *drv,
> -					   struct device *parent)
> +void device_release_driver_internal(struct device *dev,
> +				    struct device_driver *drv,
> +				    struct device *parent)
>  {
>  	if (parent)
>  		device_lock(parent);
>  
>  	device_lock(dev);
>  	if (!drv || drv == dev->driver)
> -		__device_release_driver(dev);
> +		__device_release_driver(dev, parent);
>  
>  	device_unlock(dev);
>  	if (parent)
> @@ -818,6 +846,10 @@ static void device_release_driver_internal(struct device *dev,
>   *
>   * Manually detach device from driver.
>   * When called for a USB interface, @dev->parent lock must be held.
> + *
> + * If this function is to be called with @dev->parent lock held, ensure that
> + * the device's consumers are unbound in advance or that their locks can be
> + * acquired under the @dev->parent lock.
>   */
>  void device_release_driver(struct device *dev)
>  {
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 38f02814d53a..647204bd74a0 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -706,6 +706,34 @@ struct device_dma_parameters {
>  	unsigned long segment_boundary_mask;
>  };
>  
> +enum devlink_status {
> +	DEVICE_LINK_DORMANT = 0,	/* Link not in use. */
> +	DEVICE_LINK_AVAILABLE,		/* Supplier driver is present. */
> +	DEVICE_LINK_ACTIVE,		/* Consumer driver is present too. */
> +	DEVICE_LINK_CONSUMER_PROBE,	/* Consumer is probing. */
> +	DEVICE_LINK_SUPPLIER_UNBIND,	/* Supplier is unbinding. */
> +};
> +
> +/*
> + * Device link flags.
> + *
> + * PERSISTENT: Do not delete the link on consumer device driver unbind.
> + * PROBE_TIME: Assume supplier device functional when creating the link.
> + */
> +#define DEVICE_LINK_PERSISTENT	(1 << 0)
> +#define DEVICE_LINK_PROBE_TIME	(1 << 1)
> +
> +struct devlink {
> +	struct device *supplier;
> +	struct list_head s_node;
> +	struct device *consumer;
> +	struct list_head c_node;
> +	enum devlink_status status;
> +	u32 flags;
> +	spinlock_t lock;
> +	struct rcu_head rcu_head;
> +};
> +
>  /**
>   * struct device - The basic device structure
>   * @parent:	The device's "parent" device, the device to which it is attached.
> @@ -731,6 +759,8 @@ struct device_dma_parameters {
>   * 		on.  This shrinks the "Board Support Packages" (BSPs) and
>   * 		minimizes board-specific #ifdefs in drivers.
>   * @driver_data: Private pointer for driver specific info.
> + * @supplier_links: Links to consumer devices.
> + * @consumer_links: Links to supplier devices.
>   * @power:	For device power management.
>   * 		See Documentation/power/devices.txt for details.
>   * @pm_domain:	Provide callbacks that are executed during system suspend,
> @@ -797,6 +827,8 @@ struct device {
>  					   core doesn't touch it */
>  	void		*driver_data;	/* Driver data, set and get with
>  					   dev_set/get_drvdata */
> +	struct list_head	supplier_links;
> +	struct list_head	consumer_links;
>  	struct dev_pm_info	power;
>  	struct dev_pm_domain	*pm_domain;
>  
> @@ -1113,6 +1145,10 @@ extern void device_shutdown(void);
>  /* debugging and troubleshooting/diagnostic helpers. */
>  extern const char *dev_driver_string(const struct device *dev);
>  
> +/* Device links interface. */
> +struct devlink *device_link_add(struct device *consumer,
> +				struct device *supplier, u32 flags);
> +void device_link_del(struct devlink *link);
>  
>  #ifdef CONFIG_PRINTK
>  
> -- 
> 1.9.1
> 
Rafael J. Wysocki June 17, 2016, 12:54 p.m. UTC | #2
On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Hi Marek,
>
> On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
>> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
>>
>> Currently, there is a problem with handling cases where functional
>> dependencies between devices are involved.
>>
>> What I mean by a "functional dependency" is when the driver of device
>> B needs both device A and its driver to be present and functional to
>> be able to work.  This implies that the driver of A needs to be
>> working for B to be probed successfully and it cannot be unbound from
>> the device before B's driver.  This also has certain consequences
>> for power management of these devices (suspend/resume and runtime PM
>> ordering).
>>
>> Add support for representing those functional dependencies between
>> devices to allow the driver core to track them and act on them in
>> certain cases where they matter.
>
> Rafael has indicated that he intends to respin this series:
> https://lkml.org/lkml/2016/6/8/1061

That's OK.

> We also have such a functional dependency for Thunderbolt on Macs:
> On resume from system sleep, the PCIe hotplug ports may not resume
> before the thunderbolt driver has reestablished the PCI tunnels.
> Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> in drivers/pci/quirks.c. It would be good if we could represent
> this dependency using something like Rafael's approach instead of
> open coding it, however one detail in Rafael's patches is problematic:
>
>> New links are added by calling device_link_add() which may happen
>> either before the consumer device is probed or when probing it, in
>> which case the caller needs to ensure that the driver of the
>> supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
>> flag should be passed to device_link_add() to reflect that.
>
> The thunderbolt driver cannot call device_link_add() before the
> PCIe hotplug ports are bound to a driver unless we amend portdrv
> to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> if the thunderbolt driver isn't loaded.
>
> It would therefore be beneficial if device_link_add() can be
> called even *after* the consumer is bound.

I don't quite follow.

Who's the provider and who's the consumer here?

Thanks,
Rafael
Lukas Wunner June 17, 2016, 2:07 p.m. UTC | #3
On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > We also have such a functional dependency for Thunderbolt on Macs:
> > On resume from system sleep, the PCIe hotplug ports may not resume
> > before the thunderbolt driver has reestablished the PCI tunnels.
> > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > in drivers/pci/quirks.c. It would be good if we could represent
> > this dependency using something like Rafael's approach instead of
> > open coding it, however one detail in Rafael's patches is problematic:
> >
> > > New links are added by calling device_link_add() which may happen
> > > either before the consumer device is probed or when probing it, in
> > > which case the caller needs to ensure that the driver of the
> > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > flag should be passed to device_link_add() to reflect that.
> >
> > The thunderbolt driver cannot call device_link_add() before the
> > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > if the thunderbolt driver isn't loaded.
> >
> > It would therefore be beneficial if device_link_add() can be
> > called even *after* the consumer is bound.
> 
> I don't quite follow.
> 
> Who's the provider and who's the consumer here?

thunderbolt.ko is the supplier. The PCIe hotplug ports of Thunderbolt
host controllers are the consumers.

If thunderbolt.ko is not loaded, no special precautions are needed for
the hotplug ports.

However *if* it is loaded and the system is suspended and there are
Thunderbolt devices plugged in, then on resume the hotplug ports must
wait for thunderbolt.ko to reestablish the PCI tunnels. If they do not
wait, things will go south because when pci_pm_resume_noirq() is called
for the devices *below* the hotplug port, we will try to put them into D0
(via pci_pm_default_resume_early()) and call pci_restore_state() even
though those hotplugged devices are not yet reachable.
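
To make the ordering requirement concrete, here is a minimal user-space
sketch (not kernel code). It assumes a toy device table in which each
entry names at most one supplier and there are no dependency cycles; the
names toy_dev, toy_resume() and toy_resume_all() are hypothetical and
exist only for illustration. A device is resumed only after its supplier
has been resumed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical toy device; this is not the kernel's struct device. */
struct toy_dev {
	const char *name;
	int supplier;	/* index of the device this one waits for, or -1 */
	int resumed;	/* set once the device has been resumed */
};

/*
 * Resume devs[i], resuming its supplier first if it has one.
 * Appends the index to order[] and returns the new entry count.
 * Assumes the supplier relation has no cycles.
 */
static size_t toy_resume(struct toy_dev *devs, int i, int *order, size_t n)
{
	if (devs[i].resumed)
		return n;
	if (devs[i].supplier >= 0)
		n = toy_resume(devs, devs[i].supplier, order, n);
	devs[i].resumed = 1;
	order[n++] = i;
	return n;
}

/* Resume every device, recording the resulting order in order[]. */
static size_t toy_resume_all(struct toy_dev *devs, size_t count, int *order)
{
	size_t n = 0;

	assert(count <= MAX_DEVS);
	for (size_t i = 0; i < count; i++)
		n = toy_resume(devs, (int)i, order, n);
	return n;
}
```

With two hotplug ports listed before the NHI and both naming it as their
supplier, the NHI ends up first in order[] even though it comes last in
the table.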

So what I'd like to do is call device_link_add() when thunderbolt.ko
is loaded to shove a dependency onto the hotplug ports. But the hotplug
ports will long since have been bound at that point (to portdrv).
So it should be allowed to call device_link_add() ex post facto, after
the consumer has been bound.

Let me know if I still failed to convey the problem in an intelligible
way.

I had a similar problem with the Runtime PM for Thunderbolt series
(which BTW is still awaiting reviews):
http://www.spinics.net/lists/linux-pci/msg51158.html

The PM core allows calls to dev_pm_domain_set() only during ->probe or
if the device is unbound. Same constraint as with device_link_add().
This constraint was a major obstacle for me and I had to jump through
numerous additional hoops to work around it:

Thunderbolt controllers on Macs can be put into D3cold, but not with
standard ACPI methods, so platform_pci_power_manageable() is false for
these devices. Now this is nothing unusual, the same applies to dual GPU
laptops (custom ACPI DSMs for Nvidia Optimus and AMD PowerXpress).

For such discrete GPUs, a struct dev_pm_domain is set during ->probe
(drivers/gpu/vga/vga_switcheroo.c: vga_switcheroo_init_domain_pm_ops()).
But I can't do that for Thunderbolt because the device in question is
a PCIe upstream port. Ideally I would shove a dev_pm_domain on the
upstream port when thunderbolt.ko loads, but again at that point
the port has long since been bound (to portdrv) so I can't call
dev_pm_domain_set().

The workaround I settled on is to attach to the upstream port as a
service driver, call down to the service driver's ->runtime_suspend
hooks when the port runtime suspends and power the controller down.
In addition, I had to make sure that saved_state isn't clobbered on
resume (patch [09/13]).

It would be good if someone who has a bird's eye view of the PM core
(i.e., you) could make a decision
(1) if it's possible and worthwhile to allow dev_pm_domain_set() for
    already bound devices, and
(2) if the PCI core should be able to deal with devices which are
    runtime suspended to D3cold but not by the platform (because that's
    what patch [09/13] does).

If the answer to (2) is yes then there's some more stuff to fix:
Discrete GPUs on dual GPU laptops can be manually suspended to D3cold
behind the PM core's back by loading nouveau / radeon / amdgpu with
runpm=0 and writing OFF to the vga_switcheroo debugfs interface.
This is currently broken in conjunction with system sleep (the GPU
is no longer accessible after resume) because pci_pm_resume_noirq()
calls pci_restore_state() even though the device is in D3cold and not
power-manageable by the platform.

Thanks,

Lukas
Rafael J. Wysocki July 20, 2016, 12:33 a.m. UTC | #4
On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > We also have such a functional dependency for Thunderbolt on Macs:
> > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > in drivers/pci/quirks.c. It would be good if we could represent
> > > this dependency using something like Rafael's approach instead of
> > > open coding it, however one detail in Rafael's patches is problematic:
> > >
> > > > New links are added by calling device_link_add() which may happen
> > > > either before the consumer device is probed or when probing it, in
> > > > which case the caller needs to ensure that the driver of the
> > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > flag should be passed to device_link_add() to reflect that.
> > >
> > > The thunderbolt driver cannot call device_link_add() before the
> > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > if the thunderbolt driver isn't loaded.
> > >
> > > It would therefore be beneficial if device_link_add() can be
> > > called even *after* the consumer is bound.
> > 
> > I don't quite follow.
> > 
> > Who's the provider and who's the consumer here?
> 
> thunderbolt.ko is the supplier.

But it binds to the children of the ports that are supposed to be its
consumers?

Why is that even expected to work?

Thanks,
Rafael
Lukas Wunner July 20, 2016, 6:24 a.m. UTC | #5
On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > this dependency using something like Rafael's approach instead of
> > > > open coding it, however one detail in Rafael's patches is problematic:
> > > >
> > > > > New links are added by calling device_link_add() which may happen
> > > > > either before the consumer device is probed or when probing it, in
> > > > > which case the caller needs to ensure that the driver of the
> > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > flag should be passed to device_link_add() to reflect that.
> > > >
> > > > The thunderbolt driver cannot call device_link_add() before the
> > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > if the thunderbolt driver isn't loaded.
> > > >
> > > > It would therefore be beneficial if device_link_add() can be
> > > > called even *after* the consumer is bound.
> > > 
> > > I don't quite follow.
> > > 
> > > Who's the provider and who's the consumer here?
> > 
> > thunderbolt.ko is the supplier.
> 
> But it binds to the children of the ports that are supposed to be its
> consumers?
> 
> Why is that even expected to work?

No, the consumers are aunts (or uncles) of the supplier, if you will. :-)

The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
the drawing below). The supplier is the NHI:

      (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
                                         +-- Downstream Bridge 1 --
                                         +-- Downstream Bridge 2 --
                                         ...

We're calling pci_power_up() and pci_restore_state() from
pci_pm_resume_noirq(). And that will fail for devices below
the hotplug ports if the PCI tunnels haven't been re-established
yet by the NHI.
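
The "aunt or uncle" relationship in the drawing above can be checked
with a small user-space sketch (not kernel code). The toy_node type and
both helpers are hypothetical names used only for illustration. The
point is that a hotplug port is neither an ancestor nor a descendant of
the NHI, so parent/child ordering alone does not cover this dependency.

```c
#include <stddef.h>

/* Hypothetical tree node with a parent pointer; not a kernel type. */
struct toy_node {
	const char *name;
	const struct toy_node *parent;
};

/* Return 1 if a is a strict ancestor of d, 0 otherwise. */
static int toy_is_ancestor(const struct toy_node *a, const struct toy_node *d)
{
	for (d = d->parent; d; d = d->parent)
		if (d == a)
			return 1;
	return 0;
}

/* Return 1 if a is a sibling of d's parent, 0 otherwise. */
static int toy_is_aunt_or_uncle(const struct toy_node *a,
				const struct toy_node *d)
{
	const struct toy_node *p = d->parent;

	return p && p->parent && a != p && a->parent == p->parent;
}
```

For the topology drawn above, Downstream Bridge 1 is an aunt/uncle of
the NHI, while Downstream Bridge 0 is its parent.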

Currently we achieve that via quirk_apple_wait_for_thunderbolt() in
drivers/pci/quirks.c. It would be more elegant if we could make this
relationship explicit with "device links" and let the core handle it.

Or am I mistaken and this particular use case is not what "device links"
are intended for?

Thanks,

Lukas
Rafael J. Wysocki July 20, 2016, 12:52 p.m. UTC | #6
On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > this dependency using something like Rafael's approach instead of
> > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > >
> > > > > > New links are added by calling device_link_add() which may happen
> > > > > > either before the consumer device is probed or when probing it, in
> > > > > > which case the caller needs to ensure that the driver of the
> > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > flag should be passed to device_link_add() to reflect that.
> > > > >
> > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > if the thunderbolt driver isn't loaded.
> > > > >
> > > > > It would therefore be beneficial if device_link_add() can be
> > > > > called even *after* the consumer is bound.
> > > > 
> > > > I don't quite follow.
> > > > 
> > > > Who's the provider and who's the consumer here?
> > > 
> > > thunderbolt.ko is the supplier.
> > 
> > But it binds to the children of the ports that are supposed to be its
> > consumers?
> > 
> > Why is that even expected to work?
> 
> No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> 
> The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> the drawing below). The supplier is the NHI:
> 
>       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
>                                          +-- Downstream Bridge 1 --
>                                          +-- Downstream Bridge 2 --
>                                          ...
> 
> We're calling pci_power_up() and pci_restore_state() from
> pci_pm_resume_noirq(). And that will fail for devices below
> the hotplug ports if the PCI tunnels haven't been re-established
> yet by the NHI.

So the NHI is a PCIe device, right?

Does the Thunderbolt driver bind to that device?

> Currently we achieve that via quirk_apple_wait_for_thunderbolt() in
> drivers/pci/quirks.c. It would be more elegant if we could make this
> relationship explicit with "device links" and let the core handle it.
> 
> Or am I mistaken and this particular use case is not what "device links"
> are intended for?

I'm not sure yet.

Thanks,
Rafael
Lukas Wunner July 20, 2016, 3:23 p.m. UTC | #7
On Wed, Jul 20, 2016 at 02:52:42PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> > On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > > this dependency using something like Rafael's approach instead of
> > > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > > >
> > > > > > > New links are added by calling device_link_add() which may happen
> > > > > > > either before the consumer device is probed or when probing it, in
> > > > > > > which case the caller needs to ensure that the driver of the
> > > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > > flag should be passed to device_link_add() to reflect that.
> > > > > >
> > > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > > if the thunderbolt driver isn't loaded.
> > > > > >
> > > > > > It would therefore be beneficial if device_link_add() can be
> > > > > > called even *after* the consumer is bound.
> > > > > 
> > > > > I don't quite follow.
> > > > > 
> > > > > Who's the provider and who's the consumer here?
> > > > 
> > > > thunderbolt.ko is the supplier.
> > > 
> > > But it binds to the children of the ports that are supposed to be its
> > > consumers?
> > > 
> > > Why is that even expected to work?
> > 
> > No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> > 
> > The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> > the drawing below). The supplier is the NHI:
> > 
> >       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> >                                          +-- Downstream Bridge 1 --
> >                                          +-- Downstream Bridge 2 --
> >                                          ...
> > 
> > We're calling pci_power_up() and pci_restore_state() from
> > pci_pm_resume_noirq(). And that will fail for devices below
> > the hotplug ports if the PCI tunnels haven't been re-established
> > yet by the NHI.
> 
> So the NHI is a PCIe device, right?
> 
> Does the Thunderbolt driver bind to that device?

The NHI is a PCI device but not a bridge. It has class 0x88000.
Yes, thunderbolt.ko binds to the NHI.

And portdrv binds to the upstream bridge and downstream bridges.
Those have class 0x60400.
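
For reference, the two class values above split into base class,
subclass and programming interface as in the following user-space
sketch. The helper names are hypothetical; the field layout is the
standard 24-bit PCI class code. 0x60400 decodes to base class 0x06
(bridge), subclass 0x04 (PCI-to-PCI bridge), and 0x88000 decodes to
base class 0x08 (system peripheral), subclass 0x80 (other).

```c
#include <stdint.h>

/* Bits 23:16 of the 24-bit class code. */
static uint8_t toy_pci_base_class(uint32_t class_code)
{
	return (class_code >> 16) & 0xff;
}

/* Bits 15:8 of the 24-bit class code. */
static uint8_t toy_pci_sub_class(uint32_t class_code)
{
	return (class_code >> 8) & 0xff;
}

/* Bits 7:0 of the 24-bit class code. */
static uint8_t toy_pci_prog_if(uint32_t class_code)
{
	return class_code & 0xff;
}

/* Return 1 for a PCI-to-PCI bridge (base class 0x06, subclass 0x04). */
static int toy_is_pci_to_pci_bridge(uint32_t class_code)
{
	return toy_pci_base_class(class_code) == 0x06 &&
	       toy_pci_sub_class(class_code) == 0x04;
}
```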

Best regards,

Lukas
Rafael J. Wysocki July 20, 2016, 10:51 p.m. UTC | #8
On Wednesday, July 20, 2016 05:23:40 PM Lukas Wunner wrote:
> On Wed, Jul 20, 2016 at 02:52:42PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> > > On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > > > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > > > this dependency using something like Rafael's approach instead of
> > > > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > > > >
> > > > > > > > New links are added by calling device_link_add() which may happen
> > > > > > > > either before the consumer device is probed or when probing it, in
> > > > > > > > which case the caller needs to ensure that the driver of the
> > > > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > > > flag should be passed to device_link_add() to reflect that.
> > > > > > >
> > > > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > > > if the thunderbolt driver isn't loaded.
> > > > > > >
> > > > > > > It would therefore be beneficial if device_link_add() can be
> > > > > > > called even *after* the consumer is bound.
> > > > > > 
> > > > > > I don't quite follow.
> > > > > > 
> > > > > > Who's the provider and who's the consumer here?
> > > > > 
> > > > > thunderbolt.ko is the supplier.
> > > > 
> > > > But it binds to the children of the ports that are supposed to be its
> > > > consumers?
> > > > 
> > > > Why is that even expected to work?
> > > 
> > > No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> > > 
> > > The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> > > the drawing below). The supplier is the NHI:
> > > 
> > >       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> > >                                          +-- Downstream Bridge 1 --
> > >                                          +-- Downstream Bridge 2 --
> > >                                          ...
> > > 
> > > We're calling pci_power_up() and pci_restore_state() from
> > > pci_pm_resume_noirq(). And that will fail for devices below
> > > the hotplug ports if the PCI tunnels haven't been re-established
> > > yet by the NHI.
> > 
> > So the NHI is a PCIe device, right?
> > 
> > Does the Thunderbolt driver bind to that device?
> 
> The NHI is a PCI device but not a bridge. It has class 0x88000.
> Yes, thunderbolt.ko binds to the NHI.
> 
> And portdrv binds to the upstream bridge and downstream bridges.
> Those have class 0x60400.

OK, so why would there be a problem with creating links from the NHI (producer)
to the ports (consumers) before binding portdrv to them?

Thanks,
Rafael
Lukas Wunner July 20, 2016, 11:25 p.m. UTC | #9
On Thu, Jul 21, 2016 at 12:51:31AM +0200, Rafael J. Wysocki wrote:
> On Wednesday, July 20, 2016 05:23:40 PM Lukas Wunner wrote:
> > On Wed, Jul 20, 2016 at 02:52:42PM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> > > > On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > > > > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > > > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > > > > this dependency using something like Rafael's approach instead of
> > > > > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > > > > >
> > > > > > > > > New links are added by calling device_link_add() which may happen
> > > > > > > > > either before the consumer device is probed or when probing it, in
> > > > > > > > > which case the caller needs to ensure that the driver of the
> > > > > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > > > > flag should be passed to device_link_add() to reflect that.
> > > > > > > >
> > > > > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > > > > if the thunderbolt driver isn't loaded.
> > > > > > > >
> > > > > > > > It would therefore be beneficial if device_link_add() can be
> > > > > > > > called even *after* the consumer is bound.
> > > > > > > 
> > > > > > > I don't quite follow.
> > > > > > > 
> > > > > > > Who's the provider and who's the consumer here?
> > > > > > 
> > > > > > thunderbolt.ko is the supplier.
> > > > > 
> > > > > But it binds to the children of the ports that are supposed to be its
> > > > > consumers?
> > > > > 
> > > > > Why is that even expected to work?
> > > > 
> > > > No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> > > > 
> > > > The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> > > > the drawing below). The supplier is the NHI:
> > > > 
> > > >       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> > > >                                          +-- Downstream Bridge 1 --
> > > >                                          +-- Downstream Bridge 2 --
> > > >                                          ...
> > > > 
> > > > We're calling pci_power_up() and pci_restore_state() from
> > > > pci_pm_resume_noirq(). And that will fail for devices below
> > > > the hotplug ports if the PCI tunnels haven't been re-established
> > > > yet by the NHI.
> > > 
> > > So the NHI is a PCIe device, right?
> > > 
> > > Does the Thunderbolt driver bind to that device?
> > 
> > The NHI is a PCI device but not a bridge. It has class 0x88000.
> > Yes, thunderbolt.ko binds to the NHI.
> > 
> > And portdrv binds to the upstream bridge and downstream bridges.
> > Those have class 0x60400.
> 
> OK, so why would there be a problem with creating links from the NHI
> (producer) to the ports (consumers) before binding portdrv to them?

Because the ordering in which drivers bind isn't guaranteed. At least
on my machine (Debian), portdrv always binds before thunderbolt.

I guess I could amend portdrv to return -EPROBE_DEFER on Macs if
no driver is bound to the NHI. Doesn't feel pretty to me though.
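
A rough user-space model of that -EPROBE_DEFER idea, purely as a
sketch: toy_pdev, toy_probe() and toy_probe_all() are hypothetical
names, and the retry loop only imitates deferred probing in spirit.
TOY_EPROBE_DEFER is defined locally with the same numeric value as the
kernel's EPROBE_DEFER.

```c
#include <stddef.h>

#define TOY_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

/* Hypothetical toy device; this is not the kernel's struct device. */
struct toy_pdev {
	const char *name;
	int bound;				/* 1 once a driver is bound */
	const struct toy_pdev *supplier;	/* must be bound first, or NULL */
};

/* Bind dev unless its supplier has no driver yet. */
static int toy_probe(struct toy_pdev *dev)
{
	if (dev->supplier && !dev->supplier->bound)
		return -TOY_EPROBE_DEFER;
	dev->bound = 1;
	return 0;
}

/*
 * Probe all unbound devices repeatedly until a full pass makes no
 * progress. Returns the number of passes that were run.
 */
static int toy_probe_all(struct toy_pdev **devs, int count)
{
	int passes = 0;
	int progress;

	do {
		progress = 0;
		for (int i = 0; i < count; i++)
			if (!devs[i]->bound && toy_probe(devs[i]) == 0)
				progress = 1;
		passes++;
	} while (progress);
	return passes;
}
```

If the port is listed before the NHI, the first pass defers the port and
binds the NHI, and the second pass binds the port.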

Ultimately this seems to be the same issue as with calling
dev_pm_domain_set() for a bound device. Perhaps device_link_add()
can likewise be allowed if a runtime PM ref is held for the devices
and the call happens under lock_system_sleep()?

Thanks,

Lukas
Rafael J. Wysocki July 21, 2016, 12:25 a.m. UTC | #10
On Thursday, July 21, 2016 01:25:53 AM Lukas Wunner wrote:
> On Thu, Jul 21, 2016 at 12:51:31AM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, July 20, 2016 05:23:40 PM Lukas Wunner wrote:
> > > On Wed, Jul 20, 2016 at 02:52:42PM +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> > > > > On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > > > > > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > > > > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > > > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > > > > > this dependency using something like Rafael's approach instead of
> > > > > > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > > > > > >
> > > > > > > > > > New links are added by calling device_link_add() which may happen
> > > > > > > > > > either before the consumer device is probed or when probing it, in
> > > > > > > > > > which case the caller needs to ensure that the driver of the
> > > > > > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > > > > > flag should be passed to device_link_add() to reflect that.
> > > > > > > > >
> > > > > > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > > > > > if the thunderbolt driver isn't loaded.
> > > > > > > > >
> > > > > > > > > It would therefore be beneficial if device_link_add() can be
> > > > > > > > > called even *after* the consumer is bound.
> > > > > > > > 
> > > > > > > > I don't quite follow.
> > > > > > > > 
> > > > > > > > Who's the provider and who's the consumer here?
> > > > > > > 
> > > > > > > thunderbolt.ko is the supplier.
> > > > > > 
> > > > > > But it binds to the children of the ports that are supposed to be its
> > > > > > consumers?
> > > > > > 
> > > > > > Why is that even expected to work?
> > > > > 
> > > > > No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> > > > > 
> > > > > The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> > > > > the drawing below). The supplier is the NHI:
> > > > > 
> > > > >       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> > > > >                                          +-- Downstream Bridge 1 --
> > > > >                                          +-- Downstream Bridge 2 --
> > > > >                                          ...
> > > > > 
> > > > > We're calling pci_power_up() and pci_restore_state() from
> > > > > pci_pm_resume_noirq(). And that will fail for devices below
> > > > > the hotplug ports if the PCI tunnels haven't been re-established
> > > > > yet by the NHI.
> > > > 
> > > > So the NHI is a PCIe device, right?
> > > > 
> > > > Does the Thunderbolt driver bind to that device?
> > > 
> > > The NHI is a PCI device but not a bridge. It has class 0x88000.
> > > Yes, thunderbolt.ko binds to the NHI.
> > > 
> > > And portdrv binds to the upstream bridge and downstream bridges.
> > > Those have class 0x60400.
> > 
> > OK, so why would there be a problem with creating links from the NHI
> > (producer) to the ports (consumers) before binding portdrv to them?
> 
> Because the ordering in which drivers bind isn't guaranteed. At least
> on my machine (Debian), portdrv always binds before thunderbolt.

But what do drivers have to do with that, really?  Do you need drivers to
know that the dependency is there?

Just add links *before* even probing for drivers (yes, you can do that)
and the core will handle that for you.

> I guess I could amend portdrv to return -EPROBE_DEFER on Macs if
> no driver is bound to the NHI. Doesn't feel pretty to me though.
> 
> Ultimately this seems to be the same issue as with calling
> dev_pm_domain_set() for a bound device. Perhaps device_link_add()
> can likewise be allowed if a runtime PM ref is held for the devices
> and the call happens under lock_system_sleep()?

No, the whole synchronization scheme in the links code would have to be
changed for that to really work.

And it really is about what is needed (at least in principle) to run your
device.  If you think you need device X with a driver to handle device Y
correctly, then either you need it all the time, from probe to remove, or
you just don't really need it at all.

Thanks,
Rafael
Lukas Wunner July 24, 2016, 10:48 p.m. UTC | #11
On Thu, Jul 21, 2016 at 02:25:15AM +0200, Rafael J. Wysocki wrote:
> On Thursday, July 21, 2016 01:25:53 AM Lukas Wunner wrote:
> > On Thu, Jul 21, 2016 at 12:51:31AM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, July 20, 2016 05:23:40 PM Lukas Wunner wrote:
> > > > On Wed, Jul 20, 2016 at 02:52:42PM +0200, Rafael J. Wysocki wrote:
> > > > > On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> > > > > > On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > > > > > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > > > > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > > > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > > > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > > > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > > > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > > > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > > > > > > this dependency using something like Rafael's approach instead of
> > > > > > > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > > > > > > >
> > > > > > > > > > > New links are added by calling device_link_add() which may happen
> > > > > > > > > > > either before the consumer device is probed or when probing it, in
> > > > > > > > > > > which case the caller needs to ensure that the driver of the
> > > > > > > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > > > > > > flag should be passed to device_link_add() to reflect that.
> > > > > > > > > >
> > > > > > > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > > > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > > > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > > > > > > if the thunderbolt driver isn't loaded.
> > > > > > > > > >
> > > > > > > > > > It would therefore be beneficial if device_link_add() can be
> > > > > > > > > > called even *after* the consumer is bound.
> > > > > > > > > 
> > > > > > > > > I don't quite follow.
> > > > > > > > > 
> > > > > > > > > Who's the provider and who's the consumer here?
> > > > > > > > 
> > > > > > > > thunderbolt.ko is the supplier.
> > > > > > > 
> > > > > > > But it binds to the children of the ports that are supposed to be its
> > > > > > > consumers?
> > > > > > > 
> > > > > > > Why is that even expected to work?
> > > > > > 
> > > > > > No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> > > > > > 
> > > > > > The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> > > > > > the drawing below). The supplier is the NHI:
> > > > > > 
> > > > > >       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> > > > > >                                          +-- Downstream Bridge 1 --
> > > > > >                                          +-- Downstream Bridge 2 --
> > > > > >                                          ...
> > > > > > 
> > > > > > We're calling pci_power_up() and pci_restore_state() from
> > > > > > pci_pm_resume_noirq(). And that will fail for devices below
> > > > > > the hotplug ports if the PCI tunnels haven't been re-established
> > > > > > yet by the NHI.
> > > > > 
> > > > > So the NHI is a PCIe device, right?
> > > > > 
> > > > > Does the Thunderbolt driver bind to that device?
> > > > 
> > > > The NHI is a PCI device but not a bridge. It has class 0x88000.
> > > > Yes, thunderbolt.ko binds to the NHI.
> > > > 
> > > > And portdrv binds to the upstream bridge and downstream bridges.
> > > > Those have class 0x60400.
> > > 
> > > OK, so why would there be a problem with creating links from the NHI
> > > (producer) to the ports (consumers) before binding portdrv to them?
> > 
> > Because the ordering in which drivers bind isn't guaranteed. At least
> > on my machine (Debian), portdrv always binds before thunderbolt.
> 
> But what drivers have to do with that really?  Do you need drivers to
> know that the dependency is there?
> 
> Just add links *before* even probing for drivers (yes, you can do that)
> and the core will handle that for you.

Forgive me for being dense: How do you suggest to add links before
probing drivers? Only way I could think of is with a PCI quirk.

Which is what we're already doing right now (see drivers/pci/quirks.c:
quirk_apple_wait_for_thunderbolt()). And it ain't pretty.


> > I guess I could amend portdrv to return -EPROBE_DEFER on Macs if
> > no driver is bound to the NHI. Doesn't feel pretty to me though.
> > 
> > Ultimately this seems to be the same issue as with calling
> > dev_pm_domain_set() for a bound device. Perhaps device_link_add()
> > can likewise be allowed if a runtime PM ref is held for the devices
> > and the call happens under lock_system_sleep()?
> 
> No, the whole synchronization scheme in the links code would have had to be
> changed for that to really work.
> 
> And it really is about what is needed (at least in principle) to run your
> device.  If you think you need device X with a driver to handle device Y
> correctly, then either you need it all the time, from probe to remove, or
> you just don't really need it at all.

Real life isn't as simple as that.

In this case, we have consumers (hotplug ports) which are doing fine
if the driver for the supplier (NHI) is not loaded. But once it loads,
the links must be in place. Seems only logical to put the links in
place when they're needed, i.e. at load time of the supplier's driver.
Which the patch set doesn't allow right now.

Best regards,

Lukas
Rafael J. Wysocki July 28, 2016, 12:30 a.m. UTC | #12
On Monday, July 25, 2016 12:48:32 AM Lukas Wunner wrote:
> On Thu, Jul 21, 2016 at 02:25:15AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, July 21, 2016 01:25:53 AM Lukas Wunner wrote:
> > > On Thu, Jul 21, 2016 at 12:51:31AM +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, July 20, 2016 05:23:40 PM Lukas Wunner wrote:
> > > > > On Wed, Jul 20, 2016 at 02:52:42PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Wednesday, July 20, 2016 08:24:50 AM Lukas Wunner wrote:
> > > > > > > On Wed, Jul 20, 2016 at 02:33:18AM +0200, Rafael J. Wysocki wrote:
> > > > > > > > On Friday, June 17, 2016 04:07:38 PM Lukas Wunner wrote:
> > > > > > > > > On Fri, Jun 17, 2016 at 02:54:56PM +0200, Rafael J. Wysocki wrote:
> > > > > > > > > > On Fri, Jun 17, 2016 at 12:36 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > > > > > > On Fri, Jun 17, 2016 at 08:26:52AM +0200, Marek Szyprowski wrote:
> > > > > > > > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > > > > > > > > > > We also have such a functional dependency for Thunderbolt on Macs:
> > > > > > > > > > > On resume from system sleep, the PCIe hotplug ports may not resume
> > > > > > > > > > > before the thunderbolt driver has reestablished the PCI tunnels.
> > > > > > > > > > > Currently this is enforced by quirk_apple_wait_for_thunderbolt()
> > > > > > > > > > > in drivers/pci/quirks.c. It would be good if we could represent
> > > > > > > > > > > this dependency using something like Rafael's approach instead of
> > > > > > > > > > > open coding it, however one detail in Rafael's patches is problematic:
> > > > > > > > > > >
> > > > > > > > > > > > New links are added by calling device_link_add() which may happen
> > > > > > > > > > > > either before the consumer device is probed or when probing it, in
> > > > > > > > > > > > which case the caller needs to ensure that the driver of the
> > > > > > > > > > > > supplier device is present and functional and the DEVICE_LINK_PROBE_TIME
> > > > > > > > > > > > flag should be passed to device_link_add() to reflect that.
> > > > > > > > > > >
> > > > > > > > > > > The thunderbolt driver cannot call device_link_add() before the
> > > > > > > > > > > PCIe hotplug ports are bound to a driver unless we amend portdrv
> > > > > > > > > > > to return -EPROBE_DEFER for Thunderbolt hotplug ports on Macs
> > > > > > > > > > > if the thunderbolt driver isn't loaded.
> > > > > > > > > > >
> > > > > > > > > > > It would therefore be beneficial if device_link_add() can be
> > > > > > > > > > > called even *after* the consumer is bound.
> > > > > > > > > > 
> > > > > > > > > > I don't quite follow.
> > > > > > > > > > 
> > > > > > > > > > Who's the provider and who's the consumer here?
> > > > > > > > > 
> > > > > > > > > thunderbolt.ko is the supplier.
> > > > > > > > 
> > > > > > > > But it binds to the children of the ports that are supposed to be its
> > > > > > > > consumers?
> > > > > > > > 
> > > > > > > > Why is that even expected to work?
> > > > > > > 
> > > > > > > No, the consumers are aunts (or uncles) of the supplier, if you will. :-)
> > > > > > > 
> > > > > > > The consumers are the hotplug ports (named "Downstream Bridge 1 / 2" in
> > > > > > > the drawing below). The supplier is the NHI:
> > > > > > > 
> > > > > > >       (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> > > > > > >                                          +-- Downstream Bridge 1 --
> > > > > > >                                          +-- Downstream Bridge 2 --
> > > > > > >                                          ...
> > > > > > > 
> > > > > > > We're calling pci_power_up() and pci_restore_state() from
> > > > > > > pci_pm_resume_noirq(). And that will fail for devices below
> > > > > > > the hotplug ports if the PCI tunnels haven't been re-established
> > > > > > > yet by the NHI.
> > > > > > 
> > > > > > So the NHI is a PCIe device, right?
> > > > > > 
> > > > > > Does the Thunderbolt driver bind to that device?
> > > > > 
> > > > > The NHI is a PCI device but not a bridge. It has class 0x88000.
> > > > > Yes, thunderbolt.ko binds to the NHI.
> > > > > 
> > > > > And portdrv binds to the upstream bridge and downstream bridges.
> > > > > Those have class 0x60400.
> > > > 
> > > > OK, so why would there be a problem with creating links from the NHI
> > > > (producer) to the ports (consumers) before binding portdrv to them?
> > > 
> > > Because the ordering in which drivers bind isn't guaranteed. At least
> > > on my machine (Debian), portdrv always binds before thunderbolt.
> > 
> > But what drivers have to do with that really?  Do you need drivers to
> > know that the dependency is there?
> > 
> > Just add links *before* even probing for drivers (yes, you can do that)
> > and the core will handle that for you.
> 
> Forgive me for being dense: How do you suggest to add links before
> probing drivers? Only way I could think of is with a PCI quirk.
> 
> Which is what we're already doing right now (see drivers/pci/quirks.c:
> quirk_apple_wait_for_thunderbolt()). And it ain't pretty.

Well, maybe not, but doing it once during enumeration would be better than
on every resume.

Plus there is runtime PM to cover.

> > > I guess I could amend portdrv to return -EPROBE_DEFER on Macs if
> > > no driver is bound to the NHI. Doesn't feel pretty to me though.
> > > 
> > > Ultimately this seems to be the same issue as with calling
> > > dev_pm_domain_set() for a bound device. Perhaps device_link_add()
> > > can likewise be allowed if a runtime PM ref is held for the devices
> > > and the call happens under lock_system_sleep()?
> > 
> > No, the whole synchronization scheme in the links code would have had to be
> > changed for that to really work.
> > 
> > And it really is about what is needed (at least in principle) to run your
> > device.  If you think you need device X with a driver to handle device Y
> > correctly, then either you need it all the time, from probe to remove, or
> > you just don't really need it at all.
> 
> Real life isn't as simple as that.
> 
> In this case, we have consumers (hotplug ports) which are doing fine
> if the driver for the supplier (NHI) is not loaded. But once it loads,
> the links must be in place.

Hmm.

What if it is not loaded and the system suspends.  Will everything work
as expected after the subsequent resume?

Thanks,
Rafael
Lukas Wunner July 28, 2016, 3:28 p.m. UTC | #13
On Thu, Jul 28, 2016 at 02:30:31AM +0200, Rafael J. Wysocki wrote:
> On Monday, July 25, 2016 12:48:32 AM Lukas Wunner wrote:
> > On Thu, Jul 21, 2016 at 02:25:15AM +0200, Rafael J. Wysocki wrote:
> > > On Thursday, July 21, 2016 01:25:53 AM Lukas Wunner wrote:
> > > > I guess I could amend portdrv to return -EPROBE_DEFER on Macs if
> > > > no driver is bound to the NHI. Doesn't feel pretty to me though.
> > > > 
> > > > Ultimately this seems to be the same issue as with calling
> > > > dev_pm_domain_set() for a bound device. Perhaps device_link_add()
> > > > can likewise be allowed if a runtime PM ref is held for the devices
> > > > and the call happens under lock_system_sleep()?
> > > 
> > > No, the whole synchronization scheme in the links code would have had to be
> > > changed for that to really work.
> > > 
> > > And it really is about what is needed (at least in principle) to run your
> > > device.  If you think you need device X with a driver to handle device Y
> > > correctly, then either you need it all the time, from probe to remove, or
> > > you just don't really need it at all.
> > 
> > Real life isn't as simple as that.
> > 
> > In this case, we have consumers (hotplug ports) which are doing fine
> > if the driver for the supplier (NHI) is not loaded. But once it loads,
> > the links must be in place.
> 
> Hmm.
> 
> What if it is not loaded and the system suspends.  Will everything work
> as expected after the subsequent resume?

The short answer is yes.

Long answer:

With Thunderbolt, the switch fabric is told to set up PCI tunnels
through the NHI (Native Host Interface). Once set up, the tunnels
stay as they are and attached devices are reachable. However after
a power cycle of the controller (suspend/resume), the tunnels are
gone and need to be re-established.

On Macs, there are two software components communicating with the
NHI: The first one is an EFI driver which sets up tunnels to all
devices present on boot and lights up all attached DP-over-Thunderbolt
displays. Once ExitBootServices is called, the EFI driver is shut
down but the configured tunnels stay as they are. The kernel is thus
able to enumerate attached PCI devices.

The second component is the OS driver, thunderbolt.ko. It is needed
to set up tunnels to hot-plugged devices (i.e., not present at boot).
It is also needed to re-establish tunnels after suspend/resume.

The necessity of quirk_apple_wait_for_thunderbolt() arises because
we walk the entire PCI hierarchy during ->resume_noirq and call
pci_power_up() and pci_restore_state() for each device. Now remember,
the PCI tunnels are gone after a power cycle, so the attached devices
aren't reachable. Waking them and restoring their state will fail
unless the thunderbolt driver reconfigures the switch fabric first.
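The ordering problem can be illustrated with a small userspace sketch in C.
It is not kernel code: a plain array stands in for dpm_list, the initial
device order is an assumption made up for the example, and move_to_tail()
and position_of() are hypothetical helpers. It only mirrors the idea from
the patch that creating a link moves the consumer to the end of the list,
so that a walk from head to tail reaches the supplier (NHI) before the
hotplug ports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Move the entry at index @idx to the end of @list, keeping the relative
 * order of all other entries.  Hypothetical stand-in for moving a consumer
 * to the tail of dpm_list; requires idx < n.
 */
void move_to_tail(const char **list, size_t n, size_t idx)
{
	const char *moved;

	assert(idx < n);
	moved = list[idx];
	/* Shift the entries after @idx down by one slot. */
	memmove(&list[idx], &list[idx + 1], (n - idx - 1) * sizeof(*list));
	list[n - 1] = moved;
}

/* Return the position of @name in @list, or @n if it is not present. */
size_t position_of(const char **list, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return i;
	return n;
}
```

Starting from an assumed order of upstream bridge, downstream bridges 0 to 2,
then the NHI, moving bridges 1 and 2 to the tail leaves the NHI ahead of both
of them.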

=> So if there are no devices attached and thunderbolt.ko isn't loaded,
   everything is fine. No device link needed.

=> If devices are attached and thunderbolt.ko is loaded, then the hotplug
   ports need to wait for re-establishment of the PCI tunnels.
   Device link is needed.

=> If devices were attached on boot and thunderbolt.ko isn't loaded, they
   will be unreachable after resume. Nothing we can do about that.
   No device link needed.

So this is a case of a "weak" device link, "weak" referring to the fact
that it's only needed if the supplier is bound.
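To make the three cases concrete, here is a minimal userspace sketch in C.
It is an illustration only, not kernel code: weak_link_needed() and its
parameters are made-up names that do not appear in the patch set. The fourth
combination (driver bound, nothing attached) is not covered above; the sketch
assumes a link is wanted there as well, since devices may be hot-plugged
later.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the hotplug ports need a device link
 * to the NHI.  "Weak" means the answer depends only on whether the
 * supplier's driver (thunderbolt.ko) is bound.
 */
bool weak_link_needed(bool devices_attached, bool supplier_bound)
{
	/*
	 * Cases 1 and 3: without the driver nothing can re-establish the
	 * tunnels, so there is nothing for the ports to wait for.
	 */
	if (!supplier_bound)
		return false;

	/*
	 * Case 2 (devices attached, driver bound) needs the link.  The
	 * remaining case (driver bound, no devices) is an assumption of this
	 * sketch: devices may still be hot-plugged before the next suspend.
	 */
	(void)devices_attached;
	return true;
}
```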

All that said, I don't know if this case exists often enough that it's
worth making allowances for it in the driver core.

Sorry for the wall of text, just want to make sure we're on the same page
and all possible use cases of device links are discussed and considered.

Thanks,

Lukas
Rafael J. Wysocki Sept. 6, 2016, 11:57 p.m. UTC | #14
On Thursday, July 28, 2016 05:28:31 PM Lukas Wunner wrote:
> On Thu, Jul 28, 2016 at 02:30:31AM +0200, Rafael J. Wysocki wrote:
> > On Monday, July 25, 2016 12:48:32 AM Lukas Wunner wrote:
> > > On Thu, Jul 21, 2016 at 02:25:15AM +0200, Rafael J. Wysocki wrote:
> > > > On Thursday, July 21, 2016 01:25:53 AM Lukas Wunner wrote:
> > > > > I guess I could amend portdrv to return -EPROBE_DEFER on Macs if
> > > > > no driver is bound to the NHI. Doesn't feel pretty to me though.
> > > > > 
> > > > > Ultimately this seems to be the same issue as with calling
> > > > > dev_pm_domain_set() for a bound device. Perhaps device_link_add()
> > > > > can likewise be allowed if a runtime PM ref is held for the devices
> > > > > and the call happens under lock_system_sleep()?
> > > > 
> > > > No, the whole synchronization scheme in the links code would have had to be
> > > > changed for that to really work.
> > > > 
> > > > And it really is about what is needed (at least in principle) to run your
> > > > device.  If you think you need device X with a driver to handle device Y
> > > > correctly, then either you need it all the time, from probe to remove, or
> > > > you just don't really need it at all.
> > > 
> > > Real life isn't as simple as that.
> > > 
> > > In this case, we have consumers (hotplug ports) which are doing fine
> > > if the driver for the supplier (NHI) is not loaded. But once it loads,
> > > the links must be in place.
> > 
> > Hmm.
> > 
> > What if it is not loaded and the system suspends.  Will everything work
> > as expected after the subsequent resume?
> 
> The short answer is yes.

OK

I think it's possible to add a link flag to address this case.

Namely, if that flag is passed to device_link_add(), the link will be
added in the DEVICE_LINK_ACTIVE state right away, but that will need to
be synchronized against all possible transitions of the consumer device
(at least).

It's better to do that in a separate patch for this reason IMO.
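As a rough userspace sketch of what such a flag could mean for the initial
link state (not a proposal for the actual code): the enum and the first two
flags mirror the patch, while DEVICE_LINK_BOUND_HYPOTHETICAL and
devlink_initial_status() are invented names used only for illustration. The
synchronization against consumer transitions mentioned above is not modeled
at all.

```c
#include <stdint.h>

/* Userspace copy of the link states from the patch. */
enum devlink_status {
	DEVICE_LINK_DORMANT = 0,	/* Link not in use. */
	DEVICE_LINK_AVAILABLE,		/* Supplier driver is present. */
	DEVICE_LINK_ACTIVE,		/* Consumer driver is present too. */
	DEVICE_LINK_CONSUMER_PROBE,	/* Consumer is probing. */
	DEVICE_LINK_SUPPLIER_UNBIND,	/* Supplier is unbinding. */
};

#define DEVICE_LINK_PERSISTENT	(1 << 0)
#define DEVICE_LINK_PROBE_TIME	(1 << 1)
/* Hypothetical flag, NOT in the patch: consumer and supplier already bound. */
#define DEVICE_LINK_BOUND_HYPOTHETICAL	(1 << 2)

/* Pick the state a new link starts in, based on the flags passed in. */
enum devlink_status devlink_initial_status(uint32_t flags)
{
	if (flags & DEVICE_LINK_BOUND_HYPOTHETICAL)
		return DEVICE_LINK_ACTIVE;
	if (flags & DEVICE_LINK_PROBE_TIME)
		return DEVICE_LINK_CONSUMER_PROBE;
	return DEVICE_LINK_DORMANT;
}
```

The first branch is the only addition relative to the status assignment in
device_link_add() in this patch; the other two reproduce its existing
behavior.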

Thanks,
Rafael

Patch

diff --git a/drivers/base/base.h b/drivers/base/base.h
index e05db388bd1c..cccb1d211541 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -107,6 +107,9 @@  extern void bus_remove_device(struct device *dev);
 
 extern int bus_add_driver(struct device_driver *drv);
 extern void bus_remove_driver(struct device_driver *drv);
+extern void device_release_driver_internal(struct device *dev,
+					   struct device_driver *drv,
+					   struct device *parent);
 
 extern void driver_detach(struct device_driver *drv);
 extern int driver_probe_device(struct device_driver *drv, struct device *dev);
@@ -152,3 +155,11 @@  extern int devtmpfs_init(void);
 #else
 static inline int devtmpfs_init(void) { return 0; }
 #endif
+
+/* Device links */
+extern int device_links_check_suppliers(struct device *dev);
+extern void device_links_driver_bound(struct device *dev);
+extern void device_links_driver_gone(struct device *dev);
+extern void device_links_no_driver(struct device *dev);
+extern bool device_links_busy(struct device *dev);
+extern void device_links_unbind_consumers(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0a8bdade53f2..416341df3268 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,367 @@  static int __init sysfs_deprecated_setup(char *arg)
 early_param("sysfs.deprecated", sysfs_deprecated_setup);
 #endif
 
+/* Device links support. */
+
+DEFINE_STATIC_SRCU(device_links_srcu);
+static DEFINE_MUTEX(device_links_lock);
+
+static int device_reorder_to_tail(struct device *dev, void *not_used)
+{
+	struct devlink *link;
+
+	devices_kset_move_last(dev);
+	device_pm_move_last(dev);
+	device_for_each_child(dev, NULL, device_reorder_to_tail);
+	list_for_each_entry(link, &dev->consumer_links, c_node)
+		device_reorder_to_tail(link->consumer, NULL);
+
+	return 0;
+}
+
+/**
+ * device_link_add - Create a link between two devices.
+ * @consumer: Consumer end of the link.
+ * @supplier: Supplier end of the link.
+ * @flags: Link flags.
+ *
+ * At least one of the flags must be set.  If DEVICE_LINK_PROBE_TIME is set, the
+ * caller is expected to know that (a) the supplier device is present and active
+ * (ie. its driver is functional) and (b) the consumer device is probing at the
+ * moment and therefore the initial state of the link will be "consumer probe"
+ * in that case.  If DEVICE_LINK_PROBE_TIME is not set, DEVICE_LINK_PERSISTENT
+ * must be set (meaning that the link will not go away when the consumer driver
+ * goes away).
+ *
+ * A side effect of the link creation is re-ordering of dpm_list and the
+ * devices_kset list by moving the consumer device and all devices depending
+ * on it to the ends of those lists.
+ */
+struct devlink *device_link_add(struct device *consumer,
+				struct device *supplier, u32 flags)
+{
+	struct devlink *link;
+
+	if (!consumer || !supplier || !flags)
+		return NULL;
+
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry(link, &supplier->supplier_links, s_node)
+		if (link->consumer == consumer)
+			goto out;
+
+	link = kmalloc(sizeof(*link), GFP_KERNEL);
+	if (!link)
+		goto out;
+
+	get_device(supplier);
+	link->supplier = supplier;
+	INIT_LIST_HEAD(&link->s_node);
+	get_device(consumer);
+	link->consumer = consumer;
+	INIT_LIST_HEAD(&link->c_node);
+	link->flags = flags;
+	link->status = (flags & DEVICE_LINK_PROBE_TIME) ?
+			DEVICE_LINK_CONSUMER_PROBE : DEVICE_LINK_DORMANT;
+	spin_lock_init(&link->lock);
+
+	/*
+	 * Move the consumer and all of the devices depending on it to the end
+	 * of dpm_list and the devices_kset list.
+	 *
+	 * We have to hold dpm_list locked throughout all that or else we may
+	 * end up suspending with a wrong ordering of it.
+	 */
+	device_pm_lock();
+	device_reorder_to_tail(consumer, NULL);
+	device_pm_unlock();
+
+	list_add_tail_rcu(&link->s_node, &supplier->supplier_links);
+	list_add_tail_rcu(&link->c_node, &consumer->consumer_links);
+
+	dev_info(consumer, "Linked as a consumer to %s\n", dev_name(supplier));
+
+ out:
+	mutex_unlock(&device_links_lock);
+	return link;
+}
+EXPORT_SYMBOL_GPL(device_link_add);
+
+static void __devlink_free_srcu(struct rcu_head *rhead)
+{
+	struct devlink *link;
+
+	link = container_of(rhead, struct devlink, rcu_head);
+	put_device(link->consumer);
+	put_device(link->supplier);
+	kfree(link);
+}
+
+static void devlink_del(struct devlink *link)
+{
+	dev_info(link->consumer, "Dropping the link to %s\n",
+		 dev_name(link->supplier));
+
+	list_del_rcu(&link->s_node);
+	list_del_rcu(&link->c_node);
+	call_srcu(&device_links_srcu, &link->rcu_head, __devlink_free_srcu);
+}
+
+/**
+ * device_link_del - Delete a link between two devices.
+ * @link: Device link to delete.
+ */
+void device_link_del(struct devlink *link)
+{
+	mutex_lock(&device_links_lock);
+	devlink_del(link);
+	mutex_unlock(&device_links_lock);
+}
+EXPORT_SYMBOL_GPL(device_link_del);
+
+static int device_links_read_lock(void)
+{
+	return srcu_read_lock(&device_links_srcu);
+}
+
+static void device_links_read_unlock(int idx)
+{
+	srcu_read_unlock(&device_links_srcu, idx);
+}
+
+static void device_links_missing_supplier(struct device *dev)
+{
+	struct devlink *link;
+
+	list_for_each_entry_rcu(link, &dev->consumer_links, c_node) {
+		spin_lock(&link->lock);
+
+		if (link->status == DEVICE_LINK_CONSUMER_PROBE)
+			link->status = DEVICE_LINK_AVAILABLE;
+
+		spin_unlock(&link->lock);
+	}
+}
+
+/**
+ * device_links_check_suppliers - Check supplier devices for this one.
+ * @dev: Consumer device.
+ *
+ * Check links from this device to any suppliers.  Walk the list of the device's
+ * consumer links and see if all of the suppliers are available.  If not, simply
+ * return -EPROBE_DEFER.
+ *
+ * Walk the list under SRCU and check each link's status field under its lock.
+ *
+ * We need to guarantee that the supplier will not go away after the check has
+ * been positive here.  It only can go away in __device_release_driver() and
+ * that function checks the device's links to consumers.  This means we need to
+ * mark the link as "consumer probe in progress" to make the supplier removal
+ * wait for us to complete (or bad things may happen).
+ */
+int device_links_check_suppliers(struct device *dev)
+{
+	struct devlink *link;
+	int idx, ret = 0;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->consumer_links, c_node) {
+		spin_lock(&link->lock);
+		if (link->status != DEVICE_LINK_AVAILABLE) {
+			spin_unlock(&link->lock);
+			device_links_missing_supplier(dev);
+			ret = -EPROBE_DEFER;
+			break;
+		}
+		link->status = DEVICE_LINK_CONSUMER_PROBE;
+		spin_unlock(&link->lock);
+	}
+
+	device_links_read_unlock(idx);
+	return ret;
+}
+
+/**
+ * device_links_driver_bound - Update device links after probing its driver.
+ * @dev: Device to update the links for.
+ *
+ * The probe has been successful, so update links from this device to any
+ * consumers by changing their status to "available".
+ *
+ * Also change the status of @dev's links to suppliers to "active".
+ */
+void device_links_driver_bound(struct device *dev)
+{
+	struct devlink *link;
+	int idx;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
+		spin_lock(&link->lock);
+		WARN_ON(link->status != DEVICE_LINK_DORMANT);
+		link->status = DEVICE_LINK_AVAILABLE;
+		spin_unlock(&link->lock);
+	}
+
+	list_for_each_entry_rcu(link, &dev->consumer_links, c_node) {
+		spin_lock(&link->lock);
+		WARN_ON(link->status != DEVICE_LINK_CONSUMER_PROBE);
+		link->status = DEVICE_LINK_ACTIVE;
+		spin_unlock(&link->lock);
+	}
+
+	device_links_read_unlock(idx);
+}
+
+/**
+ * device_links_driver_gone - Update links after driver removal.
+ * @dev: Device whose driver has gone away.
+ *
+ * Update links to consumers for @dev by changing their status to "dormant".
+ */
+void device_links_driver_gone(struct device *dev)
+{
+	struct devlink *link;
+	int idx;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
+		WARN_ON(!(link->flags & DEVICE_LINK_PERSISTENT));
+		spin_lock(&link->lock);
+		WARN_ON(link->status != DEVICE_LINK_SUPPLIER_UNBIND);
+		link->status = DEVICE_LINK_DORMANT;
+		spin_unlock(&link->lock);
+	}
+
+	device_links_read_unlock(idx);
+}
+
+/**
+ * device_links_no_driver - Update links of a device without a driver.
+ * @dev: Device without a driver.
+ *
+ * Delete all non-persistent links from this device to any suppliers.
+ * Persistent links stay around, but their status is changed to "available",
+ * unless they already are in the "supplier unbind in progress" state in which
+ * case they need not be updated.
+ */
+void device_links_no_driver(struct device *dev)
+{
+	struct devlink *link, *ln;
+
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry_safe_reverse(link, ln, &dev->consumer_links, c_node)
+		if (link->flags & DEVICE_LINK_PERSISTENT) {
+			spin_lock(&link->lock);
+
+			if (link->status != DEVICE_LINK_SUPPLIER_UNBIND)
+				link->status = DEVICE_LINK_AVAILABLE;
+
+			spin_unlock(&link->lock);
+		} else {
+			devlink_del(link);
+		}
+
+	mutex_unlock(&device_links_lock);
+}
+
+/**
+ * device_links_busy - Check if there are any busy links to consumers.
+ * @dev: Device to check.
+ *
+ * Check each consumer of the device and return 'true' if its link's status
+ * is one of "consumer probe" or "active" (meaning that the given consumer is
+ * probing right now or its driver is present).  Otherwise, change the link
+ * state to "supplier unbind" to prevent the consumer from being probed
+ * successfully going forward.
+ *
+ * Return 'false' if there are no probing or active consumers.
+ */
+bool device_links_busy(struct device *dev)
+{
+	struct devlink *link;
+	int idx;
+	bool ret = false;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
+		spin_lock(&link->lock);
+		if (link->status == DEVICE_LINK_CONSUMER_PROBE
+		    || link->status == DEVICE_LINK_ACTIVE) {
+			spin_unlock(&link->lock);
+			ret = true;
+			break;
+		}
+		link->status = DEVICE_LINK_SUPPLIER_UNBIND;
+		spin_unlock(&link->lock);
+	}
+
+	device_links_read_unlock(idx);
+	return ret;
+}
+
+/**
+ * device_links_unbind_consumers - Force unbind consumers of the given device.
+ * @dev: Device to unbind the consumers of.
+ *
+ * Walk the list of links to consumers for @dev and if any of them is in the
+ * "consumer probe" state, wait for all device probes in progress to complete
+ * and start over.
+ *
+ * If that's not the case, change the status of the link to "supplier unbind"
+ * and check if the link was in the "active" state.  If so, force the consumer
+ * driver to unbind and start over (the consumer will not re-probe as we have
+ * changed the state of the link already).
+ */
+void device_links_unbind_consumers(struct device *dev)
+{
+	struct devlink *link;
+	int idx;
+
+ start:
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->supplier_links, s_node) {
+		enum devlink_status status;
+
+		spin_lock(&link->lock);
+		status = link->status;
+		if (status == DEVICE_LINK_CONSUMER_PROBE) {
+			spin_unlock(&link->lock);
+
+			device_links_read_unlock(idx);
+
+			wait_for_device_probe();
+			goto start;
+		}
+		link->status = DEVICE_LINK_SUPPLIER_UNBIND;
+		if (status == DEVICE_LINK_ACTIVE) {
+			struct device *consumer = link->consumer;
+
+			get_device(consumer);
+			spin_unlock(&link->lock);
+
+			device_links_read_unlock(idx);
+
+			device_release_driver_internal(consumer, NULL,
+						       consumer->parent);
+			put_device(consumer);
+			goto start;
+		}
+		spin_unlock(&link->lock);
+	}
+
+	device_links_read_unlock(idx);
+}
+
+/* Device links support end. */
+
 int (*platform_notify)(struct device *dev) = NULL;
 int (*platform_notify_remove)(struct device *dev) = NULL;
 static struct kobject *dev_kobj;
@@ -711,6 +1072,8 @@  void device_initialize(struct device *dev)
 #ifdef CONFIG_GENERIC_MSI_IRQ
 	INIT_LIST_HEAD(&dev->msi_list);
 #endif
+	INIT_LIST_HEAD(&dev->supplier_links);
+	INIT_LIST_HEAD(&dev->consumer_links);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
@@ -1233,6 +1596,7 @@  void device_del(struct device *dev)
 {
 	struct device *parent = dev->parent;
 	struct class_interface *class_intf;
+	struct devlink *link, *ln;
 
 	/* Notify clients of device removal.  This call must come
 	 * before dpm_sysfs_remove().
@@ -1240,6 +1604,28 @@  void device_del(struct device *dev)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DEL_DEVICE, dev);
+
+	/*
+	 * Delete all of the remaining links from this device to any other
+	 * devices (either consumers or suppliers).
+	 *
+	 * This requires that all links be dormant, so warn if that's not the
+	 * case.
+	 */
+	mutex_lock(&device_links_lock);
+
+	list_for_each_entry_safe_reverse(link, ln, &dev->consumer_links, c_node) {
+		WARN_ON(link->status != DEVICE_LINK_DORMANT);
+		devlink_del(link);
+	}
+
+	list_for_each_entry_safe_reverse(link, ln, &dev->supplier_links, s_node) {
+		WARN_ON(link->status != DEVICE_LINK_DORMANT);
+		devlink_del(link);
+	}
+
+	mutex_unlock(&device_links_lock);
+
 	dpm_sysfs_remove(dev);
 	if (parent)
 		klist_del(&dev->p->knode_parent);
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index d9e76e9205c7..7c0abeba89e9 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -249,6 +249,7 @@  static void driver_bound(struct device *dev)
 		 __func__, dev_name(dev));
 
 	klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
+	device_links_driver_bound(dev);
 
 	device_pm_check_callbacks(dev);
 
@@ -399,6 +400,7 @@  probe_failed:
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
 pinctrl_bind_failed:
+	device_links_no_driver(dev);
 	devres_release_all(dev);
 	driver_sysfs_remove(dev);
 	dev->driver = NULL;
@@ -489,6 +491,10 @@  int driver_probe_device(struct device_driver *drv, struct device *dev)
 	if (!device_is_registered(dev))
 		return -ENODEV;
 
+	ret = device_links_check_suppliers(dev);
+	if (ret)
+		return ret;
+
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
@@ -756,7 +762,7 @@  EXPORT_SYMBOL_GPL(driver_attach);
  * __device_release_driver() must be called with @dev lock held.
  * When called for a USB interface, @dev->parent lock must be held as well.
  */
-static void __device_release_driver(struct device *dev)
+static void __device_release_driver(struct device *dev, struct device *parent)
 {
 	struct device_driver *drv;
 
@@ -765,6 +771,25 @@  static void __device_release_driver(struct device *dev)
 		if (driver_allows_async_probing(drv))
 			async_synchronize_full();
 
+		while (device_links_busy(dev)) {
+			device_unlock(dev);
+			if (parent)
+				device_unlock(parent);
+
+			device_links_unbind_consumers(dev);
+			if (parent)
+				device_lock(parent);
+
+			device_lock(dev);
+			/*
+			 * A concurrent invocation of the same function might
+			 * have released the driver successfully while this one
+			 * was waiting, so check for that.
+			 */
+			if (dev->driver != drv)
+				return;
+		}
+
 		pm_runtime_get_sync(dev);
 
 		driver_sysfs_remove(dev);
@@ -780,6 +805,9 @@  static void __device_release_driver(struct device *dev)
 			dev->bus->remove(dev);
 		else if (drv->remove)
 			drv->remove(dev);
+
+		device_links_driver_gone(dev);
+		device_links_no_driver(dev);
 		devres_release_all(dev);
 		dev->driver = NULL;
 		dev_set_drvdata(dev, NULL);
@@ -796,16 +824,16 @@  static void __device_release_driver(struct device *dev)
 	}
 }
 
-static void device_release_driver_internal(struct device *dev,
-					   struct device_driver *drv,
-					   struct device *parent)
+void device_release_driver_internal(struct device *dev,
+				    struct device_driver *drv,
+				    struct device *parent)
 {
 	if (parent)
 		device_lock(parent);
 
 	device_lock(dev);
 	if (!drv || drv == dev->driver)
-		__device_release_driver(dev);
+		__device_release_driver(dev, parent);
 
 	device_unlock(dev);
 	if (parent)
@@ -818,6 +846,10 @@  static void device_release_driver_internal(struct device *dev,
  *
  * Manually detach device from driver.
  * When called for a USB interface, @dev->parent lock must be held.
+ *
+ * If this function is to be called with @dev->parent lock held, ensure that
+ * the device's consumers are unbound in advance or that their locks can be
+ * acquired under the @dev->parent lock.
  */
 void device_release_driver(struct device *dev)
 {
diff --git a/include/linux/device.h b/include/linux/device.h
index 38f02814d53a..647204bd74a0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -706,6 +706,34 @@  struct device_dma_parameters {
 	unsigned long segment_boundary_mask;
 };
 
+enum devlink_status {
+	DEVICE_LINK_DORMANT = 0,	/* Link not in use. */
+	DEVICE_LINK_AVAILABLE,		/* Supplier driver is present. */
+	DEVICE_LINK_ACTIVE,		/* Consumer driver is present too. */
+	DEVICE_LINK_CONSUMER_PROBE,	/* Consumer is probing. */
+	DEVICE_LINK_SUPPLIER_UNBIND,	/* Supplier is unbinding. */
+};
+
+/*
+ * Device link flags.
+ *
+ * PERSISTENT: Do not delete the link on consumer device driver unbind.
+ * PROBE_TIME: Assume supplier device functional when creating the link.
+ */
+#define DEVICE_LINK_PERSISTENT	(1 << 0)
+#define DEVICE_LINK_PROBE_TIME	(1 << 1)
+
+struct devlink {
+	struct device *supplier;
+	struct list_head s_node;
+	struct device *consumer;
+	struct list_head c_node;
+	enum devlink_status status;
+	u32 flags;
+	spinlock_t lock;
+	struct rcu_head rcu_head;
+};
+
 /**
  * struct device - The basic device structure
  * @parent:	The device's "parent" device, the device to which it is attached.
@@ -731,6 +759,8 @@  struct device_dma_parameters {
  * 		on.  This shrinks the "Board Support Packages" (BSPs) and
  * 		minimizes board-specific #ifdefs in drivers.
  * @driver_data: Private pointer for driver specific info.
+ * @supplier_links: Links to consumer devices.
+ * @consumer_links: Links to supplier devices.
  * @power:	For device power management.
  * 		See Documentation/power/devices.txt for details.
  * @pm_domain:	Provide callbacks that are executed during system suspend,
@@ -797,6 +827,8 @@  struct device {
 					   core doesn't touch it */
 	void		*driver_data;	/* Driver data, set and get with
 					   dev_set/get_drvdata */
+	struct list_head	supplier_links;
+	struct list_head	consumer_links;
 	struct dev_pm_info	power;
 	struct dev_pm_domain	*pm_domain;
 
@@ -1113,6 +1145,10 @@  extern void device_shutdown(void);
 /* debugging and troubleshooting/diagnostic helpers. */
 extern const char *dev_driver_string(const struct device *dev);
 
+/* Device links interface. */
+struct devlink *device_link_add(struct device *consumer,
+				struct device *supplier, u32 flags);
+void device_link_del(struct devlink *link);
 
 #ifdef CONFIG_PRINTK