diff mbox

bisected regression, v3.5 -> next-20120724: PCI PM causes USB hotplug failure

Message ID 1343367309.3874.21.camel@yhuang-dev (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Huang, Ying July 27, 2012, 5:35 a.m. UTC
On Thu, 2012-07-26 at 12:35 +0200, Bjørn Mork wrote:
> Huang Ying <ying.huang@intel.com> writes:
> 
> > Do you have time to try the below patch?
> 
> Sure.  Looks OK wrt the USB problems, but may cause problems with the
> PCIe WiFi card.  Unless those are related to other changes in -next.
> 
> Anyway, for I applied your patch on top of next-20120724 for
> consistency (still without Rafael print fix, so we get the D4 below).  
> This results in different stats for the uhci_hcd and ehci_hcd:
> 
> 
> Jul 26 12:13:42 nemi kernel: [   72.820305] uhci_hcd 0000:00:1a.2: power state changed by ACPI to D2
> Jul 26 12:13:42 nemi kernel: [   72.835101] uhci_hcd 0000:00:1d.0: power state changed by ACPI to D2
> Jul 26 12:13:44 nemi kernel: [   74.808770] uhci_hcd 0000:00:1a.0: power state changed by ACPI to D2
> Jul 26 12:13:44 nemi kernel: [   74.840293] ehci_hcd 0000:00:1d.7: power state changed by ACPI to D4
> Jul 26 12:13:44 nemi kernel: [   74.840326] ehci_hcd 0000:00:1a.7: power state changed by ACPI to D4
> 
> I assume that is expected, based on the lspci output I posted earlier.
> Overall I get a nice mix of allowed/disallowed:
> 
> 
> 
> nemi:/home/bjorn# grep . /sys/bus/pci/devices/*/d3cold_allowed
> /sys/bus/pci/devices/0000:00:00.0/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:02.0/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:02.1/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:03.0/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:19.0/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:1a.0/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1a.1/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1a.2/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1a.7/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:1b.0/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:1c.0/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1c.1/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1d.0/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1d.1/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1d.2/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1d.7/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:1e.0/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1f.0/d3cold_allowed:0
> /sys/bus/pci/devices/0000:00:1f.2/d3cold_allowed:1
> /sys/bus/pci/devices/0000:00:1f.3/d3cold_allowed:0
> /sys/bus/pci/devices/0000:03:00.0/d3cold_allowed:1
> 
> 
> USB hotplugging seems to work fine with this.  But attempting to
> unload/reload the wifi drivers resulted in this:
> 
> 
> 
> 
> Jul 26 12:20:43 nemi kernel: [  493.812266] iwlwifi 0000:03:00.0: Refused to change power state, currently in D3

I think this is because the parent bridge (PCIe port) is in lower power
state.  So write to PM registers of child device takes no effect.

> Jul 26 12:20:43 nemi kernel: [  493.812331] iwlwifi 0000:03:00.0: pci_resource_len = 0x00002000
> Jul 26 12:20:43 nemi kernel: [  493.812335] iwlwifi 0000:03:00.0: pci_resource_base = ffffc900055a4000
> Jul 26 12:20:43 nemi kernel: [  493.812339] iwlwifi 0000:03:00.0: HW Revision ID = 0x0
> Jul 26 12:20:43 nemi kernel: [  493.812350] iwlwifi 0000:03:00.0: pci_enable_msi failed(0Xffffffea)
> Jul 26 12:20:43 nemi kernel: [  493.812377] driver: '0000:03:00.0': driver_bound: bound to device 'iwlwifi'
> Jul 26 12:20:43 nemi kernel: [  493.812385] bus: 'pci': really_probe: bound device 0000:03:00.0 to driver iwlwifi
> Jul 26 12:20:43 nemi kernel: [  493.813634] iwldvm: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
> Jul 26 12:20:43 nemi kernel: [  493.813637] iwldvm: Copyright(c) 2003-2012 Intel Corporation
> Jul 26 12:20:43 nemi kernel: [  493.813912] device: '0000:03:00.0': device_add
> Jul 26 12:20:43 nemi kernel: [  493.813939] PM: Adding info for No Bus:0000:03:00.0
> Jul 26 12:20:43 nemi kernel: [  493.813947] firmware 0000:03:00.0: firmware: requesting iwlwifi-5000-5.ucode
> Jul 26 12:20:43 nemi kernel: [  493.827551] PM: Removing info for No Bus:0000:03:00.0
> Jul 26 12:20:43 nemi kernel: [  493.827615] iwlwifi 0000:03:00.0: loaded firmware version 8.83.5.1 build 33692
> Jul 26 12:20:43 nemi kernel: [  493.828170] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEBUG disabled
> Jul 26 12:20:43 nemi kernel: [  493.828175] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
> Jul 26 12:20:43 nemi kernel: [  493.828178] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEVICE_TRACING disabled
> Jul 26 12:20:43 nemi kernel: [  493.828182] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEVICE_TESTMODE disabled
> Jul 26 12:20:43 nemi kernel: [  493.828185] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_P2P disabled
> Jul 26 12:20:43 nemi kernel: [  493.828190] iwlwifi 0000:03:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0xFFFFFFFF
> Jul 26 12:20:43 nemi kernel: [  493.828218] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
> Jul 26 12:20:43 nemi kernel: [  493.832013] ------------[ cut here ]------------
> Jul 26 12:20:43 nemi kernel: [  493.832013] WARNING: at drivers/net/wireless/iwlwifi/iwl-io.c:150 iwl_grab_nic_access+0x47/0x54 [iwlwifi]()
> Jul 26 12:20:43 nemi kernel: [  493.832013] Hardware name: 2776LEG
> Jul 26 12:20:43 nemi kernel: [  493.832013] Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> Jul 26 12:20:43 nemi kernel: [  493.832013] Modules linked in: iwldvm iwlwifi mac80211 cfg80211 cpufreq_userspace cpufreq_stats cpufreq_conservative cpufreq_powersave xt_multiport xt_hl ip6table_filter iptable_filter ip_tables ip6_tables x_tables rfcomm bnep binfmt_misc fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc 8021q garp stp llc tun ext2 loop snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm thinkpad_acpi snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device arc4 snd i915 coretemp kvm_intel qcserial drm_kms_helper usb_wwan acpi_cpufreq drm psmouse lpc_ich i2c_algo_bit soundcore usbserial qmi_wwan usbnet mii cdc_wdm kvm microcode btusb evdev bluetooth i2c_i801 serio_raw mfd_core rfkill mei snd_page_alloc crc16 i2c_core battery nvram ac video wmi mperf processor button ext3 mbcache jbd sha256_generic ablk_helper cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod nbd sd_mod crc_t10dif sr_mod cdrom uhci_hcd ahci libahci libata ehci_hc
> Jul 26 12:20:43 nemi kernel: d scsi_mod thermal thermal_sys usbcore usb_common e1000e [last unloaded: cfg80211]
> Jul 26 12:20:43 nemi kernel: [  493.832013] Pid: 200, comm: kworker/0:2 Not tainted 3.5.0-rc2-next-20120724+ #23
> Jul 26 12:20:43 nemi kernel: [  493.832013] Call Trace:
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff8103d0fd>] ? warn_slowpath_common+0x78/0x8c
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff8103d1af>] ? warn_slowpath_fmt+0x45/0x4a
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffffa03822be>] ? iwl_grab_nic_access+0x47/0x54 [iwlwifi]
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffffa0382590>] ? iwl_write_prph+0x29/0x56 [iwlwifi]
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffffa03882cb>] ? iwl_apm_init+0x13a/0x16b [iwlwifi]
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffffa03883fd>] ? iwl_trans_pcie_start_hw+0x101/0x15b [iwlwifi]
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffffa026aa5b>] ? iwl_op_mode_dvm_start+0x246/0x96a [iwldvm]
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffffa03835d1>] ? iwl_ucode_callback+0x9e5/0xad8 [iwlwifi]
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff812793c2>] ? _request_firmware_prepare+0x1e2/0x1e2
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff81279471>] ? request_firmware_work_func+0xaf/0xe4
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff81054dea>] ? process_one_work+0x1ff/0x311
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff810550f7>] ? worker_thread+0x1fb/0x2fb
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff81054efc>] ? process_one_work+0x311/0x311
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff81054efc>] ? process_one_work+0x311/0x311
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff810588fd>] ? kthread+0x81/0x89
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff813703c4>] ? kernel_thread_helper+0x4/0x10
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff8105887c>] ? kthread_freezable_should_stop+0x53/0x53
> Jul 26 12:20:43 nemi kernel: [  493.832013]  [<ffffffff813703c0>] ? gs_change+0x13/0x13
> Jul 26 12:20:43 nemi kernel: [  493.832013] ---[ end trace fcaaf916dd43b7ca ]---
> Jul 26 12:20:43 nemi kernel: [  493.864785] iwlwifi 0000:03:00.0: bad EEPROM/OTP signature, type=OTP, EEPROM_GP=0x00000007
> Jul 26 12:20:43 nemi kernel: [  493.864791] iwlwifi 0000:03:00.0: EEPROM not found, EEPROM_GP=0xffffffff
> Jul 26 12:20:43 nemi kernel: [  493.864795] iwlwifi 0000:03:00.0: Unable to init EEPROM
> 
> 
> That does not look good...
> 
> The bridge and WiFi devices power status at this point is:
> 
> bjorn@nemi:~$  grep . /sys/bus/pci/devices/0000:00:1c.1/power/*
> /sys/bus/pci/devices/0000:00:1c.1/power/async:enabled
> grep: /sys/bus/pci/devices/0000:00:1c.1/power/autosuspend_delay_ms: Input/output error
> /sys/bus/pci/devices/0000:00:1c.1/power/control:auto
> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_active_kids:0
> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_active_time:390576
> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_enabled:enabled
> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_status:suspended
> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_suspended_time:697004
> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_usage:0
> /sys/bus/pci/devices/0000:00:1c.1/power/wakeup:disabled
> 
> bjorn@nemi:~$ grep . /sys/bus/pci/devices/0000:03:00.0/power/*
> /sys/bus/pci/devices/0000:03:00.0/power/async:enabled
> grep: /sys/bus/pci/devices/0000:03:00.0/power/autosuspend_delay_ms: Input/output error
> /sys/bus/pci/devices/0000:03:00.0/power/control:auto
> /sys/bus/pci/devices/0000:03:00.0/power/runtime_active_kids:0
> /sys/bus/pci/devices/0000:03:00.0/power/runtime_active_time:0
> /sys/bus/pci/devices/0000:03:00.0/power/runtime_enabled:disabled
> /sys/bus/pci/devices/0000:03:00.0/power/runtime_status:unsupported
> /sys/bus/pci/devices/0000:03:00.0/power/runtime_suspended_time:70220
> /sys/bus/pci/devices/0000:03:00.0/power/runtime_usage:0
> /sys/bus/pci/devices/0000:03:00.0/power/wakeup:disabled
> 
> 
> 
> I don't really understand the last one.  How can suspended_time > 0 when
> status is unsupported and autosuspend is disabled?

Do you have time to try the following patch?

Best Regards,
Huang Ying

---
 drivers/pci/pci-driver.c |    6 ++++++
 1 file changed, 6 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjørn Mork July 27, 2012, 9:11 a.m. UTC | #1
Huang Ying <ying.huang@intel.com> writes:

> Do you have time to try the following patch?
>
> Best Regards,
> Huang Ying
>
> ---
>  drivers/pci/pci-driver.c |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -280,8 +280,12 @@ static long local_pci_probe(void *_ddi)
>  {
>  	struct drv_dev_and_id *ddi = _ddi;
>  	struct device *dev = &ddi->dev->dev;
> +	struct device *parent = dev->parent;
>  	int rc;
>  
> +	/* The parent bridge must be in active state when probing */
> +	if (parent)
> +		pm_runtime_get_sync(parent);
>  	/* Unbound PCI devices are always set to disabled and suspended.
>  	 * During probe, the device is set to enabled and active and the
>  	 * usage count is incremented.  If the driver supports runtime PM,
> @@ -298,6 +302,8 @@ static long local_pci_probe(void *_ddi)
>  		pm_runtime_set_suspended(dev);
>  		pm_runtime_put_noidle(dev);
>  	}
> +	if (parent)
> +		pm_runtime_put(parent);
>  	return rc;
>  }
>  


Yup, that worked in the quick test I just did.

 lspci reading the device config will still not wake the bridge, but I
assume that is intentional?  But loading the device driver now wakes
both the bridge and the device, so that works.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 27, 2012, 3:03 p.m. UTC | #2
On Fri, 27 Jul 2012, Huang Ying wrote:

> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -280,8 +280,12 @@ static long local_pci_probe(void *_ddi)
>  {
>  	struct drv_dev_and_id *ddi = _ddi;
>  	struct device *dev = &ddi->dev->dev;
> +	struct device *parent = dev->parent;
>  	int rc;
>  
> +	/* The parent bridge must be in active state when probing */
> +	if (parent)
> +		pm_runtime_get_sync(parent);

Ooh, this is a very good point.  I completely missed it.

>  	/* Unbound PCI devices are always set to disabled and suspended.
>  	 * During probe, the device is set to enabled and active and the
>  	 * usage count is incremented.  If the driver supports runtime PM,
> @@ -298,6 +302,8 @@ static long local_pci_probe(void *_ddi)
>  		pm_runtime_set_suspended(dev);
>  		pm_runtime_put_noidle(dev);
>  	}
> +	if (parent)
> +		pm_runtime_put(parent);

You should use pm_runtime_put_sync(), not pm_runtime_put().

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 27, 2012, 7:11 p.m. UTC | #3
On Friday, July 27, 2012, Alan Stern wrote:
> On Fri, 27 Jul 2012, Huang Ying wrote:
> 
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -280,8 +280,12 @@ static long local_pci_probe(void *_ddi)
> >  {
> >  	struct drv_dev_and_id *ddi = _ddi;
> >  	struct device *dev = &ddi->dev->dev;
> > +	struct device *parent = dev->parent;
> >  	int rc;
> >  
> > +	/* The parent bridge must be in active state when probing */
> > +	if (parent)
> > +		pm_runtime_get_sync(parent);
> 
> Ooh, this is a very good point.  I completely missed it.
> 
> >  	/* Unbound PCI devices are always set to disabled and suspended.
> >  	 * During probe, the device is set to enabled and active and the
> >  	 * usage count is incremented.  If the driver supports runtime PM,
> > @@ -298,6 +302,8 @@ static long local_pci_probe(void *_ddi)
> >  		pm_runtime_set_suspended(dev);
> >  		pm_runtime_put_noidle(dev);
> >  	}
> > +	if (parent)
> > +		pm_runtime_put(parent);
> 
> You should use pm_runtime_put_sync(), not pm_runtime_put().

Hmm, why exactly?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 27, 2012, 7:39 p.m. UTC | #4
On Fri, 27 Jul 2012, Rafael J. Wysocki wrote:

> On Friday, July 27, 2012, Alan Stern wrote:
> > On Fri, 27 Jul 2012, Huang Ying wrote:
> > 
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -280,8 +280,12 @@ static long local_pci_probe(void *_ddi)
> > >  {
> > >  	struct drv_dev_and_id *ddi = _ddi;
> > >  	struct device *dev = &ddi->dev->dev;
> > > +	struct device *parent = dev->parent;
> > >  	int rc;
> > >  
> > > +	/* The parent bridge must be in active state when probing */
> > > +	if (parent)
> > > +		pm_runtime_get_sync(parent);
> > 
> > Ooh, this is a very good point.  I completely missed it.
> > 
> > >  	/* Unbound PCI devices are always set to disabled and suspended.
> > >  	 * During probe, the device is set to enabled and active and the
> > >  	 * usage count is incremented.  If the driver supports runtime PM,
> > > @@ -298,6 +302,8 @@ static long local_pci_probe(void *_ddi)
> > >  		pm_runtime_set_suspended(dev);
> > >  		pm_runtime_put_noidle(dev);
> > >  	}
> > > +	if (parent)
> > > +		pm_runtime_put(parent);
> > 
> > You should use pm_runtime_put_sync(), not pm_runtime_put().
> 
> Hmm, why exactly?

Because it's more efficient to do something directly than to run it in 
a workqueue.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 27, 2012, 7:54 p.m. UTC | #5
On Friday, July 27, 2012, Alan Stern wrote:
> On Fri, 27 Jul 2012, Rafael J. Wysocki wrote:
> 
> > On Friday, July 27, 2012, Alan Stern wrote:
> > > On Fri, 27 Jul 2012, Huang Ying wrote:
> > > 
> > > > --- a/drivers/pci/pci-driver.c
> > > > +++ b/drivers/pci/pci-driver.c
> > > > @@ -280,8 +280,12 @@ static long local_pci_probe(void *_ddi)
> > > >  {
> > > >  	struct drv_dev_and_id *ddi = _ddi;
> > > >  	struct device *dev = &ddi->dev->dev;
> > > > +	struct device *parent = dev->parent;
> > > >  	int rc;
> > > >  
> > > > +	/* The parent bridge must be in active state when probing */
> > > > +	if (parent)
> > > > +		pm_runtime_get_sync(parent);
> > > 
> > > Ooh, this is a very good point.  I completely missed it.
> > > 
> > > >  	/* Unbound PCI devices are always set to disabled and suspended.
> > > >  	 * During probe, the device is set to enabled and active and the
> > > >  	 * usage count is incremented.  If the driver supports runtime PM,
> > > > @@ -298,6 +302,8 @@ static long local_pci_probe(void *_ddi)
> > > >  		pm_runtime_set_suspended(dev);
> > > >  		pm_runtime_put_noidle(dev);
> > > >  	}
> > > > +	if (parent)
> > > > +		pm_runtime_put(parent);
> > > 
> > > You should use pm_runtime_put_sync(), not pm_runtime_put().
> > 
> > Hmm, why exactly?
> 
> Because it's more efficient to do something directly than to run it in 
> a workqueue.

Well, depends.  If that results in a power off (the parent goes into
D3 for example), we may wait as long as 10 ms for that to complete.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 28, 2012, 4:12 p.m. UTC | #6
On Fri, 27 Jul 2012, Rafael J. Wysocki wrote:

> > > > > +	if (parent)
> > > > > +		pm_runtime_put(parent);
> > > > 
> > > > You should use pm_runtime_put_sync(), not pm_runtime_put().
> > > 
> > > Hmm, why exactly?
> > 
> > Because it's more efficient to do something directly than to run it in 
> > a workqueue.
> 
> Well, depends.  If that results in a power off (the parent goes into
> D3 for example), we may wait as long as 10 ms for that to complete.

True, but so what?  You'd also have to wait 10 ms for the workqueue
item to complete if pm_runtime_put() was used, plus the additional
overhead of scheduling the workqueue thread.

Now, if the length of time required for the probe to run were an issue 
then yes, the synchronous routine could make things take longer.  But 
probing is generally done asynchronously anyway, right?  And usually 
what matters is the time before the device becomes available, not the 
time for the probe to run.

Alan Stern


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 28, 2012, 8:26 p.m. UTC | #7
On Saturday, July 28, 2012, Alan Stern wrote:
> On Fri, 27 Jul 2012, Rafael J. Wysocki wrote:
> 
> > > > > > +	if (parent)
> > > > > > +		pm_runtime_put(parent);
> > > > > 
> > > > > You should use pm_runtime_put_sync(), not pm_runtime_put().
> > > > 
> > > > Hmm, why exactly?
> > > 
> > > Because it's more efficient to do something directly than to run it in 
> > > a workqueue.
> > 
> > Well, depends.  If that results in a power off (the parent goes into
> > D3 for example), we may wait as long as 10 ms for that to complete.
> 
> True, but so what?  You'd also have to wait 10 ms for the workqueue
> item to complete if pm_runtime_put() was used,

Are you sure?  pm_runtime_put() leads to rpm_idle() with the RPM_ASYNC
flag set, which causes it to queue up the work item and return immediately
without waiting.  Why exactly do you think I'd need to wait, then?

> plus the additional overhead of scheduling the workqueue thread.
> 
> Now, if the length of time required for the probe to run were an issue 
> then yes, the synchronous routine could make things take longer.  But 
> probing is generally done asynchronously anyway, right?  And usually 
> what matters is the time before the device becomes available, not the 
> time for the probe to run.

Still, I see no reason to wait synchronously for the _parent_ to suspend.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 28, 2012, 9:12 p.m. UTC | #8
On Sat, 28 Jul 2012, Rafael J. Wysocki wrote:

> On Saturday, July 28, 2012, Alan Stern wrote:
> > On Fri, 27 Jul 2012, Rafael J. Wysocki wrote:
> > 
> > > > > > > +	if (parent)
> > > > > > > +		pm_runtime_put(parent);
> > > > > > 
> > > > > > You should use pm_runtime_put_sync(), not pm_runtime_put().
> > > > > 
> > > > > Hmm, why exactly?
> > > > 
> > > > Because it's more efficient to do something directly than to run it in 
> > > > a workqueue.
> > > 
> > > Well, depends.  If that results in a power off (the parent goes into
> > > D3 for example), we may wait as long as 10 ms for that to complete.
> > 
> > True, but so what?  You'd also have to wait 10 ms for the workqueue
> > item to complete if pm_runtime_put() was used,
> 
> Are you sure?  pm_runtime_put() leads to rpm_idle() with the RPM_ASYNC
> flag set, which causes it to queue up the work item and return immediately
> without waiting.  Why exactly do you think I'd need to wait, then?

What I mean is, it takes 10 ms for the parent to suspend regardless of
whether or not you use the _sync form of the call.  If you're waiting
for the parent to suspend, you would therefore have to wait 10 ms in
either case.

Likewise, if you're waiting for the PCI device to become usable then
you don't have to wait for 10 ms, because you can use it as soon as the
probe routine returns.

> > plus the additional overhead of scheduling the workqueue thread.
> > 
> > Now, if the length of time required for the probe to run were an issue 
> > then yes, the synchronous routine could make things take longer.  But 
> > probing is generally done asynchronously anyway, right?  And usually 
> > what matters is the time before the device becomes available, not the 
> > time for the probe to run.
> 
> Still, I see no reason to wait synchronously for the _parent_ to suspend.

Why not?  There's no penalty, particularly since most device probing 
happens asynchronously these days anyway.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 29, 2012, 1:55 p.m. UTC | #9
On Saturday, July 28, 2012, Alan Stern wrote:
> On Sat, 28 Jul 2012, Rafael J. Wysocki wrote:
> 
> > On Saturday, July 28, 2012, Alan Stern wrote:
> > > On Fri, 27 Jul 2012, Rafael J. Wysocki wrote:
> > > 
> > > > > > > > +	if (parent)
> > > > > > > > +		pm_runtime_put(parent);
> > > > > > > 
> > > > > > > You should use pm_runtime_put_sync(), not pm_runtime_put().
> > > > > > 
> > > > > > Hmm, why exactly?
> > > > > 
> > > > > Because it's more efficient to do something directly than to run it in 
> > > > > a workqueue.
> > > > 
> > > > Well, depends.  If that results in a power off (the parent goes into
> > > > D3 for example), we may wait as long as 10 ms for that to complete.
> > > 
> > > True, but so what?  You'd also have to wait 10 ms for the workqueue
> > > item to complete if pm_runtime_put() was used,
> > 
> > Are you sure?  pm_runtime_put() leads to rpm_idle() with the RPM_ASYNC
> > flag set, which causes it to queue up the work item and return immediately
> > without waiting.  Why exactly do you think I'd need to wait, then?
> 
> What I mean is, it takes 10 ms for the parent to suspend regardless of
> whether or not you use the _sync form of the call.  If you're waiting
> for the parent to suspend, you would therefore have to wait 10 ms in
> either case.
> 
> Likewise, if you're waiting for the PCI device to become usable then
> you don't have to wait for 10 ms, because you can use it as soon as the
> probe routine returns.

The difference is, if you use _put_sync(), you need to wait the extra 10 ms
for local_pci_probe() to return (if the parent is actually suspended),
although you might not need to wait for it if you used _put(), right?

Which, to me, means that using _put_sync() may not be always better.
It probably doesn't matter a lot, but then the workqueue overhead shouldn't
matter a lot either.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 29, 2012, 2:55 p.m. UTC | #10
On Sun, 29 Jul 2012, Rafael J. Wysocki wrote:

> The difference is, if you use _put_sync(), you need to wait the extra 10 ms
> for local_pci_probe() to return (if the parent is actually suspended),
> although you might not need to wait for it if you used _put(), right?

Yes, that's the difference.  But who waits for local_pci_probe() to 
return?  :-)

> Which, to me, means that using _put_sync() may not be always better.
> It probably doesn't matter a lot, but then the workqueue overhead shouldn't
> matter a lot either.

It's that in the end, the extra overhead is pretty small.  For me
there's also an issue of style: If you do a synchronous get then it
looks odd not to do a synchronous put.  My feeling has always been that
the async routines are for use in non-process contexts, where the sync
routines can't be used.  Using them just to return a little more
quickly is a foreign idea.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 29, 2012, 7:18 p.m. UTC | #11
On Sunday, July 29, 2012, Alan Stern wrote:
> On Sun, 29 Jul 2012, Rafael J. Wysocki wrote:
> 
> > The difference is, if you use _put_sync(), you need to wait the extra 10 ms
> > for local_pci_probe() to return (if the parent is actually suspended),
> > although you might not need to wait for it if you used _put(), right?
> 
> Yes, that's the difference.  But who waits for local_pci_probe() to 
> return?  :-)

pci_register_driver() might, but that's not a big deal.  Hot-plug might
as well, though.

> > Which, to me, means that using _put_sync() may not be always better.
> > It probably doesn't matter a lot, but then the workqueue overhead shouldn't
> > matter a lot either.
> 
> It's that in the end, the extra overhead is pretty small.  For me
> there's also an issue of style: If you do a synchronous get then it
> looks odd not to do a synchronous put.  My feeling has always been that
> the async routines are for use in non-process contexts, where the sync
> routines can't be used.  Using them just to return a little more
> quickly is a foreign idea.

I see. :-)

The reason for using sync get is quite obvious: we want to be able to access
the device later on.  For sync put that's not so clear, though. There are cases
when we definitely want to do it, like the failing .probe() in which we want to
disable runtime PM next to the put, but usually it doesn't hurt (too much) to
defer it IMO.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jassi Brar July 29, 2012, 8:12 p.m. UTC | #12
On Sun, Jul 29, 2012 at 8:25 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> For me there's also an issue of style: If you do a synchronous get then it
> looks odd not to do a synchronous put.  My feeling has always been that
> the async routines are for use in non-process contexts, where the sync
> routines can't be used.  Using them just to return a little more
> quickly is a foreign idea.
>
Another way of looking at it is - I need h/w to be active in order to
proceed so I call get_sync but I don't care if the h/w is not put down
immediately after I am done using it. So the put could be
relaxed/async - At best, we could avoid an unnecessary suspend-resume
cycle which could be expensive power and time wise. At worst, we
return a bit quicker. Or so do I think.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 29, 2012, 9:44 p.m. UTC | #13
On Mon, 30 Jul 2012, Jassi Brar wrote:

> On Sun, Jul 29, 2012 at 8:25 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > For me there's also an issue of style: If you do a synchronous get then it
> > looks odd not to do a synchronous put.  My feeling has always been that
> > the async routines are for use in non-process contexts, where the sync
> > routines can't be used.  Using them just to return a little more
> > quickly is a foreign idea.
> >
> Another way of looking at it is - I need h/w to be active in order to
> proceed so I call get_sync but I don't care if the h/w is not put down
> immediately after I am done using it. So the put could be
> relaxed/async - At best, we could avoid an unnecessary suspend-resume
> cycle which could be expensive power and time wise. At worst, we
> return a bit quicker. Or so do I think.

If the reason for the choice is to opportunistically delay suspending,
there are better ways of doing it: pm_schedule_suspend,
pm_runtime_put_autosuspend.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 31, 2012, 8:31 p.m. UTC | #14
On Sunday, July 29, 2012, Rafael J. Wysocki wrote:
> On Sunday, July 29, 2012, Alan Stern wrote:
> > On Sun, 29 Jul 2012, Rafael J. Wysocki wrote:
> > 
> > > The difference is, if you use _put_sync(), you need to wait the extra 10 ms
> > > for local_pci_probe() to return (if the parent is actually suspended),
> > > although you might not need to wait for it if you used _put(), right?
> > 
> > Yes, that's the difference.  But who waits for local_pci_probe() to 
> > return?  :-)
> 
> pci_register_driver() might, but that's not a big deal.  Hot-plug might
> as well, though.
> 
> > > Which, to me, means that using _put_sync() may not be always better.
> > > It probably doesn't matter a lot, but then the workqueue overhead shouldn't
> > > matter a lot either.
> > 
> > It's that in the end, the extra overhead is pretty small.  For me
> > there's also an issue of style: If you do a synchronous get then it
> > looks odd not to do a synchronous put.  My feeling has always been that
> > the async routines are for use in non-process contexts, where the sync
> > routines can't be used.  Using them just to return a little more
> > quickly is a foreign idea.
> 
> I see. :-)
> 
> The reason for using sync get is quite obvious: we want to be able to access
> the device later on.  For sync put that's not so clear, though. There are cases
> when we definitely want to do it, like the failing .probe() in which we want to
> disable runtime PM next to the put, but usually it doesn't hurt (too much) to
> defer it IMO.

Now it occured to me that perhaps we don't need the current asynchronous
pm_runtime_get() at all.  The usefulness of it is quite questionable, because
either we want to resume the device immediately, for which pm_runtime_get_sync()
should be used, or we just want to bump up the usage counter, in which cases
pm_runtime_get_noresume() should always be sufficient.  I fail to see any
particularly compelling use case for pm_runtime_get() doing an asynchronous
resume at the moment, but perhaps that's just me.

However, I receive reports of people using pm_runtime_get() where they really
should use pm_runtime_get_sync(), so I wonder if we can simply rename
pm_runtime_get_sync() as pm_runtime_get() and drop the asynchronous version
altogether?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 31, 2012, 9:05 p.m. UTC | #15
[CC: list trimmed]

On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:

> Now it occured to me that perhaps we don't need the current asynchronous
> pm_runtime_get() at all.  The usefulness of it is quite questionable, because
> either we want to resume the device immediately, for which pm_runtime_get_sync()
> should be used, or we just want to bump up the usage counter, in which cases
> pm_runtime_get_noresume() should always be sufficient.  I fail to see any
> particularly compelling use case for pm_runtime_get() doing an asynchronous
> resume at the moment, but perhaps that's just me.

There are indeed valid uses for pm_runtime_get().  We are forced to use
it in non-sleepable contexts when we want to resume the device as
quickly as possible.  Example: a driver receives an I/O request from an
interrupt handler.

> However, I receive reports of people using pm_runtime_get() where they really
> should use pm_runtime_get_sync(), so I wonder if we can simply rename
> pm_runtime_get_sync() as pm_runtime_get() and drop the asynchronous version
> altogether?

Well, IMO the naming should have been the other way around from the
start.  That is, we should have made pm_runtime_get be the synchronous
routine and pm_runtime_get_async be the asynchronous one.  But it's too
late to change now.

And no, we can't get rid of the async version.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 31, 2012, 9:34 p.m. UTC | #16
On Tuesday, July 31, 2012, Alan Stern wrote:
> [CC: list trimmed]
> 
> On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:
> 
> > Now it occured to me that perhaps we don't need the current asynchronous
> > pm_runtime_get() at all.  The usefulness of it is quite questionable, because
> > either we want to resume the device immediately, for which pm_runtime_get_sync()
> > should be used, or we just want to bump up the usage counter, in which cases
> > pm_runtime_get_noresume() should always be sufficient.  I fail to see any
> > particularly compelling use case for pm_runtime_get() doing an asynchronous
> > resume at the moment, but perhaps that's just me.
> 
> There are indeed valid uses for pm_runtime_get().  We are forced to use
> it in non-sleepable contexts when we want to resume the device as
> quickly as possible.  Example: a driver receives an I/O request from an
> interrupt handler.

Is it actually suitable to be used in such contexts?  It doesn't give any
guarantee that the device will be active when it returns, so the caller can't
really rely on it.  The caller always has to check if the device has been
resumed before accessing it anyway, so it may as well do a _get_noresume(),
do the check and do pm_request_resume() if needed directly.

Now, as far as interrupt handlers are concerned, there are two cases: either
the device has to be active to generate an interrupt (like PCI), or the
interrupt is generated by something else on behalf of it.  We don't seem to
handle the first case very well right now, because in that case the interrupt
handler will always know that the device is active, so it should do a
_get_noresume() and then change the device's status to "active" without
calling any kind of "resume", but we don't provide a helper for that.  In the
second case calling pm_runtime_get() doesn't really help either.  I think it's
better to do _get_noresume(), check the status and if "suspended", set a flag
for the resume callback to do something specific before returning successfully,
then call pm_request_resume() and return.

> > However, I receive reports of people using pm_runtime_get() where they really
> > should use pm_runtime_get_sync(), so I wonder if we can simply rename
> > pm_runtime_get_sync() as pm_runtime_get() and drop the asynchronous version
> > altogether?
> 
> Well, IMO the naming should have been the other way around from the
> start.  That is, we should have made pm_runtime_get be the synchronous
> routine and pm_runtime_get_async be the asynchronous one.  But it's too
> late to change now.

I'm not sure it is too late.  If we first change all the instances of
pm_runtime_get() to pm_runtime_get_async() and then all of the instances of
pm_runtime_get_sync() to pm_runtime_get(), it should be technically possible.
Of course, it would be confusing, but that's a different matter. :-)

> And no, we can't get rid of the async version.

I'm still not sure of that.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki July 31, 2012, 9:49 p.m. UTC | #17
On Tuesday, July 31, 2012, Rafael J. Wysocki wrote:
> On Tuesday, July 31, 2012, Alan Stern wrote:
> > [CC: list trimmed]
> > 
> > On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:
> > 
> > > Now it occured to me that perhaps we don't need the current asynchronous
> > > pm_runtime_get() at all.  The usefulness of it is quite questionable, because
> > > either we want to resume the device immediately, for which pm_runtime_get_sync()
> > > should be used, or we just want to bump up the usage counter, in which cases
> > > pm_runtime_get_noresume() should always be sufficient.  I fail to see any
> > > particularly compelling use case for pm_runtime_get() doing an asynchronous
> > > resume at the moment, but perhaps that's just me.
> > 
> > There are indeed valid uses for pm_runtime_get().  We are forced to use
> > it in non-sleepable contexts when we want to resume the device as
> > quickly as possible.  Example: a driver receives an I/O request from an
> > interrupt handler.
> 
> Is it actually suitable to be used in such contexts?  It doesn't give any
> guarantee that the device will be active when it returns, so the caller can't
> really rely on it.  The caller always has to check if the device has been
> resumed before accessing it anyway, so it may as well do a _get_noresume(),
> do the check and do pm_request_resume() if needed directly.
> 
> Now, as far as interrupt handlers are concerned, there are two cases: either
> the device has to be active to generate an interrupt (like PCI), or the
> interrupt is generated by something else on behalf of it.  We don't seem to
> handle the first case very well right now, because in that case the interrupt
> handler will always know that the device is active, so it should do a
> _get_noresume() and then change the device's status to "active" without
> calling any kind of "resume", but we don't provide a helper for that.

Unless, of course, this is a shared interrupt, in which case the handler may
be invoked, because _another_ device has generated an interrupt, so the
"active" check will have to be done anyway.

> In the second case calling pm_runtime_get() doesn't really help either.
> I think it's better to do _get_noresume(), check the status and if
> "suspended", set a flag for the resume callback to do something specific
> before returning successfully, then call pm_request_resume() and return.

I realize that this may be somewhat racy, so perhaps we need something like
pm_request_resume_and_call(dev, func) that will execute the given function once
the device has been resumed (or just happens to be "active" when the resume
work item is run for it).

> > > However, I receive reports of people using pm_runtime_get() where they really
> > > should use pm_runtime_get_sync(), so I wonder if we can simply rename
> > > pm_runtime_get_sync() as pm_runtime_get() and drop the asynchronous version
> > > altogether?
> > 
> > Well, IMO the naming should have been the other way around from the
> > start.  That is, we should have made pm_runtime_get be the synchronous
> > routine and pm_runtime_get_async be the asynchronous one.  But it's too
> > late to change now.
> 
> I'm not sure it is too late.  If we first change all the instances of
> pm_runtime_get() to pm_runtime_get_async() and then all of the instances of
> pm_runtime_get_sync() to pm_runtime_get(), it should be technically possible.
> Of course, it would be confusing, but that's a different matter. :-)
> 
> > And no, we can't get rid of the async version.
> 
> I'm still not sure of that.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 1, 2012, 2:36 p.m. UTC | #18
On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:

> On Tuesday, July 31, 2012, Alan Stern wrote:
> > [CC: list trimmed]
> > 
> > On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:
> > 
> > > Now it occured to me that perhaps we don't need the current asynchronous
> > > pm_runtime_get() at all.  The usefulness of it is quite questionable, because
> > > either we want to resume the device immediately, for which pm_runtime_get_sync()
> > > should be used, or we just want to bump up the usage counter, in which cases
> > > pm_runtime_get_noresume() should always be sufficient.  I fail to see any
> > > particularly compelling use case for pm_runtime_get() doing an asynchronous
> > > resume at the moment, but perhaps that's just me.
> > 
> > There are indeed valid uses for pm_runtime_get().  We are forced to use
> > it in non-sleepable contexts when we want to resume the device as
> > quickly as possible.  Example: a driver receives an I/O request from an
> > interrupt handler.
> 
> Is it actually suitable to be used in such contexts?  It doesn't give any
> guarantee that the device will be active when it returns, so the caller can't
> really rely on it.

Of course not.  When you're in interrupt context, you can't wait around
to see if the device will actually resume.  (Unless you're using 
irq-safe runtime PM, but we can ignore that case.)

>  The caller always has to check if the device has been
> resumed before accessing it anyway, so it may as well do a _get_noresume(),
> do the check and do pm_request_resume() if needed directly.

This is exactly equivalent to doing pm_runtime_get() followed by a 
check, except that it's longer.

Yes, I agree that pm_runtime_get(dev) is nothing more than a shorthand
way of doing pm_runtime_get_noresume(dev) followed by 
pm_request_resume(dev).  That doesn't mean it should be eliminated.  
Shorthands are convenient.

As another example, pm_runtime_get_sync(dev) is nothing more than a 
shorthand for pm_runtime_get_noresume(dev) followed by 
pm_runtime_resume(dev).  IMO, having these joint functions that do a 
get/put combined with a suspend/resume/idle is very useful.

> Now, as far as interrupt handlers are concerned, there are two cases: either
> the device has to be active to generate an interrupt (like PCI), or the
> interrupt is generated by something else on behalf of it.  We don't seem to
> handle the first case very well right now, because in that case the interrupt
> handler will always know that the device is active, so it should do a
> _get_noresume() and then change the device's status to "active" without
> calling any kind of "resume", but we don't provide a helper for that.

I disagree.  Even if the device has to be active to generate an IRQ,
it's possible for the interrupt handler to be delayed until after the
device is suspended (unless the suspend routine explicitly calls
synchronize_irq(), which would be pretty foolish).  Hence the handler 
can't make any assumptions about the device's state.

>  In the
> second case calling pm_runtime_get() doesn't really help either.  I think it's
> better to do _get_noresume(), check the status and if "suspended", set a flag
> for the resume callback to do something specific before returning successfully,
> then call pm_request_resume() and return.

This is exactly equivalent to: pm_runtime_get(), check the status, if 
"suspended" then set a flag, otherwise do the work.  Except that again, 
it's longer.

Check out the sample driver code in section 9 of 
Documentation/power/runtime_pm.txt, especially the foo_read_or_write() 
routine.

> > Well, IMO the naming should have been the other way around from the
> > start.  That is, we should have made pm_runtime_get be the synchronous
> > routine and pm_runtime_get_async be the asynchronous one.  But it's too
> > late to change now.
> 
> I'm not sure it is too late.  If we first change all the instances of
> pm_runtime_get() to pm_runtime_get_async() and then all of the instances of
> pm_runtime_get_sync() to pm_runtime_get(), it should be technically possible.
> Of course, it would be confusing, but that's a different matter. :-)

If you're willing to risk a certain amount of confusion, count me in.  :-)

While you're changing names around, consider also adding a "_runtime"  
somewhere to:

	pm_children_suspended,
	pm_schedule_suspend,
	pm_request_idle,
	pm_request_resume,
	pm_request_autosuspend.

For example, we could have pm_runtime_idle_async instead of 
pm_request_idle.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 1, 2012, 9:24 p.m. UTC | #19
On Wednesday, August 01, 2012, Alan Stern wrote:
> On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:
> 
> > On Tuesday, July 31, 2012, Alan Stern wrote:
> > > [CC: list trimmed]
> > > 
> > > On Tue, 31 Jul 2012, Rafael J. Wysocki wrote:
> > > 
> > > > Now it occured to me that perhaps we don't need the current asynchronous
> > > > pm_runtime_get() at all.  The usefulness of it is quite questionable, because
> > > > either we want to resume the device immediately, for which pm_runtime_get_sync()
> > > > should be used, or we just want to bump up the usage counter, in which cases
> > > > pm_runtime_get_noresume() should always be sufficient.  I fail to see any
> > > > particularly compelling use case for pm_runtime_get() doing an asynchronous
> > > > resume at the moment, but perhaps that's just me.
> > > 
> > > There are indeed valid uses for pm_runtime_get().  We are forced to use
> > > it in non-sleepable contexts when we want to resume the device as
> > > quickly as possible.  Example: a driver receives an I/O request from an
> > > interrupt handler.
> > 
> > Is it actually suitable to be used in such contexts?  It doesn't give any
> > guarantee that the device will be active when it returns, so the caller can't
> > really rely on it.
> 
> Of course not.  When you're in interrupt context, you can't wait around
> to see if the device will actually resume.  (Unless you're using 
> irq-safe runtime PM, but we can ignore that case.)
> 
> >  The caller always has to check if the device has been
> > resumed before accessing it anyway, so it may as well do a _get_noresume(),
> > do the check and do pm_request_resume() if needed directly.
> 
> This is exactly equivalent to doing pm_runtime_get() followed by a 
> check, except that it's longer.

Except that I meant something different from what I wrote, sorry about that
(must be too much heat).

What I really thought about was to do _get_noresume(), then check if the
device is active and if not, queue up a work item (or another delayed
execution in process context) that will do pm_runtime_resume() and
then access the hardware.

Why do I think it's better than plain pm_runtime_get()?  Because the code
path calling pm_runtime_resume() will know that it is safe to access the
hardware after it has returned (unless it returns an error code, but
that's exceptional).

In contrast, with pm_runtime_get() there is no way to predict when the
device is going to be resumed, so if the device happens to be suspended
when pm_runtime_get() returns, the driver kind of has to poll it until
it becomes active, or use a wait queue woken up from the resume callback,
or do all of the processing in the resume callback itself (like in the
example you mentioned).  I'm not sure if the expectation that all driver
writers will be able to implement any of these options correctly is a realistic
one.

> Yes, I agree that pm_runtime_get(dev) is nothing more than a shorthand
> way of doing pm_runtime_get_noresume(dev) followed by 
> pm_request_resume(dev).  That doesn't mean it should be eliminated.  
> Shorthands are convenient.
> 
> As another example, pm_runtime_get_sync(dev) is nothing more than a 
> shorthand for pm_runtime_get_noresume(dev) followed by 
> pm_runtime_resume(dev).  IMO, having these joint functions that do a 
> get/put combined with a suspend/resume/idle is very useful.
> 
> > Now, as far as interrupt handlers are concerned, there are two cases: either
> > the device has to be active to generate an interrupt (like PCI), or the
> > interrupt is generated by something else on behalf of it.  We don't seem to
> > handle the first case very well right now, because in that case the interrupt
> > handler will always know that the device is active, so it should do a
> > _get_noresume() and then change the device's status to "active" without
> > calling any kind of "resume", but we don't provide a helper for that.
> 
> I disagree.  Even if the device has to be active to generate an IRQ,
> it's possible for the interrupt handler to be delayed until after the
> device is suspended (unless the suspend routine explicitly calls
> synchronize_irq(), which would be pretty foolish).  Hence the handler 
> can't make any assumptions about the device's state.
> 
> >  In the
> > second case calling pm_runtime_get() doesn't really help either.  I think it's
> > better to do _get_noresume(), check the status and if "suspended", set a flag
> > for the resume callback to do something specific before returning successfully,
> > then call pm_request_resume() and return.
> 
> This is exactly equivalent to: pm_runtime_get(), check the status, if 
> "suspended" then set a flag, otherwise do the work.  Except that again, 
> it's longer.
> 
> Check out the sample driver code in section 9 of 
> Documentation/power/runtime_pm.txt, especially the foo_read_or_write() 
> routine.

Well, that shouldn't need the is_suspended flag at all, methinks, and the
reason it does need it is because it uses pm_runtime_get().  Moreover,
processing requests in the resume callback is not something I'd recommend
to anyone, because that's going to happen when the status of the device
is RPM_RESUMING, we're holding a reference on the parent (if there's one) etc.

So, it looks like I don't really agree with the example. :-)

> > > Well, IMO the naming should have been the other way around from the
> > > start.  That is, we should have made pm_runtime_get be the synchronous
> > > routine and pm_runtime_get_async be the asynchronous one.  But it's too
> > > late to change now.
> > 
> > I'm not sure it is too late.  If we first change all the instances of
> > pm_runtime_get() to pm_runtime_get_async() and then all of the instances of
> > pm_runtime_get_sync() to pm_runtime_get(), it should be technically possible.
> > Of course, it would be confusing, but that's a different matter. :-)
> 
> If you're willing to risk a certain amount of confusion, count me in.  :-)
> 
> While you're changing names around, consider also adding a "_runtime"  
> somewhere to:
> 
> 	pm_children_suspended,
> 	pm_schedule_suspend,
> 	pm_request_idle,
> 	pm_request_resume,
> 	pm_request_autosuspend.
> 
> For example, we could have pm_runtime_idle_async instead of 
> pm_request_idle.

Well, these are not as misleading as pm_runtime_get(), at least in principle.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 2, 2012, 8:16 p.m. UTC | #20
On Wed, 1 Aug 2012, Rafael J. Wysocki wrote:

> What I really thought about was to do _get_noresume(), then check if the
> device is active and if not, queue up a work item (or another delayed
> execution in process context) that will do pm_runtime_resume() and
> then access the hardware.
> 
> Why do I think it's better than plain pm_runtime_get()?  Because the code
> path calling pm_runtime_resume() will know that it is safe to access the
> hardware after it has returned (unless it returns an error code, but
> that's exceptional).
> 
> In contrast, with pm_runtime_get() there is no way to predict when the
> device is going to be resumed, so if the device happens to be suspended
> when pm_runtime_get() returns, the driver kind of has to poll it until
> it becomes active, or use a wait queue woken up from the resume callback,
> or do all of the processing in the resume callback itself (like in the
> example you mentioned).  I'm not sure if the expectation that all driver
> writers will be able to implement any of these options correctly is a realistic
> one.

I don't know about that -- the logic involved in doing the processing 
within the resume callback isn't terribly complicated.  At least, not 
much more complicated than the logic involved in setting up a custom 
work routine as you suggest.

Anyway with the existing code, driver writers are free to choose 
whichever approach they prefer.

> > Check out the sample driver code in section 9 of 
> > Documentation/power/runtime_pm.txt, especially the foo_read_or_write() 
> > routine.
> 
> Well, that shouldn't need the is_suspended flag at all, methinks, and the
> reason it does need it is because it uses pm_runtime_get().

Not so.  Consider your scheme.  When starting an I/O request, you call
pm_runtime_get_noresume() and then check to see if the device is
active, say by calling pm_runtime_suspended().  Suppose at that moment
the suspend callback has just finished and has released the private
spinlock.  The device's status is still RPM_SUSPENDING, so
pm_runtime_suspended() returns 0 and you try to carry out the I/O.

To fix this problem you have to synchronize the status checking with
the suspend/resume operations.  This means the status changes have to
occur under the protection of the private lock, which means a private
flag is needed.

>  Moreover,
> processing requests in the resume callback is not something I'd recommend
> to anyone, because that's going to happen when the status of the device
> is RPM_RESUMING, we're holding a reference on the parent (if there's one) etc.

I don't see any problem with that.  The parent's child_count will be 
incremented while the requests are being processed regardless.  And if 
you've been waiting for the device to resume in order to carry out some 
processing, within the resume callback is the logical place to do the 
work.  It avoids the overhead of a second context switch.

> So, it looks like I don't really agree with the example. :-)

Feel free to add your scheme as a second example in the document.  :-)
But please don't remove the first example, unless you can find 
something actually wrong with it.

> > While you're changing names around, consider also adding a "_runtime"  
> > somewhere to:
> > 
> > 	pm_children_suspended,
> > 	pm_schedule_suspend,
> > 	pm_request_idle,
> > 	pm_request_resume,
> > 	pm_request_autosuspend.
> > 
> > For example, we could have pm_runtime_idle_async instead of 
> > pm_request_idle.
> 
> Well, these are not as misleading as pm_runtime_get(), at least in principle.

No, but it would be good to be more consistent about our naming.  
Making sure all the function names contain "_runtime" would help.

I suppose we could keep pm_runtime_get_sync as is, and just change
pm_runtime_get to pm_runtime_get_async (and likewise for _put).  That
could reduce the confusion during the changeover.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 2, 2012, 9:26 p.m. UTC | #21
On Thursday, August 02, 2012, Alan Stern wrote:
> On Wed, 1 Aug 2012, Rafael J. Wysocki wrote:
> 
> > What I really thought about was to do _get_noresume(), then check if the
> > device is active and if not, queue up a work item (or another delayed
> > execution in process context) that will do pm_runtime_resume() and
> > then access the hardware.
> > 
> > Why do I think it's better than plain pm_runtime_get()?  Because the code
> > path calling pm_runtime_resume() will know that it is safe to access the
> > hardware after it has returned (unless it returns an error code, but
> > that's exceptional).
> > 
> > In contrast, with pm_runtime_get() there is no way to predict when the
> > device is going to be resumed, so if the device happens to be suspended
> > when pm_runtime_get() returns, the driver kind of has to poll it until
> > it becomes active, or use a wait queue woken up from the resume callback,
> > or do all of the processing in the resume callback itself (like in the
> > example you mentioned).  I'm not sure if the expectation that all driver
> > writers will be able to implement any of these options correctly is a realistic
> > one.
> 
> I don't know about that -- the logic involved in doing the processing 
> within the resume callback isn't terribly complicated.  At least, not 
> much more complicated than the logic involved in setting up a custom 
> work routine as you suggest.

That's why I had the idea of pm_request_resume_and_call(dev, func)
described in this message:

http://marc.info/?l=linux-usb&m=134377126804170&w=4

although perheps it would be better to call it something like
pm_runtime_async_resume_and_call(dev, func).

> Anyway with the existing code, driver writers are free to choose 
> whichever approach they prefer.

I wonder how many instances of pm_runtime_get() used in a wrong way there
are in the kernel right now.  I guess I'll sacrifice some time to check
that.

> > > Check out the sample driver code in section 9 of 
> > > Documentation/power/runtime_pm.txt, especially the foo_read_or_write() 
> > > routine.
> > 
> > Well, that shouldn't need the is_suspended flag at all, methinks, and the
> > reason it does need it is because it uses pm_runtime_get().
> 
> Not so.  Consider your scheme.  When starting an I/O request, you call
> pm_runtime_get_noresume() and then check to see if the device is
> active, say by calling pm_runtime_suspended().  Suppose at that moment
> the suspend callback has just finished and has released the private
> spinlock.  The device's status is still RPM_SUSPENDING, so
> pm_runtime_suspended() returns 0 and you try to carry out the I/O.
> 
> To fix this problem you have to synchronize the status checking with
> the suspend/resume operations.  This means the status changes have to
> occur under the protection of the private lock, which means a private
> flag is needed.

What about checking if the status is RPM_ACTIVE under dev->power.lock?
That's what rpm_resume() does, more or less.

> >  Moreover,
> > processing requests in the resume callback is not something I'd recommend
> > to anyone, because that's going to happen when the status of the device
> > is RPM_RESUMING, we're holding a reference on the parent (if there's one) etc.
> 
> I don't see any problem with that.  The parent's child_count will be 
> incremented while the requests are being processed regardless.  And if 
> you've been waiting for the device to resume in order to carry out some 
> processing, within the resume callback is the logical place to do the 
> work.

Unless your _driver_ callback is actually executed from within a PM domain
callback, for example, and something else may be waiting for it to complete,
so your data processing is adding latencies to some other threads.  I'm not
making that up, by the way, that really can happen.

And what if you are a parent waited for by a child to resume so that _it_
can process its data?  Would you still want to process your data in the
resume callback in that case?

> It avoids the overhead of a second context switch.

That may be avoided without processing data from within the resume callback,
although not with the current code.

> > So, it looks like I don't really agree with the example. :-)
> 
> Feel free to add your scheme as a second example in the document.  :-)
> But please don't remove the first example, unless you can find 
> something actually wrong with it.

Well, it should work in general, so it is not functionally wrong.  However,
I wouldn't like to regard it as the best thing we can offer. :-)

> > > While you're changing names around, consider also adding a "_runtime"  
> > > somewhere to:
> > > 
> > > 	pm_children_suspended,
> > > 	pm_schedule_suspend,
> > > 	pm_request_idle,
> > > 	pm_request_resume,
> > > 	pm_request_autosuspend.
> > > 
> > > For example, we could have pm_runtime_idle_async instead of 
> > > pm_request_idle.
> > 
> > Well, these are not as misleading as pm_runtime_get(), at least in principle.
> 
> No, but it would be good to be more consistent about our naming.  
> Making sure all the function names contain "_runtime" would help.
> 
> I suppose we could keep pm_runtime_get_sync as is, and just change
> pm_runtime_get to pm_runtime_get_async (and likewise for _put).  That
> could reduce the confusion during the changeover.

Changing pm_runtime_get() to pm_runtime_get_async() would be an improvement,
although pm_runtime_get_but_do_not_resume_immediately() might even be better.
Or even pm_runtime_get_but_do_not_access_hardware_when_it_returns(). ;-)

I see no reason to have any of those things, though.  Yes, they _may_ be
useful to someone knowing the runtime PM core internals to save a few lines
of code in some places, but generally they lead people to make serious mistakes
that tend to be difficult to debug.  For this very reason pm_runtime_get() is a
bad interface and I wish we hadn't introduced it at all.  Even if we give it
a more descriptive name, it won't be much better.

And note how that doesn't apply to the pm_runtime_put*() family.  After all,
doing pm_runtime_put() instead of pm_runtime_put_sync() will never lead to
accessing registers of a suspended device.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 3, 2012, 2:20 a.m. UTC | #22
On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:

> > I don't know about that -- the logic involved in doing the processing 
> > within the resume callback isn't terribly complicated.  At least, not 
> > much more complicated than the logic involved in setting up a custom 
> > work routine as you suggest.
> 
> That's why I had the idea of pm_request_resume_and_call(dev, func)
> described in this message:
> 
> http://marc.info/?l=linux-usb&m=134377126804170&w=4
> 
> although perheps it would be better to call it something like
> pm_runtime_async_resume_and_call(dev, func).

Hmmm.  You'd probably want a version that does a "get" at the same 
time.  I suppose you would call func directly if the device was already 
resumed, without going through the workqueue?

Yes, I agree this would be a better interface then pm_runtime_get.

> > > Well, that shouldn't need the is_suspended flag at all, methinks, and the
> > > reason it does need it is because it uses pm_runtime_get().
> > 
> > Not so.  Consider your scheme.  When starting an I/O request, you call
> > pm_runtime_get_noresume() and then check to see if the device is
> > active, say by calling pm_runtime_suspended().  Suppose at that moment
> > the suspend callback has just finished and has released the private
> > spinlock.  The device's status is still RPM_SUSPENDING, so
> > pm_runtime_suspended() returns 0 and you try to carry out the I/O.
> > 
> > To fix this problem you have to synchronize the status checking with
> > the suspend/resume operations.  This means the status changes have to
> > occur under the protection of the private lock, which means a private
> > flag is needed.
> 
> What about checking if the status is RPM_ACTIVE under dev->power.lock?
> That's what rpm_resume() does, more or less.

That wouldn't solve the race described above.

> > >  Moreover,
> > > processing requests in the resume callback is not something I'd recommend
> > > to anyone, because that's going to happen when the status of the device
> > > is RPM_RESUMING, we're holding a reference on the parent (if there's one) etc.
> > 
> > I don't see any problem with that.  The parent's child_count will be 
> > incremented while the requests are being processed regardless.  And if 
> > you've been waiting for the device to resume in order to carry out some 
> > processing, within the resume callback is the logical place to do the 
> > work.
> 
> Unless your _driver_ callback is actually executed from within a PM domain
> callback, for example, and something else may be waiting for it to complete,
> so your data processing is adding latencies to some other threads.  I'm not
> making that up, by the way, that really can happen.
> 
> And what if you are a parent waited for by a child to resume so that _it_
> can process its data?  Would you still want to process your data in the
> resume callback in that case?

Okay, those are valid reasons.  (Although a device handling I/O 
requests isn't likely to have a child with its own independent I/O 
handling.)

> > I suppose we could keep pm_runtime_get_sync as is, and just change
> > pm_runtime_get to pm_runtime_get_async (and likewise for _put).  That
> > could reduce the confusion during the changeover.
> 
> Changing pm_runtime_get() to pm_runtime_get_async() would be an improvement,
> although pm_runtime_get_but_do_not_resume_immediately() might even be better.
> Or even pm_runtime_get_but_do_not_access_hardware_when_it_returns(). ;-)
> 
> I see no reason to have any of those things, though.  Yes, they _may_ be
> useful to someone knowing the runtime PM core internals to save a few lines
> of code in some places, but generally they lead people to make serious mistakes
> that tend to be difficult to debug.  For this very reason pm_runtime_get() is a
> bad interface and I wish we hadn't introduced it at all.  Even if we give it
> a more descriptive name, it won't be much better.
> 
> And note how that doesn't apply to the pm_runtime_put*() family.  After all,
> doing pm_runtime_put() instead of pm_runtime_put_sync() will never lead to
> accessing registers of a suspended device.

All right.  But I still think "pm_runtime_put_async" is better than
"pm_runtime_put".  At least it forces people to think about what
they're doing.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Aug. 3, 2012, 3:37 a.m. UTC | #23
On Fri, Aug 3, 2012 at 10:20 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:

> Hmmm.  You'd probably want a version that does a "get" at the same
> time.  I suppose you would call func directly if the device was already
> resumed, without going through the workqueue?

Maybe it isn't good, the 'func' might not be run in the current context
(irq context or some spinlock is held).

>> And what if you are a parent waited for by a child to resume so that _it_
>> can process its data?  Would you still want to process your data in the
>> resume callback in that case?

Looks the child is always resumed after its parent completes the resuming,
and current pm_runtime_get doesn't delay the resume of the parent, and
just make the .runtime_resume run in another context.

Also are there actual examples about the above situation?

>
> Okay, those are valid reasons.  (Although a device handling I/O
> requests isn't likely to have a child with its own independent I/O
> handling.)

IMO, the .runtime_resume callback can handle the IO things easily
via scheduling worker function if the driver don't want to delay
its children's resume.

Thanks,
Alan Stern Aug. 3, 2012, 2:05 p.m. UTC | #24
On Thu, 2 Aug 2012, Alan Stern wrote:

> On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > I don't know about that -- the logic involved in doing the processing 
> > > within the resume callback isn't terribly complicated.  At least, not 
> > > much more complicated than the logic involved in setting up a custom 
> > > work routine as you suggest.
> > 
> > That's why I had the idea of pm_request_resume_and_call(dev, func)
> > described in this message:
> > 
> > http://marc.info/?l=linux-usb&m=134377126804170&w=4
> > 
> > although perheps it would be better to call it something like
> > pm_runtime_async_resume_and_call(dev, func).
> 
> Hmmm.  You'd probably want a version that does a "get" at the same 
> time.  I suppose you would call func directly if the device was already 
> resumed, without going through the workqueue?
> 
> Yes, I agree this would be a better interface then pm_runtime_get.

A few problems:

What happens if a system sleep begins after 
pm_runtime_async_resume_and_call() but before the runtime resume?  When 
the system wakes up, the device will be back at full power without ever 
executing a runtime PM call.  Then how will func get invoked?

What happens if the driver is unbound after
pm_runtime_async_resume_and_call() but before the runtime resume?  The
When the runtime resume occurs, func will be invoked -- and the driver
might not even be in memory any more.

Alan Stern


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 3, 2012, 2:28 p.m. UTC | #25
On Fri, 3 Aug 2012, Ming Lei wrote:

> On Fri, Aug 3, 2012 at 10:20 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> 
> > Hmmm.  You'd probably want a version that does a "get" at the same
> > time.  I suppose you would call func directly if the device was already
> > resumed, without going through the workqueue?
> 
> Maybe it isn't good, the 'func' might not be run in the current context
> (irq context or some spinlock is held).

True.  But we don't want to wait for the workqueue if the device is 
already at full power.  Suggestions?

> >> And what if you are a parent waited for by a child to resume so that _it_
> >> can process its data?  Would you still want to process your data in the
> >> resume callback in that case?
> 
> Looks the child is always resumed after its parent completes the resuming,
> and current pm_runtime_get doesn't delay the resume of the parent, and
> just make the .runtime_resume run in another context.

I don't understand.  Rafael's point was that if the parent's resume
callback did a bunch of real work, it would delay resuming the child 
because the child can't be resumed until the parent's resume callback 
returns.

> Also are there actual examples about the above situation?

I don't know of any.  I suspect there aren't many examples.  It 
wouldn't make much sense.

> IMO, the .runtime_resume callback can handle the IO things easily
> via scheduling worker function if the driver don't want to delay
> its children's resume.

That was one of Rafael's other suggestions.  But calling 
pm_request_resume_and_call() is easier than scheduling a worker 
function.

Alan Stern


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 4, 2012, 7:35 p.m. UTC | #26
On Friday, August 03, 2012, Alan Stern wrote:
> On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > I don't know about that -- the logic involved in doing the processing 
> > > within the resume callback isn't terribly complicated.  At least, not 
> > > much more complicated than the logic involved in setting up a custom 
> > > work routine as you suggest.
> > 
> > That's why I had the idea of pm_request_resume_and_call(dev, func)
> > described in this message:
> > 
> > http://marc.info/?l=linux-usb&m=134377126804170&w=4
> > 
> > although perheps it would be better to call it something like
> > pm_runtime_async_resume_and_call(dev, func).
> 
> Hmmm.  You'd probably want a version that does a "get" at the same 
> time.  I suppose you would call func directly if the device was already 
> resumed, without going through the workqueue?

Yes.

> Yes, I agree this would be a better interface then pm_runtime_get.

OK

> > > > Well, that shouldn't need the is_suspended flag at all, methinks, and the
> > > > reason it does need it is because it uses pm_runtime_get().
> > > 
> > > Not so.  Consider your scheme.  When starting an I/O request, you call
> > > pm_runtime_get_noresume() and then check to see if the device is
> > > active, say by calling pm_runtime_suspended().  Suppose at that moment
> > > the suspend callback has just finished and has released the private
> > > spinlock.  The device's status is still RPM_SUSPENDING, so
> > > pm_runtime_suspended() returns 0 and you try to carry out the I/O.
> > > 
> > > To fix this problem you have to synchronize the status checking with
> > > the suspend/resume operations.  This means the status changes have to
> > > occur under the protection of the private lock, which means a private
> > > flag is needed.
> > 
> > What about checking if the status is RPM_ACTIVE under dev->power.lock?
> > That's what rpm_resume() does, more or less.
> 
> That wouldn't solve the race described above.
> 
> > > >  Moreover,
> > > > processing requests in the resume callback is not something I'd recommend
> > > > to anyone, because that's going to happen when the status of the device
> > > > is RPM_RESUMING, we're holding a reference on the parent (if there's one) etc.
> > > 
> > > I don't see any problem with that.  The parent's child_count will be 
> > > incremented while the requests are being processed regardless.  And if 
> > > you've been waiting for the device to resume in order to carry out some 
> > > processing, within the resume callback is the logical place to do the 
> > > work.
> > 
> > Unless your _driver_ callback is actually executed from within a PM domain
> > callback, for example, and something else may be waiting for it to complete,
> > so your data processing is adding latencies to some other threads.  I'm not
> > making that up, by the way, that really can happen.
> > 
> > And what if you are a parent waited for by a child to resume so that _it_
> > can process its data?  Would you still want to process your data in the
> > resume callback in that case?
> 
> Okay, those are valid reasons.  (Although a device handling I/O 
> requests isn't likely to have a child with its own independent I/O 
> handling.)

I agree that this isn't very likely in practice.

> > > I suppose we could keep pm_runtime_get_sync as is, and just change
> > > pm_runtime_get to pm_runtime_get_async (and likewise for _put).  That
> > > could reduce the confusion during the changeover.
> > 
> > Changing pm_runtime_get() to pm_runtime_get_async() would be an improvement,
> > although pm_runtime_get_but_do_not_resume_immediately() might even be better.
> > Or even pm_runtime_get_but_do_not_access_hardware_when_it_returns(). ;-)
> > 
> > I see no reason to have any of those things, though.  Yes, they _may_ be
> > useful to someone knowing the runtime PM core internals to save a few lines
> > of code in some places, but generally they lead people to make serious mistakes
> > that tend to be difficult to debug.  For this very reason pm_runtime_get() is a
> > bad interface and I wish we hadn't introduced it at all.  Even if we give it
> > a more descriptive name, it won't be much better.
> > 
> > And note how that doesn't apply to the pm_runtime_put*() family.  After all,
> > doing pm_runtime_put() instead of pm_runtime_put_sync() will never lead to
> > accessing registers of a suspended device.
> 
> All right.  But I still think "pm_runtime_put_async" is better than
> "pm_runtime_put".  At least it forces people to think about what
> they're doing.

I agree.

My current plan is to provide a better alternative interface, then to change
the name of "pm_runtime_put" to "pm_runtime_put_async" and to document that
it's going to be deprecated in future.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 4, 2012, 7:47 p.m. UTC | #27
On Friday, August 03, 2012, Ming Lei wrote:
> On Fri, Aug 3, 2012 at 10:20 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> 
> > Hmmm.  You'd probably want a version that does a "get" at the same
> > time.  I suppose you would call func directly if the device was already
> > resumed, without going through the workqueue?
> 
> Maybe it isn't good, the 'func' might not be run in the current context
> (irq context or some spinlock is held).

Then I'd say don't use this interface.  If you have code that needs to
be run in a different context, then you have to use a work item (or
equivalent) anyway and you can do a synchronous runtime resume from there.

The problem we want to address here is when there's code that should be
run as soon as the device is active, preferably _without_ a context switch.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 4, 2012, 8:08 p.m. UTC | #28
On Friday, August 03, 2012, Alan Stern wrote:
> On Thu, 2 Aug 2012, Alan Stern wrote:
> 
> > On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> > 
> > > > I don't know about that -- the logic involved in doing the processing 
> > > > within the resume callback isn't terribly complicated.  At least, not 
> > > > much more complicated than the logic involved in setting up a custom 
> > > > work routine as you suggest.
> > > 
> > > That's why I had the idea of pm_request_resume_and_call(dev, func)
> > > described in this message:
> > > 
> > > http://marc.info/?l=linux-usb&m=134377126804170&w=4
> > > 
> > > although perheps it would be better to call it something like
> > > pm_runtime_async_resume_and_call(dev, func).
> > 
> > Hmmm.  You'd probably want a version that does a "get" at the same 
> > time.  I suppose you would call func directly if the device was already 
> > resumed, without going through the workqueue?
> > 
> > Yes, I agree this would be a better interface then pm_runtime_get.
> 
> A few problems:

Which are good points. :-)

> What happens if a system sleep begins after 
> pm_runtime_async_resume_and_call() but before the runtime resume?  When 
> the system wakes up, the device will be back at full power without ever 
> executing a runtime PM call.  Then how will func get invoked?

The pm_runtime_barrier() at the beginning of __device_suspend() guarantees
that if there's a resume request pending at this point, the resume will be
carried out.  I think we can modify rpm_resume() to return only after func()
is executed, so there shouldn't be a problem with that particular case.

> What happens if the driver is unbound after
> pm_runtime_async_resume_and_call() but before the runtime resume?  The
> When the runtime resume occurs, func will be invoked -- and the driver
> might not even be in memory any more.

We have no protection against that at the moment, but then there is no
guarantee that the driver's runtime PM callbacks won't be run after
it's been unbound.

Perhaps we should run pm_runtime_barrier() before executing .remove()
in __device_release_driver() (and perhaps run pm_runtime_idle() afterwards
in case the subsystem or PM domain knows how to power down the device even if
there's no driver) and expect .remove() to clean up?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 4, 2012, 8:25 p.m. UTC | #29
On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:

> On Friday, August 03, 2012, Ming Lei wrote:
> > On Fri, Aug 3, 2012 at 10:20 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> > 
> > > Hmmm.  You'd probably want a version that does a "get" at the same
> > > time.  I suppose you would call func directly if the device was already
> > > resumed, without going through the workqueue?
> > 
> > Maybe it isn't good, the 'func' might not be run in the current context
> > (irq context or some spinlock is held).
> 
> Then I'd say don't use this interface.  If you have code that needs to
> be run in a different context, then you have to use a work item (or
> equivalent) anyway and you can do a synchronous runtime resume from there.

That wasn't what he meant.  What if the code needs to run in the _same_
context as the caller?  For example, with a spinlock held.  func would
need to know whether it should acquire the lock, which means it needs
to know whether it was called directly or through the workqueue.

I suppose we could pass it an extra argument with this information...  
but that's a little ugly.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 4, 2012, 8:42 p.m. UTC | #30
On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:

> > What happens if a system sleep begins after 
> > pm_runtime_async_resume_and_call() but before the runtime resume?  When 
> > the system wakes up, the device will be back at full power without ever 
> > executing a runtime PM call.  Then how will func get invoked?
> 
> The pm_runtime_barrier() at the beginning of __device_suspend() guarantees
> that if there's a resume request pending at this point, the resume will be
> carried out.  I think we can modify rpm_resume() to return only after func()
> is executed, so there shouldn't be a problem with that particular case.

A resume request can get added to the workqueue after
__device_suspend() runs, in which case the problem will still exist.

Should pm_runtime_set_active() call func (if such a call is pending)?

> > What happens if the driver is unbound after
> > pm_runtime_async_resume_and_call() but before the runtime resume?  The
> > When the runtime resume occurs, func will be invoked -- and the driver
> > might not even be in memory any more.
> 
> We have no protection against that at the moment, but then there is no
> guarantee that the driver's runtime PM callbacks won't be run after
> it's been unbound.

True.  We have been relying on the subsystem to handle this race, but 
obviously that's no good if the callback is directly to the driver.  I 
don't see any way to fix this in the PM core.

> Perhaps we should run pm_runtime_barrier() before executing .remove()
> in __device_release_driver()

We already do pm_runtime_get_sync() and pm_runtime_put_sync().  I'm not 
sure how necessary those calls really are ... but pm_runtime_barrier() 
would be redundant.

>  (and perhaps run pm_runtime_idle() afterwards
> in case the subsystem or PM domain knows how to power down the device even if
> there's no driver) and expect .remove() to clean up?

I can't think of anything wrong with calling pm_runtime_idle() at the
end of __device_release_driver().

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 4, 2012, 8:48 p.m. UTC | #31
On Saturday, August 04, 2012, Alan Stern wrote:
> On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:
> 
> > On Friday, August 03, 2012, Ming Lei wrote:
> > > On Fri, Aug 3, 2012 at 10:20 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > On Thu, 2 Aug 2012, Rafael J. Wysocki wrote:
> > > 
> > > > Hmmm.  You'd probably want a version that does a "get" at the same
> > > > time.  I suppose you would call func directly if the device was already
> > > > resumed, without going through the workqueue?
> > > 
> > > Maybe it isn't good, the 'func' might not be run in the current context
> > > (irq context or some spinlock is held).
> > 
> > Then I'd say don't use this interface.  If you have code that needs to
> > be run in a different context, then you have to use a work item (or
> > equivalent) anyway and you can do a synchronous runtime resume from there.
> 
> That wasn't what he meant.  What if the code needs to run in the _same_
> context as the caller?  For example, with a spinlock held.

I see.  I think it wouldn't be unreasonable to require that func should take
all of the necessary locks by itself.

However, I believe that if func is going to be called through the workqueue,
the usage counter should be incremented before the preceding resume and
decremented after func returns by the workqueue thread.

> func would need to know whether it should acquire the lock, which means it
> needs to know whether it was called directly or through the workqueue.
> 
> I suppose we could pass it an extra argument with this information...  
> but that's a little ugly.

I agree.  I'd prefer to avoid that, if possible.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 4, 2012, 8:48 p.m. UTC | #32
On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:

> > That wasn't what he meant.  What if the code needs to run in the _same_
> > context as the caller?  For example, with a spinlock held.
> 
> I see.  I think it wouldn't be unreasonable to require that func should take
> all of the necessary locks by itself.

But then if func was called directly, because the device was at full
power, it would deadlock trying to acquire a lock the caller already
holds.

> However, I believe that if func is going to be called through the workqueue,
> the usage counter should be incremented before the preceding resume and
> decremented after func returns by the workqueue thread.

Okay.

> > func would need to know whether it should acquire the lock, which means it
> > needs to know whether it was called directly or through the workqueue.
> > 
> > I suppose we could pass it an extra argument with this information...  
> > but that's a little ugly.
> 
> I agree.  I'd prefer to avoid that, if possible.

Do you have any better suggestions?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 4, 2012, 8:59 p.m. UTC | #33
On Saturday, August 04, 2012, Alan Stern wrote:
> On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > What happens if a system sleep begins after 
> > > pm_runtime_async_resume_and_call() but before the runtime resume?  When 
> > > the system wakes up, the device will be back at full power without ever 
> > > executing a runtime PM call.  Then how will func get invoked?
> > 
> > The pm_runtime_barrier() at the beginning of __device_suspend() guarantees
> > that if there's a resume request pending at this point, the resume will be
> > carried out.  I think we can modify rpm_resume() to return only after func()
> > is executed, so there shouldn't be a problem with that particular case.
> 
> A resume request can get added to the workqueue after
> __device_suspend() runs, in which case the problem will still exist.

In theory.  But the question is who's going to add that request, since
the driver's (and possibly subsystem's/PM domain's) .suspend() callbacks
will be run by __device_suspend() and we can expect those to ensure
certain level of consistency.

Anyway, today we'll have the same problem with a runtime resume request
queued up during system suspend (after __device_suspend() has run).

> Should pm_runtime_set_active() call func (if such a call is pending)?
> 
> > > What happens if the driver is unbound after
> > > pm_runtime_async_resume_and_call() but before the runtime resume?  The
> > > When the runtime resume occurs, func will be invoked -- and the driver
> > > might not even be in memory any more.
> > 
> > We have no protection against that at the moment, but then there is no
> > guarantee that the driver's runtime PM callbacks won't be run after
> > it's been unbound.
> 
> True.  We have been relying on the subsystem to handle this race, but 
> obviously that's no good if the callback is directly to the driver.  I 
> don't see any way to fix this in the PM core.

Me neither.

> > Perhaps we should run pm_runtime_barrier() before executing .remove()
> > in __device_release_driver()
> 
> We already do pm_runtime_get_sync() and pm_runtime_put_sync().  I'm not 
> sure how necessary those calls really are ...

They are, because some platforms do some unusual things in their platform
bus type notifiers.

> but pm_runtime_barrier() would be redundant.
> 
> >  (and perhaps run pm_runtime_idle() afterwards
> > in case the subsystem or PM domain knows how to power down the device even if
> > there's no driver) and expect .remove() to clean up?
> 
> I can't think of anything wrong with calling pm_runtime_idle() at the
> end of __device_release_driver().

Yeah.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 4, 2012, 9:15 p.m. UTC | #34
On Saturday, August 04, 2012, Alan Stern wrote:
> On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > That wasn't what he meant.  What if the code needs to run in the _same_
> > > context as the caller?  For example, with a spinlock held.
> > 
> > I see.  I think it wouldn't be unreasonable to require that func should take
> > all of the necessary locks by itself.
> 
> But then if func was called directly, because the device was at full
> power, it would deadlock trying to acquire a lock the caller already
> holds.

I wonder why the caller may want to take any locks beforehand?

> > However, I believe that if func is going to be called through the workqueue,
> > the usage counter should be incremented before the preceding resume and
> > decremented after func returns by the workqueue thread.
> 
> Okay.
> 
> > > func would need to know whether it should acquire the lock, which means it
> > > needs to know whether it was called directly or through the workqueue.
> > > 
> > > I suppose we could pass it an extra argument with this information...  
> > > but that's a little ugly.
> > 
> > I agree.  I'd prefer to avoid that, if possible.
> 
> Do you have any better suggestions?

Hmm.  What about pm_runtime_get_and_call(dev, func_sync, func_async), where
func_sync() is to be called if the device is already active and func_async()
is to be called if it has to be resumed from the workqueue?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 4, 2012, 10:13 p.m. UTC | #35
On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:

> On Saturday, August 04, 2012, Alan Stern wrote:
> > On Sat, 4 Aug 2012, Rafael J. Wysocki wrote:
> > 
> > > > That wasn't what he meant.  What if the code needs to run in the _same_
> > > > context as the caller?  For example, with a spinlock held.
> > > 
> > > I see.  I think it wouldn't be unreasonable to require that func should take
> > > all of the necessary locks by itself.
> > 
> > But then if func was called directly, because the device was at full
> > power, it would deadlock trying to acquire a lock the caller already
> > holds.
> 
> I wonder why the caller may want to take any locks beforehand?

Who knows?  :-)

The best answer may be for the caller not to hold any locks.  In the
runtime-PM document's example driver, the lock would be dropped before
the resume-and-call.

> > Do you have any better suggestions?
> 
> Hmm.  What about pm_runtime_get_and_call(dev, func_sync, func_async), where
> func_sync() is to be called if the device is already active and func_async()
> is to be called if it has to be resumed from the workqueue?

That's a little better but not much.  It would require storing two 
function pointers in the dev->power structure.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -280,8 +280,12 @@  static long local_pci_probe(void *_ddi)
 {
 	struct drv_dev_and_id *ddi = _ddi;
 	struct device *dev = &ddi->dev->dev;
+	struct device *parent = dev->parent;
 	int rc;
 
+	/* The parent bridge must be in active state when probing */
+	if (parent)
+		pm_runtime_get_sync(parent);
 	/* Unbound PCI devices are always set to disabled and suspended.
 	 * During probe, the device is set to enabled and active and the
 	 * usage count is incremented.  If the driver supports runtime PM,
@@ -298,6 +302,8 @@  static long local_pci_probe(void *_ddi)
 		pm_runtime_set_suspended(dev);
 		pm_runtime_put_noidle(dev);
 	}
+	if (parent)
+		pm_runtime_put(parent);
 	return rc;
 }