[v2] pci: lock the pci_cfg_wait queue for the consistency of data

Message ID 20191119011545.15408-1-zhengxiang9@huawei.com (mailing list archive)
State Superseded, archived

Commit Message

Xiang Zheng Nov. 19, 2019, 1:15 a.m. UTC
Commit "7ea7e98fd8d0" suggests that the "pci_lock" is sufficient,
and all the callers of pci_wait_cfg() are wrapped with the "pci_lock".

However, since commit "cdcb33f98244" was merged, accesses to the
pci_cfg_wait queue are no longer safe. Holding "pci_lock" alone is
insufficient; we also need to hold the wait queue's own lock while
reading or modifying the queue.

So use add_wait_queue()/remove_wait_queue() instead of
__add_wait_queue()/__remove_wait_queue(). Also move the wait queue
calls next to schedule(), after "pci_lock" is dropped and before it is
retaken, to avoid reintroducing the deadlock addressed by
"cdcb33f98244".

Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Cc: Heyi Guo <guoheyi@huawei.com>
Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
---

v2:
 - Move the wait queue functionality around the "schedule()" function to
   avoid reintroducing the deadlock addressed by "cdcb33f98244"

---

 drivers/pci/access.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas Nov. 19, 2019, 8:23 p.m. UTC | #1
On Tue, Nov 19, 2019 at 09:15:45AM +0800, Xiang Zheng wrote:
> Commit "7ea7e98fd8d0" suggests that the "pci_lock" is sufficient,
> and all the callers of pci_wait_cfg() are wrapped with the "pci_lock".
> 
> However, since the commit "cdcb33f98244" merged, the accesses to
> the pci_cfg_wait queue are not safe anymore. A "pci_lock" is
> insufficient and we need to hold an additional queue lock while
> read/write the wait queue.
> 
> So let's use the add_wait_queue()/remove_wait_queue() instead of
> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
> functionality around the "schedule()" function to avoid reintroducing
> the deadlock addressed by "cdcb33f98244".

Procedural nits:

  - Run "git log --oneline drivers/pci/access.c" and follow the
    convention, e.g., starts with "PCI: " and first subsequent word is
    capitalized.

  - Use conventional commit references, e.g., 7ea7e98fd8d0 ("PCI:
    Block on access to temporarily unavailable pci device") and
    cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and
    p->pi_lock")

  - IIRC you found that this actually caused a panic; please include
    the lore.kernel.org URL to that report.

You can wait for a while to see if there are more substantive comments
to address before posting a v3.

> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Cc: Heyi Guo <guoheyi@huawei.com>
> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
> ---
> 
> v2:
>  - Move the wait queue functionality around the "schedule()" function to
>    avoid reintroducing the deadlock addressed by "cdcb33f98244"
> 
> ---
> 
>  drivers/pci/access.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 2fccb5762c76..09342a74e5ea 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>  {
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	__add_wait_queue(&pci_cfg_wait, &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&pci_lock);
> +		add_wait_queue(&pci_cfg_wait, &wait);
>  		schedule();
> +		remove_wait_queue(&pci_cfg_wait, &wait);
>  		raw_spin_lock_irq(&pci_lock);
>  	} while (dev->block_cfg_access);
> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>  }
>  
>  /* Returns 0 on success, negative values indicate error. */
> -- 
> 2.19.1
> 
>
Xiang Zheng Nov. 20, 2019, 6:18 a.m. UTC | #2
On 2019/11/20 4:23, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2019 at 09:15:45AM +0800, Xiang Zheng wrote:
>> Commit "7ea7e98fd8d0" suggests that the "pci_lock" is sufficient,
>> and all the callers of pci_wait_cfg() are wrapped with the "pci_lock".
>>
>> However, since the commit "cdcb33f98244" merged, the accesses to
>> the pci_cfg_wait queue are not safe anymore. A "pci_lock" is
>> insufficient and we need to hold an additional queue lock while
>> read/write the wait queue.
>>
>> So let's use the add_wait_queue()/remove_wait_queue() instead of
>> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
>> functionality around the "schedule()" function to avoid reintroducing
>> the deadlock addressed by "cdcb33f98244".
> 
> Procedural nits:
> 
>   - Run "git log --oneline drivers/pci/access.c" and follow the
>     convention, e.g., starts with "PCI: " and first subsequent word is
>     capitalized.
> 
>   - Use conventional commit references, e.g., 7ea7e98fd8d0 ("PCI:
>     Block on access to temporarily unavailable pci device") and
>     cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and
>     p->pi_lock")
> 
>   - IIRC you found that this actually caused a panic; please include
>     the lore.kernel.org URL to that report.
> 

Got it, I will address these nits.

> You can wait for a while to see if there are more substantive comments
> to address before posting a v3.
> 

OK.

>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Cc: Heyi Guo <guoheyi@huawei.com>
>> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
>> ---
>>
>> v2:
>>  - Move the wait queue functionality around the "schedule()" function to
>>    avoid reintroducing the deadlock addressed by "cdcb33f98244"
>>
>> ---
>>
>>  drivers/pci/access.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
>> index 2fccb5762c76..09342a74e5ea 100644
>> --- a/drivers/pci/access.c
>> +++ b/drivers/pci/access.c
>> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>>  {
>>  	DECLARE_WAITQUEUE(wait, current);
>>  
>> -	__add_wait_queue(&pci_cfg_wait, &wait);
>>  	do {
>>  		set_current_state(TASK_UNINTERRUPTIBLE);
>>  		raw_spin_unlock_irq(&pci_lock);
>> +		add_wait_queue(&pci_cfg_wait, &wait);
>>  		schedule();
>> +		remove_wait_queue(&pci_cfg_wait, &wait);
>>  		raw_spin_lock_irq(&pci_lock);
>>  	} while (dev->block_cfg_access);
>> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>>  }
>>  
>>  /* Returns 0 on success, negative values indicate error. */
>> -- 
>> 2.19.1
>>
>>
Patch

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 2fccb5762c76..09342a74e5ea 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
 {
 	DECLARE_WAITQUEUE(wait, current);
 
-	__add_wait_queue(&pci_cfg_wait, &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irq(&pci_lock);
+		add_wait_queue(&pci_cfg_wait, &wait);
 		schedule();
+		remove_wait_queue(&pci_cfg_wait, &wait);
 		raw_spin_lock_irq(&pci_lock);
 	} while (dev->block_cfg_access);
-	__remove_wait_queue(&pci_cfg_wait, &wait);
 }
 
 /* Returns 0 on success, negative values indicate error. */
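
The commit message turns on the difference between the locked helpers
(add_wait_queue()/remove_wait_queue()) and the unlocked ones
(__add_wait_queue()/__remove_wait_queue()). Below is a minimal userspace
sketch of that difference, not kernel code. It assumes pthread mutexes as
a stand-in for the queue's internal spinlock, and every name in it
(wait_list, wait_list_add, wait_list_add_locked, stress_wait_list, and so
on) is hypothetical and exists only for illustration. The *_locked
variants expect the caller to hold the list lock, as the __ helpers do;
the plain variants take it themselves.

```c
/* Userspace analogy only: a pthread mutex stands in for the queue spinlock. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

struct waiter {
	struct waiter *prev, *next;
};

/* Hypothetical stand-in for a wait queue head: a list plus its own lock. */
struct wait_list {
	pthread_mutex_t lock;
	struct waiter head;	/* sentinel of a circular doubly linked list */
};

static void wait_list_init(struct wait_list *wl)
{
	pthread_mutex_init(&wl->lock, NULL);
	wl->head.prev = wl->head.next = &wl->head;
}

/* Analog of __add_wait_queue(): caller must already hold wl->lock. */
static void wait_list_add_locked(struct wait_list *wl, struct waiter *w)
{
	w->next = wl->head.next;
	w->prev = &wl->head;
	wl->head.next->prev = w;
	wl->head.next = w;
}

/* Analog of __remove_wait_queue(): caller must already hold the list lock. */
static void wait_list_del_locked(struct waiter *w)
{
	w->prev->next = w->next;
	w->next->prev = w->prev;
}

/* Analog of add_wait_queue(): takes the list's own lock around the update. */
static void wait_list_add(struct wait_list *wl, struct waiter *w)
{
	pthread_mutex_lock(&wl->lock);
	wait_list_add_locked(wl, w);
	pthread_mutex_unlock(&wl->lock);
}

/* Analog of remove_wait_queue(). */
static void wait_list_del(struct wait_list *wl, struct waiter *w)
{
	pthread_mutex_lock(&wl->lock);
	wait_list_del_locked(w);
	pthread_mutex_unlock(&wl->lock);
}

struct worker_arg {
	struct wait_list *wl;
	int iterations;
};

/* Each thread repeatedly queues and dequeues its own on-stack waiter. */
static void *worker(void *p)
{
	struct worker_arg *arg = p;
	struct waiter w;
	int i;

	for (i = 0; i < arg->iterations; i++) {
		wait_list_add(arg->wl, &w);
		wait_list_del(arg->wl, &w);
	}
	return NULL;
}

/* Returns 1 if the list is empty and intact after all threads finish. */
int stress_wait_list(int nthreads, int iterations)
{
	struct wait_list wl;
	struct worker_arg arg = { &wl, iterations };
	pthread_t tids[MAX_THREADS];
	int started = 0, i, ok;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	wait_list_init(&wl);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	ok = (started == nthreads) &&
	     wl.head.next == &wl.head && wl.head.prev == &wl.head;
	pthread_mutex_destroy(&wl.lock);
	return ok;
}
```

Calling stress_wait_list(4, 10000) from a test program exercises
concurrent add/remove through the locked wrappers. If worker() called the
*_locked variants directly, with no common lock held, the list pointers
could be corrupted. That is the same class of problem the patch describes
for pci_cfg_wait once "pci_lock" no longer covers every access to the
queue.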