diff mbox series

[v3] PCI: Lock the pci_cfg_wait queue for the consistency of data

Message ID 20191210031527.40136-1-zhengxiang9@huawei.com
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series [v3] PCI: Lock the pci_cfg_wait queue for the consistency of data | expand

Commit Message

Xiang Zheng Dec. 10, 2019, 3:15 a.m. UTC
7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
device") suggests that the "pci_lock" is sufficient, and all the
callers of pci_wait_cfg() are wrapped with the "pci_lock".

However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
are not safe anymore. This would cause kernel panic in a very low chance
(See more detailed information from the below link). A "pci_lock" is
insufficient and we need to hold an additional queue lock while read/write
the wait queue.

So let's use the add_wait_queue()/remove_wait_queue() instead of
__add_wait_queue()/__remove_wait_queue(). Also move the wait queue
functionality around the "schedule()" function to avoid reintroducing
the deadlock addressed by "cdcb33f98244".

Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Cc: Heyi Guo <guoheyi@huawei.com>
Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/
---

v3:
  Improve the commit subject and message.

v2:
  Move the wait queue functionality around the "schedule()".

---
 drivers/pci/access.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas June 24, 2020, 11:23 p.m. UTC | #1
[+cc Stephane]

On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
> 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
> device") suggests that the "pci_lock" is sufficient, and all the
> callers of pci_wait_cfg() are wrapped with the "pci_lock".
> 
> However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
> pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
> are not safe anymore. This would cause kernel panic in a very low chance
> (See more detailed information from the below link). A "pci_lock" is
> insufficient and we need to hold an additional queue lock while read/write
> the wait queue.

<tangent>

  I'm not proud of cdcb33f98244 ("PCI: Avoid possible deadlock on
  pci_lock and p->pi_lock").

  It seems like an ad hoc solution to a problem that shouldn't exist.
  I think what it fixes is reading performance counters from PCI
  config space during a context switch when we're holding the
  task_struct pi_lock.  That doesn't seem like a path that should
  acquire pci_lock.

  I think I should have instead tried to make a lockless PCI config
  accessor that returns failure whenever we aren't allowed to read
  config space, e.g., during the recovery time after a reset or power
  state transition.  We currently *do* use pci_cfg_access_lock() to
  prevent user accesses via /proc or /sys during some of those times,
  but there's nothing that prevents kernel accesses.

  I think we're a little vulnerable there if we read those PCI
  performance counters right after changing the device power state.
  Hopefully it's nothing worse than getting ~0 data back.

</tangent>

> So let's use the add_wait_queue()/remove_wait_queue() instead of
> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
> functionality around the "schedule()" function to avoid reintroducing
> the deadlock addressed by "cdcb33f98244".
> 
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Cc: Heyi Guo <guoheyi@huawei.com>
> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
> Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/
> ---
> 
> v3:
>   Improve the commit subject and message.
> 
> v2:
>   Move the wait queue functionality around the "schedule()".
> 
> ---
>  drivers/pci/access.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 2fccb5762c76..09342a74e5ea 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>  {
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	__add_wait_queue(&pci_cfg_wait, &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&pci_lock);
> +		add_wait_queue(&pci_cfg_wait, &wait);
>  		schedule();
> +		remove_wait_queue(&pci_cfg_wait, &wait);
>  		raw_spin_lock_irq(&pci_lock);
>  	} while (dev->block_cfg_access);
> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>  }
>  
>  /* Returns 0 on success, negative values indicate error. */
> -- 
> 2.19.1
> 
>
Bjorn Helgaas June 24, 2020, 11:23 p.m. UTC | #2
On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
> 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
> device") suggests that the "pci_lock" is sufficient, and all the
> callers of pci_wait_cfg() are wrapped with the "pci_lock".
> 
> However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
> pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
> are not safe anymore. This would cause kernel panic in a very low chance
> (See more detailed information from the below link). A "pci_lock" is
> insufficient and we need to hold an additional queue lock while read/write
> the wait queue.
> 
> So let's use the add_wait_queue()/remove_wait_queue() instead of
> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
> functionality around the "schedule()" function to avoid reintroducing
> the deadlock addressed by "cdcb33f98244".

I see that add_wait_queue() acquires the wq_head->lock, while
__add_wait_queue() does not.

But I don't understand why the existing pci_lock is insufficient.  
pci_cfg_wait is only used in pci_wait_cfg() and
pci_cfg_access_unlock().

In pci_wait_cfg(), both __add_wait_queue() and __remove_wait_queue()
are called while holding pci_lock, so that doesn't seem like the
problem.

In pci_cfg_access_unlock(), we have:

  pci_cfg_access_unlock
    wake_up_all(&pci_cfg_wait)
      __wake_up(&pci_cfg_wait, ...)
        __wake_up_common_lock(&pci_cfg_wait, ...)
	  spin_lock(&pci_cfg_wait->lock)
	  __wake_up_common(&pci_cfg_wait, ...)
	    list_for_each_entry_safe_from(...)
	      list_add_tail(...)                <-- problem?
	  spin_unlock(&pci_cfg_wait->lock)

Is the problem that the wake_up_all() modifies the pci_cfg_wait list
without holding pci_lock?

If so, I don't quite see how the patch below fixes it.  Oh, wait,
maybe I do ... by using add_wait_queue(), we protect the list using
the *same* lock used by __wake_up_common_lock.  Is that it?

> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Cc: Heyi Guo <guoheyi@huawei.com>
> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
> Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/
> ---
> 
> v3:
>   Improve the commit subject and message.
> 
> v2:
>   Move the wait queue functionality around the "schedule()".
> 
> ---
>  drivers/pci/access.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 2fccb5762c76..09342a74e5ea 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>  {
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	__add_wait_queue(&pci_cfg_wait, &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&pci_lock);
> +		add_wait_queue(&pci_cfg_wait, &wait);
>  		schedule();
> +		remove_wait_queue(&pci_cfg_wait, &wait);
>  		raw_spin_lock_irq(&pci_lock);
>  	} while (dev->block_cfg_access);
> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>  }
>  
>  /* Returns 0 on success, negative values indicate error. */
> -- 
> 2.19.1
> 
>
Bjorn Helgaas June 25, 2020, 11:24 p.m. UTC | #3
On Wed, Jun 24, 2020 at 06:23:09PM -0500, Bjorn Helgaas wrote:
> On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
> > 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
> > device") suggests that the "pci_lock" is sufficient, and all the
> > callers of pci_wait_cfg() are wrapped with the "pci_lock".
> > 
> > However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
> > pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
> > are not safe anymore. This would cause kernel panic in a very low chance
> > (See more detailed information from the below link). A "pci_lock" is
> > insufficient and we need to hold an additional queue lock while read/write
> > the wait queue.
> > 
> > So let's use the add_wait_queue()/remove_wait_queue() instead of
> > __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
> > functionality around the "schedule()" function to avoid reintroducing
> > the deadlock addressed by "cdcb33f98244".
> 
> I see that add_wait_queue() acquires the wq_head->lock, while
> __add_wait_queue() does not.
> 
> But I don't understand why the existing pci_lock is insufficient.  
> pci_cfg_wait is only used in pci_wait_cfg() and
> pci_cfg_access_unlock().
> 
> In pci_wait_cfg(), both __add_wait_queue() and __remove_wait_queue()
> are called while holding pci_lock, so that doesn't seem like the
> problem.
> 
> In pci_cfg_access_unlock(), we have:
> 
>   pci_cfg_access_unlock
>     wake_up_all(&pci_cfg_wait)
>       __wake_up(&pci_cfg_wait, ...)
>         __wake_up_common_lock(&pci_cfg_wait, ...)
> 	  spin_lock(&pci_cfg_wait->lock)
> 	  __wake_up_common(&pci_cfg_wait, ...)
> 	    list_for_each_entry_safe_from(...)
> 	      list_add_tail(...)                <-- problem?
> 	  spin_unlock(&pci_cfg_wait->lock)
> 
> Is the problem that the wake_up_all() modifies the pci_cfg_wait list
> without holding pci_lock?
> 
> If so, I don't quite see how the patch below fixes it.  Oh, wait,
> maybe I do ... by using add_wait_queue(), we protect the list using
> the *same* lock used by __wake_up_common_lock.  Is that it?

Any reaction to the following?  Certainly not as optimized, but also a
little less magic and more in the mainstream of wait_event/wake_up
usage.

I don't claim any real wait queue knowledge and haven't tested it.
There are only a handful of __add_wait_queue() users compared with
over 1600 users of wait_event() and variants, and I don't like being
such a special case.

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 79c4a2ef269a..7c2222bddbff 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -205,16 +205,11 @@ static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
 
 static noinline void pci_wait_cfg(struct pci_dev *dev)
 {
-	DECLARE_WAITQUEUE(wait, current);
-
-	__add_wait_queue(&pci_cfg_wait, &wait);
 	do {
-		set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irq(&pci_lock);
-		schedule();
+		wait_event(pci_cfg_wait, !dev->block_cfg_access);
 		raw_spin_lock_irq(&pci_lock);
 	} while (dev->block_cfg_access);
-	__remove_wait_queue(&pci_cfg_wait, &wait);
 }
 
 /* Returns 0 on success, negative values indicate error. */
Xiang Zheng June 28, 2020, 4:17 a.m. UTC | #4
Hi Bjorn,

Sorry for the late reply, I had Dragon Boat Festival these days.

On 2020/6/25 7:23, Bjorn Helgaas wrote:
> On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
>> 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
>> device") suggests that the "pci_lock" is sufficient, and all the
>> callers of pci_wait_cfg() are wrapped with the "pci_lock".
>>
>> However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
>> pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
>> are not safe anymore. This would cause kernel panic in a very low chance
>> (See more detailed information from the below link). A "pci_lock" is
>> insufficient and we need to hold an additional queue lock while read/write
>> the wait queue.
>>
>> So let's use the add_wait_queue()/remove_wait_queue() instead of
>> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
>> functionality around the "schedule()" function to avoid reintroducing
>> the deadlock addressed by "cdcb33f98244".
> 
> I see that add_wait_queue() acquires the wq_head->lock, while
> __add_wait_queue() does not.
> 
> But I don't understand why the existing pci_lock is insufficient.  
> pci_cfg_wait is only used in pci_wait_cfg() and
> pci_cfg_access_unlock().
> 
> In pci_wait_cfg(), both __add_wait_queue() and __remove_wait_queue()
> are called while holding pci_lock, so that doesn't seem like the
> problem.
> 
> In pci_cfg_access_unlock(), we have:
> 
>   pci_cfg_access_unlock
>     wake_up_all(&pci_cfg_wait)
>       __wake_up(&pci_cfg_wait, ...)
>         __wake_up_common_lock(&pci_cfg_wait, ...)
> 	  spin_lock(&pci_cfg_wait->lock)
> 	  __wake_up_common(&pci_cfg_wait, ...)
> 	    list_for_each_entry_safe_from(...)
> 	      list_add_tail(...)                <-- problem?
> 	  spin_unlock(&pci_cfg_wait->lock)
> 
> Is the problem that the wake_up_all() modifies the pci_cfg_wait list
> without holding pci_lock?
> 
> If so, I don't quite see how the patch below fixes it.  Oh, wait,
> maybe I do ... by using add_wait_queue(), we protect the list using
> the *same* lock used by __wake_up_common_lock.  Is that it?
> 

Yes, my patch just protects the wait queue list by using add_wait_queue().
Simply using the add_wait_queue() instead of __add_wait_queue() will reintroduce
the deadlock addressed by "cdcb33f98244". So I move add_wait_queue() and
remote_wait_queue() around schedule() since they don't need to hold pci_lock.


>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Cc: Heyi Guo <guoheyi@huawei.com>
>> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com>
>> Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/
>> ---
>>
>> v3:
>>   Improve the commit subject and message.
>>
>> v2:
>>   Move the wait queue functionality around the "schedule()".
>>
>> ---
>>  drivers/pci/access.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
>> index 2fccb5762c76..09342a74e5ea 100644
>> --- a/drivers/pci/access.c
>> +++ b/drivers/pci/access.c
>> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>>  {
>>  	DECLARE_WAITQUEUE(wait, current);
>>  
>> -	__add_wait_queue(&pci_cfg_wait, &wait);
>>  	do {
>>  		set_current_state(TASK_UNINTERRUPTIBLE);
>>  		raw_spin_unlock_irq(&pci_lock);
>> +		add_wait_queue(&pci_cfg_wait, &wait);
>>  		schedule();
>> +		remove_wait_queue(&pci_cfg_wait, &wait);
>>  		raw_spin_lock_irq(&pci_lock);
>>  	} while (dev->block_cfg_access);
>> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>>  }
>>  
>>  /* Returns 0 on success, negative values indicate error. */
>> -- 
>> 2.19.1
>>
>>
> 
> .
>
Xiang Zheng June 28, 2020, 4:18 a.m. UTC | #5
On 2020/6/26 7:24, Bjorn Helgaas wrote:
> On Wed, Jun 24, 2020 at 06:23:09PM -0500, Bjorn Helgaas wrote:
>> On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
>>> 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
>>> device") suggests that the "pci_lock" is sufficient, and all the
>>> callers of pci_wait_cfg() are wrapped with the "pci_lock".
>>>
>>> However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
>>> pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
>>> are not safe anymore. This would cause kernel panic in a very low chance
>>> (See more detailed information from the below link). A "pci_lock" is
>>> insufficient and we need to hold an additional queue lock while read/write
>>> the wait queue.
>>>
>>> So let's use the add_wait_queue()/remove_wait_queue() instead of
>>> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
>>> functionality around the "schedule()" function to avoid reintroducing
>>> the deadlock addressed by "cdcb33f98244".
>>
>> I see that add_wait_queue() acquires the wq_head->lock, while
>> __add_wait_queue() does not.
>>
>> But I don't understand why the existing pci_lock is insufficient.  
>> pci_cfg_wait is only used in pci_wait_cfg() and
>> pci_cfg_access_unlock().
>>
>> In pci_wait_cfg(), both __add_wait_queue() and __remove_wait_queue()
>> are called while holding pci_lock, so that doesn't seem like the
>> problem.
>>
>> In pci_cfg_access_unlock(), we have:
>>
>>   pci_cfg_access_unlock
>>     wake_up_all(&pci_cfg_wait)
>>       __wake_up(&pci_cfg_wait, ...)
>>         __wake_up_common_lock(&pci_cfg_wait, ...)
>> 	  spin_lock(&pci_cfg_wait->lock)
>> 	  __wake_up_common(&pci_cfg_wait, ...)
>> 	    list_for_each_entry_safe_from(...)
>> 	      list_add_tail(...)                <-- problem?
>> 	  spin_unlock(&pci_cfg_wait->lock)
>>
>> Is the problem that the wake_up_all() modifies the pci_cfg_wait list
>> without holding pci_lock?
>>
>> If so, I don't quite see how the patch below fixes it.  Oh, wait,
>> maybe I do ... by using add_wait_queue(), we protect the list using
>> the *same* lock used by __wake_up_common_lock.  Is that it?
> 
> Any reaction to the following?  Certainly not as optimized, but also a
> little less magic and more in the mainstream of wait_event/wake_up
> usage.
> 
> I don't claim any real wait queue knowledge and haven't tested it.
> There are only a handful of __add_wait_queue() users compared with
> over 1600 users of wait_event() and variants, and I don't like being
> such a special case.
> 

I think the following patch is OK, even though I prefer mine. :)
I can test your patch on my testcase(with hacked 300ms delay before "curr->func"
in __wake_up_common()). And if James has more efficient testcase or measure for
this problem, then go with James.

> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 79c4a2ef269a..7c2222bddbff 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -205,16 +205,11 @@ static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
>  
>  static noinline void pci_wait_cfg(struct pci_dev *dev)
>  {
> -	DECLARE_WAITQUEUE(wait, current);
> -
> -	__add_wait_queue(&pci_cfg_wait, &wait);
>  	do {
> -		set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&pci_lock);
> -		schedule();
> +		wait_event(pci_cfg_wait, !dev->block_cfg_access);
>  		raw_spin_lock_irq(&pci_lock);
>  	} while (dev->block_cfg_access);
> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>  }
>  
>  /* Returns 0 on success, negative values indicate error. */
> 
> .
>
Bjorn Helgaas June 28, 2020, 4:14 p.m. UTC | #6
On Sun, Jun 28, 2020 at 12:18:10PM +0800, Xiang Zheng wrote:
> On 2020/6/26 7:24, Bjorn Helgaas wrote:
> > On Wed, Jun 24, 2020 at 06:23:09PM -0500, Bjorn Helgaas wrote:
> >> On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
> >>> 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
> >>> device") suggests that the "pci_lock" is sufficient, and all the
> >>> callers of pci_wait_cfg() are wrapped with the "pci_lock".
> >>>
> >>> However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
> >>> pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
> >>> are not safe anymore. This would cause kernel panic in a very low chance
> >>> (See more detailed information from the below link). A "pci_lock" is
> >>> insufficient and we need to hold an additional queue lock while read/write
> >>> the wait queue.
> >>>
> >>> So let's use the add_wait_queue()/remove_wait_queue() instead of
> >>> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
> >>> functionality around the "schedule()" function to avoid reintroducing
> >>> the deadlock addressed by "cdcb33f98244".
> >>
> >> I see that add_wait_queue() acquires the wq_head->lock, while
> >> __add_wait_queue() does not.
> >>
> >> But I don't understand why the existing pci_lock is insufficient.  
> >> pci_cfg_wait is only used in pci_wait_cfg() and
> >> pci_cfg_access_unlock().
> >>
> >> In pci_wait_cfg(), both __add_wait_queue() and __remove_wait_queue()
> >> are called while holding pci_lock, so that doesn't seem like the
> >> problem.
> >>
> >> In pci_cfg_access_unlock(), we have:
> >>
> >>   pci_cfg_access_unlock
> >>     wake_up_all(&pci_cfg_wait)
> >>       __wake_up(&pci_cfg_wait, ...)
> >>         __wake_up_common_lock(&pci_cfg_wait, ...)
> >> 	  spin_lock(&pci_cfg_wait->lock)
> >> 	  __wake_up_common(&pci_cfg_wait, ...)
> >> 	    list_for_each_entry_safe_from(...)
> >> 	      list_add_tail(...)                <-- problem?
> >> 	  spin_unlock(&pci_cfg_wait->lock)
> >>
> >> Is the problem that the wake_up_all() modifies the pci_cfg_wait list
> >> without holding pci_lock?
> >>
> >> If so, I don't quite see how the patch below fixes it.  Oh, wait,
> >> maybe I do ... by using add_wait_queue(), we protect the list using
> >> the *same* lock used by __wake_up_common_lock.  Is that it?
> > 
> > Any reaction to the following?  Certainly not as optimized, but also a
> > little less magic and more in the mainstream of wait_event/wake_up
> > usage.
> > 
> > I don't claim any real wait queue knowledge and haven't tested it.
> > There are only a handful of __add_wait_queue() users compared with
> > over 1600 users of wait_event() and variants, and I don't like being
> > such a special case.
> 
> I think the following patch is OK, even though I prefer mine. :)

Possibility A:

        do {
                set_current_state(TASK_UNINTERRUPTIBLE);
                raw_spin_unlock_irq(&pci_lock);
                add_wait_queue(&pci_cfg_wait, &wait);
                schedule();
                remove_wait_queue(&pci_cfg_wait, &wait);
                raw_spin_lock_irq(&pci_lock);
        } while (dev->block_cfg_access);

Possibility B:

        do {
                raw_spin_unlock_irq(&pci_lock);
                wait_event(pci_cfg_wait, !dev->block_cfg_access);
                raw_spin_lock_irq(&pci_lock);
        } while (dev->block_cfg_access);

I think both ways probably work.  

I prefer B because there's less chance for error -- it requires less
knowledge of the internals of wait/wake_up and we don't have to worry
about the ordering of set_current_state(), raw_spin_unlock_irq(),
add_wait_queue(), schedule(), and remove_wait_queue().

I really don't know much about wait queues, so I'm interested in why
you prefer A.

> I can test your patch on my testcase(with hacked 300ms delay before
> "curr->func" in __wake_up_common()). And if James has more efficient
> testcase or measure for this problem, then go with James.

That would be great, thank you!  Let me know how it goes.

> > diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> > index 79c4a2ef269a..7c2222bddbff 100644
> > --- a/drivers/pci/access.c
> > +++ b/drivers/pci/access.c
> > @@ -205,16 +205,11 @@ static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
> >  
> >  static noinline void pci_wait_cfg(struct pci_dev *dev)
> >  {
> > -	DECLARE_WAITQUEUE(wait, current);
> > -
> > -	__add_wait_queue(&pci_cfg_wait, &wait);
> >  	do {
> > -		set_current_state(TASK_UNINTERRUPTIBLE);
> >  		raw_spin_unlock_irq(&pci_lock);
> > -		schedule();
> > +		wait_event(pci_cfg_wait, !dev->block_cfg_access);
> >  		raw_spin_lock_irq(&pci_lock);
> >  	} while (dev->block_cfg_access);
> > -	__remove_wait_queue(&pci_cfg_wait, &wait);
> >  }
> >  
> >  /* Returns 0 on success, negative values indicate error. */
> > 
> > .
> > 
> 
> -- 
> Thanks,
> Xiang
>
Xiang Zheng July 2, 2020, 9:41 a.m. UTC | #7
On 2020/6/29 0:14, Bjorn Helgaas wrote:
> On Sun, Jun 28, 2020 at 12:18:10PM +0800, Xiang Zheng wrote:
>> On 2020/6/26 7:24, Bjorn Helgaas wrote:
>>> On Wed, Jun 24, 2020 at 06:23:09PM -0500, Bjorn Helgaas wrote:
>>>> On Tue, Dec 10, 2019 at 11:15:27AM +0800, Xiang Zheng wrote:
>>>>> 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
>>>>> device") suggests that the "pci_lock" is sufficient, and all the
>>>>> callers of pci_wait_cfg() are wrapped with the "pci_lock".
>>>>>
>>>>> However, since the commit cdcb33f98244 ("PCI: Avoid possible deadlock on
>>>>> pci_lock and p->pi_lock") merged, the accesses to the pci_cfg_wait queue
>>>>> are not safe anymore. This would cause kernel panic in a very low chance
>>>>> (See more detailed information from the below link). A "pci_lock" is
>>>>> insufficient and we need to hold an additional queue lock while read/write
>>>>> the wait queue.
>>>>>
>>>>> So let's use the add_wait_queue()/remove_wait_queue() instead of
>>>>> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
>>>>> functionality around the "schedule()" function to avoid reintroducing
>>>>> the deadlock addressed by "cdcb33f98244".
>>>>
>>>> I see that add_wait_queue() acquires the wq_head->lock, while
>>>> __add_wait_queue() does not.
>>>>
>>>> But I don't understand why the existing pci_lock is insufficient.  
>>>> pci_cfg_wait is only used in pci_wait_cfg() and
>>>> pci_cfg_access_unlock().
>>>>
>>>> In pci_wait_cfg(), both __add_wait_queue() and __remove_wait_queue()
>>>> are called while holding pci_lock, so that doesn't seem like the
>>>> problem.
>>>>
>>>> In pci_cfg_access_unlock(), we have:
>>>>
>>>>   pci_cfg_access_unlock
>>>>     wake_up_all(&pci_cfg_wait)
>>>>       __wake_up(&pci_cfg_wait, ...)
>>>>         __wake_up_common_lock(&pci_cfg_wait, ...)
>>>> 	  spin_lock(&pci_cfg_wait->lock)
>>>> 	  __wake_up_common(&pci_cfg_wait, ...)
>>>> 	    list_for_each_entry_safe_from(...)
>>>> 	      list_add_tail(...)                <-- problem?
>>>> 	  spin_unlock(&pci_cfg_wait->lock)
>>>>
>>>> Is the problem that the wake_up_all() modifies the pci_cfg_wait list
>>>> without holding pci_lock?
>>>>
>>>> If so, I don't quite see how the patch below fixes it.  Oh, wait,
>>>> maybe I do ... by using add_wait_queue(), we protect the list using
>>>> the *same* lock used by __wake_up_common_lock.  Is that it?
>>>
>>> Any reaction to the following?  Certainly not as optimized, but also a
>>> little less magic and more in the mainstream of wait_event/wake_up
>>> usage.
>>>
>>> I don't claim any real wait queue knowledge and haven't tested it.
>>> There are only a handful of __add_wait_queue() users compared with
>>> over 1600 users of wait_event() and variants, and I don't like being
>>> such a special case.
>>
>> I think the following patch is OK, even though I prefer mine. :)
> 
> Possibility A:
> 
>         do {
>                 set_current_state(TASK_UNINTERRUPTIBLE);
>                 raw_spin_unlock_irq(&pci_lock);
>                 add_wait_queue(&pci_cfg_wait, &wait);
>                 schedule();
>                 remove_wait_queue(&pci_cfg_wait, &wait);
>                 raw_spin_lock_irq(&pci_lock);
>         } while (dev->block_cfg_access);
> 
> Possibility B:
> 
>         do {
>                 raw_spin_unlock_irq(&pci_lock);
>                 wait_event(pci_cfg_wait, !dev->block_cfg_access);
>                 raw_spin_lock_irq(&pci_lock);
>         } while (dev->block_cfg_access);
> 
> I think both ways probably work.  
> 
> I prefer B because there's less chance for error -- it requires less
> knowledge of the internals of wait/wake_up and we don't have to worry
> about the ordering of set_current_state(), raw_spin_unlock_irq(),
> add_wait_queue(), schedule(), and remove_wait_queue().
> 
> I really don't know much about wait queues, so I'm interested in why
> you prefer A.
> 

Hmm...I also think B is much better than A as you describe above.
I'am not sure that whether "dev->block_cfg_access" is safe, at least
the "do{...}while" cannot be removed.

>> I can test your patch on my testcase(with hacked 300ms delay before
>> "curr->func" in __wake_up_common()). And if James has more efficient
>> testcase or measure for this problem, then go with James.
> 
> That would be great, thank you!  Let me know how it goes.

I need to make some hacking codes to test your patch, some like:

--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -206,19 +206,12 @@ static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);

 static noinline void pci_wait_cfg(struct pci_dev *dev)
 {
-       DECLARE_WAITQUEUE(wait, current);
-       wait.flags = WQ_FLAG_PCI;
-
-       __add_wait_queue(&pci_cfg_wait, &wait);
        do {
                set_current_state(TASK_UNINTERRUPTIBLE);
                raw_spin_unlock_irq(&pci_lock);
-               schedule();
+               wait_event_flags(pci_cfg_wait, !dev->block_cfg_access, WQ_FLAG_PCI);
                raw_spin_lock_irq(&pci_lock);
        } while (dev->block_cfg_access);
-       __remove_wait_queue(&pci_cfg_wait, &wait);
 }
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -4,8 +4,12 @@
  *
  * (C) 2004 Nadia Yvette Chambers, Oracle
  */
+#include <linux/delay.h>
+
 #include "sched.h"

+unsigned long wake_up_delay_ms;
+
 void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *k
ey)
 {
        spin_lock_init(&wq_head->lock);
@@ -90,6 +94,10 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
                if (flags & WQ_FLAG_BOOKMARK)
                        continue;

+               if (flags & WQ_FLAG_PCI && wake_up_delay_ms) {
+                       mdelay(wake_up_delay_ms);
+               }
+
                ret = curr->func(curr, mode, wake_flags, key);
                if (ret < 0)
                        break;


I tested it both on 4.19+ and mainline(5.8.0-rc3+). It's much difficult
to reproduce the kernel panic on mainline(I don't know why).

Anyway, all is well with your patch.

Tested-by: Xiang Zheng <zhengxiang9@huawei.com>

> 
>>> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
>>> index 79c4a2ef269a..7c2222bddbff 100644
>>> --- a/drivers/pci/access.c
>>> +++ b/drivers/pci/access.c
>>> @@ -205,16 +205,11 @@ static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
>>>  
>>>  static noinline void pci_wait_cfg(struct pci_dev *dev)
>>>  {
>>> -	DECLARE_WAITQUEUE(wait, current);
>>> -
>>> -	__add_wait_queue(&pci_cfg_wait, &wait);
>>>  	do {
>>> -		set_current_state(TASK_UNINTERRUPTIBLE);
>>>  		raw_spin_unlock_irq(&pci_lock);
>>> -		schedule();
>>> +		wait_event(pci_cfg_wait, !dev->block_cfg_access);
>>>  		raw_spin_lock_irq(&pci_lock);
>>>  	} while (dev->block_cfg_access);
>>> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>>>  }
>>>  
>>>  /* Returns 0 on success, negative values indicate error. */
>>>
>>> .
>>>
>>
>> -- 
>> Thanks,
>> Xiang
>>
> 
> .
>
diff mbox series

Patch

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 2fccb5762c76..09342a74e5ea 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -207,14 +207,14 @@  static noinline void pci_wait_cfg(struct pci_dev *dev)
 {
 	DECLARE_WAITQUEUE(wait, current);
 
-	__add_wait_queue(&pci_cfg_wait, &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irq(&pci_lock);
+		add_wait_queue(&pci_cfg_wait, &wait);
 		schedule();
+		remove_wait_queue(&pci_cfg_wait, &wait);
 		raw_spin_lock_irq(&pci_lock);
 	} while (dev->block_cfg_access);
-	__remove_wait_queue(&pci_cfg_wait, &wait);
 }
 
 /* Returns 0 on success, negative values indicate error. */