[RFC,v2,3/3] vfio-ccw: Release any channel program when releasing/removing vfio-ccw mdev
diff mbox series

Message ID ae9f20dc8873f2027f7b3c5d2aaa0bdfe06850b8.1554756534.git.alifm@linux.ibm.com
State New
Headers show
Series
  • fio-ccw fixes for kernel stacktraces
Related show

Commit Message

Farhan Ali April 8, 2019, 9:05 p.m. UTC
When releasing the vfio-ccw mdev, we currently do not release
any existing channel program and it's pinned pages. This can
lead to the following warning:

[1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]

....

1038876.561921] Call Trace:
[1038876.561935] ([<00000009897fb870>] 0x9897fb870)
[1038876.561949]  [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1]
[1038876.561965]  [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio]
[1038876.561978]  [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio]
[1038876.562024]  [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm]
[1038876.562045]  [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm]
[1038876.562065]  [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm]
[1038876.562083]  [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm]
[1038876.562098]  [<00000000003c2dc4>] __fput+0x144/0x228
[1038876.562113]  [<000000000016ee82>] task_work_run+0x8a/0xd8
[1038876.562125]  [<000000000014c7a8>] do_exit+0x5d8/0xd90
[1038876.562140]  [<000000000014d084>] do_group_exit+0xc4/0xc8
[1038876.562155]  [<000000000015c046>] get_signal+0x9ae/0xa68
[1038876.562169]  [<0000000000108d66>] do_signal+0x66/0x768
[1038876.562185]  [<0000000000b9e37e>] system_call+0x1ea/0x2d8
[1038876.562195] 2 locks held by qemu-system-s39/144727:
[1038876.562205]  #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio]
[1038876.562230]  #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1]
[1038876.562250] Last Breaking-Event-Address:
[1038876.562262]  [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1]
[1038876.562272] irq event stamp: 4236481
[1038876.562287] hardirqs last  enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740
[1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740
[1038876.562311] softirqs last  enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598
[1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108
[1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]---

Similarly we do not free the channel program when we are removing
the vfio-ccw device. Let's fix this by resetting the device and freeing
the channel program and pinned pages in the release path. For the remove
path we can just quiesce the device, since in the remove path the mediated
device is going away for good and so we don't need to do a full reset.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Cornelia Huck April 11, 2019, 4:27 p.m. UTC | #1
On Mon,  8 Apr 2019 17:05:33 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> When releasing the vfio-ccw mdev, we currently do not release
> any existing channel program and it's pinned pages. This can

s/it's/its/

> lead to the following warning:
> 
> [1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]
> 
> ....
> 
> 1038876.561921] Call Trace:
> [1038876.561935] ([<00000009897fb870>] 0x9897fb870)
> [1038876.561949]  [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1]
> [1038876.561965]  [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio]
> [1038876.561978]  [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio]
> [1038876.562024]  [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm]
> [1038876.562045]  [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm]
> [1038876.562065]  [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm]
> [1038876.562083]  [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm]
> [1038876.562098]  [<00000000003c2dc4>] __fput+0x144/0x228
> [1038876.562113]  [<000000000016ee82>] task_work_run+0x8a/0xd8
> [1038876.562125]  [<000000000014c7a8>] do_exit+0x5d8/0xd90
> [1038876.562140]  [<000000000014d084>] do_group_exit+0xc4/0xc8
> [1038876.562155]  [<000000000015c046>] get_signal+0x9ae/0xa68
> [1038876.562169]  [<0000000000108d66>] do_signal+0x66/0x768
> [1038876.562185]  [<0000000000b9e37e>] system_call+0x1ea/0x2d8
> [1038876.562195] 2 locks held by qemu-system-s39/144727:
> [1038876.562205]  #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio]
> [1038876.562230]  #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1]
> [1038876.562250] Last Breaking-Event-Address:
> [1038876.562262]  [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1]
> [1038876.562272] irq event stamp: 4236481
> [1038876.562287] hardirqs last  enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> [1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740
> [1038876.562311] softirqs last  enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598
> [1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108
> [1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]---
> 
> Similarly we do not free the channel program when we are removing
> the vfio-ccw device. Let's fix this by resetting the device and freeing
> the channel program and pinned pages in the release path. For the remove
> path we can just quiesce the device, since in the remove path the mediated
> device is going away for good and so we don't need to do a full reset.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_ops.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index ec2f796c..aacd528 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -133,11 +133,12 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
>  
>  	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
>  	    (private->state != VFIO_CCW_STATE_STANDBY)) {
> -		if (!vfio_ccw_mdev_reset(mdev))
> +		if (!vfio_ccw_sch_quiesce(private->sch))
>  			private->state = VFIO_CCW_STATE_STANDBY;
>  		/* The state will be NOT_OPER on error. */
>  	}
>  
> +	cp_free(&private->cp);
>  	private->mdev = NULL;
>  	atomic_inc(&private->avail);
>  
> @@ -171,6 +172,14 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
>  		dev_get_drvdata(mdev_parent_dev(mdev));
>  	int i;
>  
> +	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
> +	    (private->state != VFIO_CCW_STATE_STANDBY)) {
> +		if (!vfio_ccw_mdev_reset(mdev))
> +			private->state = VFIO_CCW_STATE_STANDBY;
> +		/* The state will be NOT_OPER on error. */
> +	}
> +
> +	cp_free(&private->cp);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>  				 &private->nb);
>  

Looks good to me, would love a review/ack.
Farhan Ali April 11, 2019, 8:39 p.m. UTC | #2
On 04/11/2019 12:27 PM, Cornelia Huck wrote:
> On Mon,  8 Apr 2019 17:05:33 -0400
> Farhan Ali<alifm@linux.ibm.com>  wrote:
> 
>> When releasing the vfio-ccw mdev, we currently do not release
>> any existing channel program and it's pinned pages. This can
> s/it's/its/
> 

Will fix.

Thanks for taking a look at the patches.

Thanks
Farhan
Cornelia Huck April 12, 2019, 8:12 a.m. UTC | #3
On Thu, 11 Apr 2019 16:39:32 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 04/11/2019 12:27 PM, Cornelia Huck wrote:
> > On Mon,  8 Apr 2019 17:05:33 -0400
> > Farhan Ali<alifm@linux.ibm.com>  wrote:
> >   
> >> When releasing the vfio-ccw mdev, we currently do not release
> >> any existing channel program and it's pinned pages. This can  
> > s/it's/its/
> >   
> 
> Will fix.
> 
> Thanks for taking a look at the patches.

NP, I really want to queue them soon :)

Just give me a quick hint about dependencies: Do I need anything else
before I can queue patch 1 and 3? I've lost track a bit...
Farhan Ali April 12, 2019, 2:13 p.m. UTC | #4
On 04/12/2019 04:12 AM, Cornelia Huck wrote:
> On Thu, 11 Apr 2019 16:39:32 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 04/11/2019 12:27 PM, Cornelia Huck wrote:
>>> On Mon,  8 Apr 2019 17:05:33 -0400
>>> Farhan Ali<alifm@linux.ibm.com>  wrote:
>>>    
>>>> When releasing the vfio-ccw mdev, we currently do not release
>>>> any existing channel program and it's pinned pages. This can
>>> s/it's/its/
>>>    
>>
>> Will fix.
>>
>> Thanks for taking a look at the patches.
> 
> NP, I really want to queue them soon :)
> 
> Just give me a quick hint about dependencies: Do I need anything else
> before I can queue patch 1 and 3? I've lost track a bit...
> 
> 

Patch 1 can be queued without any problems and it has no dependencies.

Patch 3 would depend on your [PATCH v4 1/6] vfio-ccw: make it safe to 
access channel programs. Since I want to call cp_free without any problems.

I know you are waiting for acks/r-bs for patch 3, so if no one has 
problems then we can go ahead and queue it.


Thanks
Farhan
Eric Farman April 12, 2019, 9:01 p.m. UTC | #5
On 4/8/19 5:05 PM, Farhan Ali wrote:
> When releasing the vfio-ccw mdev, we currently do not release
> any existing channel program and it's pinned pages. This can
> lead to the following warning:
> 
> [1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]
> 
> ....
> 
> 1038876.561921] Call Trace:
> [1038876.561935] ([<00000009897fb870>] 0x9897fb870)
> [1038876.561949]  [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1]
> [1038876.561965]  [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio]
> [1038876.561978]  [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio]
> [1038876.562024]  [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm]
> [1038876.562045]  [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm]
> [1038876.562065]  [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm]
> [1038876.562083]  [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm]
> [1038876.562098]  [<00000000003c2dc4>] __fput+0x144/0x228
> [1038876.562113]  [<000000000016ee82>] task_work_run+0x8a/0xd8
> [1038876.562125]  [<000000000014c7a8>] do_exit+0x5d8/0xd90
> [1038876.562140]  [<000000000014d084>] do_group_exit+0xc4/0xc8
> [1038876.562155]  [<000000000015c046>] get_signal+0x9ae/0xa68
> [1038876.562169]  [<0000000000108d66>] do_signal+0x66/0x768
> [1038876.562185]  [<0000000000b9e37e>] system_call+0x1ea/0x2d8
> [1038876.562195] 2 locks held by qemu-system-s39/144727:
> [1038876.562205]  #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio]
> [1038876.562230]  #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1]
> [1038876.562250] Last Breaking-Event-Address:
> [1038876.562262]  [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1]
> [1038876.562272] irq event stamp: 4236481
> [1038876.562287] hardirqs last  enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> [1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740
> [1038876.562311] softirqs last  enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598
> [1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108
> [1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]---
> 
> Similarly we do not free the channel program when we are removing
> the vfio-ccw device. Let's fix this by resetting the device and freeing
> the channel program and pinned pages in the release path. For the remove
> path we can just quiesce the device, since in the remove path the mediated
> device is going away for good and so we don't need to do a full reset.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>   drivers/s390/cio/vfio_ccw_ops.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index ec2f796c..aacd528 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -133,11 +133,12 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
>   
>   	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
>   	    (private->state != VFIO_CCW_STATE_STANDBY)) {
> -		if (!vfio_ccw_mdev_reset(mdev))
> +		if (!vfio_ccw_sch_quiesce(private->sch))
>   			private->state = VFIO_CCW_STATE_STANDBY;
>   		/* The state will be NOT_OPER on error. */
>   	}
>   
> +	cp_free(&private->cp);
>   	private->mdev = NULL;
>   	atomic_inc(&private->avail);
>   
> @@ -171,6 +172,14 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
>   		dev_get_drvdata(mdev_parent_dev(mdev));
>   	int i;
>   
> +	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
> +	    (private->state != VFIO_CCW_STATE_STANDBY)) {
> +		if (!vfio_ccw_mdev_reset(mdev))
> +			private->state = VFIO_CCW_STATE_STANDBY;
> +		/* The state will be NOT_OPER on error. */
> +	}
> +
> +	cp_free(&private->cp);
>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>   				 &private->nb);
>   
> 

This seems sane to me.

Acked-by: Eric Farman <farman@linux.ibm.com>
Eric Farman April 12, 2019, 9:03 p.m. UTC | #6
On 4/12/19 10:13 AM, Farhan Ali wrote:
> 
> 
> On 04/12/2019 04:12 AM, Cornelia Huck wrote:
>> On Thu, 11 Apr 2019 16:39:32 -0400
>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>
>>> On 04/11/2019 12:27 PM, Cornelia Huck wrote:
>>>> On Mon,  8 Apr 2019 17:05:33 -0400
>>>> Farhan Ali<alifm@linux.ibm.com>  wrote:
>>>>> When releasing the vfio-ccw mdev, we currently do not release
>>>>> any existing channel program and it's pinned pages. This can
>>>> s/it's/its/
>>>
>>> Will fix.
>>>
>>> Thanks for taking a look at the patches.
>>
>> NP, I really want to queue them soon :)
>>
>> Just give me a quick hint about dependencies: Do I need anything else
>> before I can queue patch 1 and 3? I've lost track a bit...
>>
>>
> 
> Patch 1 can be queued without any problems and it has no dependencies.
> 
> Patch 3 would depend on your [PATCH v4 1/6] vfio-ccw: make it safe to 
> access channel programs. Since I want to call cp_free without any problems.

I'm sorry; I'm working as a LIFO queue this afternoon.  At least I 
reviewed patch 1 of that series.  :)

> 
> I know you are waiting for acks/r-bs for patch 3, so if no one has 
> problems then we can go ahead and queue it.
> 
> 
> Thanks
> Farhan
>

Patch
diff mbox series

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index ec2f796c..aacd528 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -133,11 +133,12 @@  static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
 
 	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
 	    (private->state != VFIO_CCW_STATE_STANDBY)) {
-		if (!vfio_ccw_mdev_reset(mdev))
+		if (!vfio_ccw_sch_quiesce(private->sch))
 			private->state = VFIO_CCW_STATE_STANDBY;
 		/* The state will be NOT_OPER on error. */
 	}
 
+	cp_free(&private->cp);
 	private->mdev = NULL;
 	atomic_inc(&private->avail);
 
@@ -171,6 +172,14 @@  static void vfio_ccw_mdev_release(struct mdev_device *mdev)
 		dev_get_drvdata(mdev_parent_dev(mdev));
 	int i;
 
+	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
+	    (private->state != VFIO_CCW_STATE_STANDBY)) {
+		if (!vfio_ccw_mdev_reset(mdev))
+			private->state = VFIO_CCW_STATE_STANDBY;
+		/* The state will be NOT_OPER on error. */
+	}
+
+	cp_free(&private->cp);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
 				 &private->nb);