diff mbox series

[v4,1/6] vfio-ccw: make it safe to access channel programs

Message ID 20190301093902.27799-2-cohuck@redhat.com (mailing list archive)
State New, archived
Headers show
Series vfio-ccw: support hsch/csch (kernel part) | expand

Commit Message

Cornelia Huck March 1, 2019, 9:38 a.m. UTC
When we get a solicited interrupt, the start function may have
been cleared by a csch, but we still have a channel program
structure allocated. Make it safe to call the cp accessors in
any case, so we can call them unconditionally.

While at it, also make sure that functions called from other parts
of the code return gracefully if the channel program structure
has not been initialized (even though that is a bug in the caller).

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_cp.c  | 21 ++++++++++++++++++++-
 drivers/s390/cio/vfio_ccw_cp.h  |  2 ++
 drivers/s390/cio/vfio_ccw_fsm.c |  5 +++++
 3 files changed, 27 insertions(+), 1 deletion(-)

Comments

Farhan Ali April 8, 2019, 5:02 p.m. UTC | #1
On 03/01/2019 04:38 AM, Cornelia Huck wrote:
> When we get a solicited interrupt, the start function may have
> been cleared by a csch, but we still have a channel program
> structure allocated. Make it safe to call the cp accessors in
> any case, so we can call them unconditionally.
> 
> While at it, also make sure that functions called from other parts
> of the code return gracefully if the channel program structure
> has not been initialized (even though that is a bug in the caller).
> 
> Reviewed-by: Eric Farman<farman@linux.ibm.com>
> Signed-off-by: Cornelia Huck<cohuck@redhat.com>
> ---

Hi Connie,

My series of fixes for vfio-ccw depends on this patch as I would like to 
call cp_free unconditionally :) (I had developed my code on top of your 
patches).

Could we pick this patch up as well when/if you pick up my patch series? 
I am in the process of sending out a v2.

Regarding this patch we could merge it as a stand alone patch, separate 
from this series. And also the patch LGTM

Reviewed-by: Farhan Ali <alifm@linux.ibm.com>

Thanks
Farhan
Cornelia Huck April 8, 2019, 5:07 p.m. UTC | #2
On Mon, 8 Apr 2019 13:02:12 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 03/01/2019 04:38 AM, Cornelia Huck wrote:
> > When we get a solicited interrupt, the start function may have
> > been cleared by a csch, but we still have a channel program
> > structure allocated. Make it safe to call the cp accessors in
> > any case, so we can call them unconditionally.
> > 
> > While at it, also make sure that functions called from other parts
> > of the code return gracefully if the channel program structure
> > has not been initialized (even though that is a bug in the caller).
> > 
> > Reviewed-by: Eric Farman<farman@linux.ibm.com>
> > Signed-off-by: Cornelia Huck<cohuck@redhat.com>
> > ---  
> 
> Hi Connie,
> 
> My series of fixes for vfio-ccw depends on this patch as I would like to 
> call cp_free unconditionally :) (I had developed my code on top of your 
> patches).
> 
> Could we pick this patch up as well when/if you pick up my patch series? 
> I am in the process of sending out a v2.
> 
> Regarding this patch we could merge it as a stand alone patch, separate 
> from this series. And also the patch LGTM
> 
> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>

Actually, I wanted to ask how people felt about merging this whole
series for the next release :) It would be one thing less on my plate...
Farhan Ali April 8, 2019, 5:19 p.m. UTC | #3
On 04/08/2019 01:07 PM, Cornelia Huck wrote:
> On Mon, 8 Apr 2019 13:02:12 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 03/01/2019 04:38 AM, Cornelia Huck wrote:
>>> When we get a solicited interrupt, the start function may have
>>> been cleared by a csch, but we still have a channel program
>>> structure allocated. Make it safe to call the cp accessors in
>>> any case, so we can call them unconditionally.
>>>
>>> While at it, also make sure that functions called from other parts
>>> of the code return gracefully if the channel program structure
>>> has not been initialized (even though that is a bug in the caller).
>>>
>>> Reviewed-by: Eric Farman<farman@linux.ibm.com>
>>> Signed-off-by: Cornelia Huck<cohuck@redhat.com>
>>> ---
>>
>> Hi Connie,
>>
>> My series of fixes for vfio-ccw depends on this patch as I would like to
>> call cp_free unconditionally :) (I had developed my code on top of your
>> patches).
>>
>> Could we pick this patch up as well when/if you pick up my patch series?
>> I am in the process of sending out a v2.
>>
>> Regarding this patch we could merge it as a stand alone patch, separate
>> from this series. And also the patch LGTM
>>
>> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
> 
> Actually, I wanted to ask how people felt about merging this whole
> series for the next release :) It would be one thing less on my plate...
> 
> 

I have been testing with your patches for a while now and I haven't hit 
any problems due to the patches.

IMHO I think we can merge the patches for the next release.

Thanks
Farhan
Eric Farman April 8, 2019, 8:25 p.m. UTC | #4
On 4/8/19 1:07 PM, Cornelia Huck wrote:
> On Mon, 8 Apr 2019 13:02:12 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 03/01/2019 04:38 AM, Cornelia Huck wrote:
>>> When we get a solicited interrupt, the start function may have
>>> been cleared by a csch, but we still have a channel program
>>> structure allocated. Make it safe to call the cp accessors in
>>> any case, so we can call them unconditionally.
>>>
>>> While at it, also make sure that functions called from other parts
>>> of the code return gracefully if the channel program structure
>>> has not been initialized (even though that is a bug in the caller).
>>>
>>> Reviewed-by: Eric Farman<farman@linux.ibm.com>
>>> Signed-off-by: Cornelia Huck<cohuck@redhat.com>
>>> ---
>>
>> Hi Connie,
>>
>> My series of fixes for vfio-ccw depends on this patch as I would like to
>> call cp_free unconditionally :) (I had developed my code on top of your
>> patches).
>>
>> Could we pick this patch up as well when/if you pick up my patch series?
>> I am in the process of sending out a v2.
>>
>> Regarding this patch we could merge it as a stand alone patch, separate
>> from this series. And also the patch LGTM
>>
>> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
> 
> Actually, I wanted to ask how people felt about merging this whole
> series for the next release :) It would be one thing less on my plate...
> 

I'm not opposed to it.  Every time I try to review patches 4 and 6 I get 
an asynchronous interrupt of my own.  :)  I'll at least get you an ack 
in the next day or two.
Halil Pasic April 9, 2019, 11:34 p.m. UTC | #5
On Mon, 8 Apr 2019 19:07:47 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 8 Apr 2019 13:02:12 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 03/01/2019 04:38 AM, Cornelia Huck wrote:
> > > When we get a solicited interrupt, the start function may have
> > > been cleared by a csch, but we still have a channel program
> > > structure allocated. Make it safe to call the cp accessors in
> > > any case, so we can call them unconditionally.
> > > 
> > > While at it, also make sure that functions called from other parts
> > > of the code return gracefully if the channel program structure
> > > has not been initialized (even though that is a bug in the caller).
> > > 
> > > Reviewed-by: Eric Farman<farman@linux.ibm.com>
> > > Signed-off-by: Cornelia Huck<cohuck@redhat.com>
> > > ---  
> > 
> > Hi Connie,
> > 
> > My series of fixes for vfio-ccw depends on this patch as I would like to 
> > call cp_free unconditionally :) (I had developed my code on top of your 
> > patches).
> > 
> > Could we pick this patch up as well when/if you pick up my patch series? 
> > I am in the process of sending out a v2.
> > 
> > Regarding this patch we could merge it as a stand alone patch, separate 
> > from this series. And also the patch LGTM
> > 
> > Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
> 
> Actually, I wanted to ask how people felt about merging this whole
> series for the next release :) It would be one thing less on my plate...
> 

Sorry I was not able to spend any significant amount of time on this
lately.

Gave the combined set (this + Farhans fio-ccw fixes for kernel
stacktraces v2) it a bit of smoke testing after some minor adjustments
to make it compile:

--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -13,6 +13,7 @@
 #include <linux/vfio.h>
 #include <linux/mdev.h>
 #include <linux/nospec.h>
+#include <linux/slab.h>
 
 #include "vfio_ccw_private.h"


I'm just running fio on a pass-through DASD and on some virto-blk disks
in parallel. My QEMU is today's vfio-ccw-caps from your repo.

I see stuff like this:
qemu-git: vfio-ccw: wirte I/O region failed with errno=16[1811/7332/0 iops] [eta 26m:34s]
[Thread 0x3ff75890910 (LWP 43803) exited]/7932KB/0KB /s] [1930/7932/0 iops] [eta 26m:33s]
[Thread 0x3ff6b7b7910 (LWP 43800) exited]/8030KB/0KB /s] [2031/8030/0 iops] [eta 26m:32s]
dasd-eckd 0.0.1234: An error occurred in the DASD device driver, reason=14 00000000caa27abe
INFO: task kworker/u6:1:26 blocked for more than 122 seconds.ps] [eta 23m:26s]eta 23m:25s]
      Not tainted 5.1.0-rc3-00217-g6ab18dc #598
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u6:1    D    0    26      2 0x00000000
Workqueue: writeback wb_workfn (flush-94:0)
Call Trace:
([<0000000000ae23f2>] __schedule+0x4fa/0xc98)
 [<0000000000ae2bda>] schedule+0x4a/0xb0 
 [<00000000001b30ec>] io_schedule+0x2c/0x50 
 [<000000000071cc9c>] blk_mq_get_tag+0x1bc/0x310 
 [<000000000071571c>] blk_mq_get_request+0x1a4/0x4a8 
 [<0000000000719d38>] blk_mq_make_request+0x100/0x728 
 [<000000000070aa0a>] generic_make_request+0x26a/0x478 
 [<000000000070ac76>] submit_bio+0x5e/0x178 
 [<00000000004cfa2c>] ext4_io_submit+0x74/0x88 
 [<00000000004cfd32>] ext4_bio_write_page+0x2d2/0x4c8 
 [<00000000004aa5b4>] mpage_submit_page+0x74/0xa8 
 [<00000000004aa676>] mpage_process_page_bufs+0x8e/0x1b8 
 [<00000000004aa9bc>] mpage_prepare_extent_to_map+0x21c/0x390 
 [<00000000004b063c>] ext4_writepages+0x4bc/0x11a0 
 [<000000000032ef7a>] do_writepages+0x3a/0xf0 
 [<0000000000416226>] __writeback_single_inode+0x86/0x7a0 
 [<0000000000417154>] writeback_sb_inodes+0x2cc/0x550 
 [<0000000000417476>] __writeback_inodes_wb+0x9e/0xe8 
 [<00000000004179e0>] wb_writeback+0x468/0x598 
 [<0000000000418780>] wb_workfn+0x3b8/0x710 
 [<0000000000199322>] process_one_work+0x25a/0x668 
 [<000000000019977a>] worker_thread+0x4a/0x428 
 [<00000000001a1ae8>] kthread+0x150/0x170 
 [<0000000000aeadda>] kernel_thread_starter+0x6/0xc 
 [<0000000000aeadd4>] kernel_thread_starter+0x0/0xc 
4 locks held by kworker/u6:1/26:
 #0: 00000000792cf224 ((wq_completion)writeback){+.+.}, at: process_one_work+0x19c/0x668
 #1: 000000009888c0e5 ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: process_one_work+0x19c/0x668
 #2: 000000002bfb76f0 (&type->s_umount_key#29){++++}, at: trylock_super+0x2e/0xa8
 #3: 00000000ff47fe1d (&sbi->s_journal_flag_rwsem){.+.+}, at: do_writepages+0x3a/0xf0


Since I haven't had the time to keep up lately, I will just trust Eric
and Farhan on whether this should be merged or not. From a quick look at
the code, and a quick stroll through my remaining memories, I think, there
are a couple of things, that I myself would try to solve differently. But
that is not a valid reason to hold this up.

I would like to spare the hustle of revisiting my old comments for everyone.
From the stability and utility perspective I'm pretty convinced we are
better off than without the patches in question.

TLDR:
If it is good enough for Eric and Farhan, I have no objections against merging.

Regards,
Halil
Eric Farman April 11, 2019, 2:59 a.m. UTC | #6
On 4/9/19 7:34 PM, Halil Pasic wrote:
> On Mon, 8 Apr 2019 19:07:47 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Mon, 8 Apr 2019 13:02:12 -0400
>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>
>>> On 03/01/2019 04:38 AM, Cornelia Huck wrote:
>>>> When we get a solicited interrupt, the start function may have
>>>> been cleared by a csch, but we still have a channel program
>>>> structure allocated. Make it safe to call the cp accessors in
>>>> any case, so we can call them unconditionally.
>>>>
>>>> While at it, also make sure that functions called from other parts
>>>> of the code return gracefully if the channel program structure
>>>> has not been initialized (even though that is a bug in the caller).
>>>>
>>>> Reviewed-by: Eric Farman<farman@linux.ibm.com>
>>>> Signed-off-by: Cornelia Huck<cohuck@redhat.com>
>>>> ---
>>>
>>> Hi Connie,
>>>
>>> My series of fixes for vfio-ccw depends on this patch as I would like to
>>> call cp_free unconditionally :) (I had developed my code on top of your
>>> patches).
>>>
>>> Could we pick this patch up as well when/if you pick up my patch series?
>>> I am in the process of sending out a v2.
>>>
>>> Regarding this patch we could merge it as a stand alone patch, separate
>>> from this series. And also the patch LGTM
>>>
>>> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
>>
>> Actually, I wanted to ask how people felt about merging this whole
>> series for the next release :) It would be one thing less on my plate...
>>
> 
> Sorry I was not able to spend any significant amount of time on this
> lately.
> 
> Gave the combined set (this + Farhans fio-ccw fixes for kernel
> stacktraces v2) it a bit of smoke testing after some minor adjustments
> to make it compile:
> 
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -13,6 +13,7 @@
>   #include <linux/vfio.h>
>   #include <linux/mdev.h>
>   #include <linux/nospec.h>
> +#include <linux/slab.h>
>   
>   #include "vfio_ccw_private.h"
> 
> 

Hrm...  Taking today's master, and the two series you mention (slight 
adjustment to apply patch 3 of Connie's series, because part of it was 
split out a few weeks ago), and I don't encounter this.  Tried switching 
between SLUB/SLAB, but still compiles fine.

> I'm just running fio on a pass-through DASD and on some virto-blk disks
> in parallel. My QEMU is today's vfio-ccw-caps from your repo.
> 
> I see stuff like this:
> qemu-git: vfio-ccw: wirte I/O region failed with errno=16[1811/7332/0 iops] [eta 26m:34s]

Without knowing what the I/O was that failed, this is a guessing game. 
But I encountered something similar just now running fio.

qemu:
2019-04-11T02:06:09.524838Z qemu-system-s390x: vfio-ccw: wirte I/O 
region failed with errno=16

guest:
[  422.931458] dasd-eckd 0.0.ca8d: An error occurred in the DASD device 
driver, reason=14 00000000730bbe9a
[  553.741554] dasd-eckd 0.0.ca8e: An error occurred in the DASD device 
driver, reason=14 00000000e59b81da
[  554.761552] dasd-eckd 0.0.ca8d: An error occurred in the DASD device 
driver, reason=14 00000000cdf4fb4e
[  554.921518] dasd-eckd 0.0.ca8b: An error occurred in the DASD device 
driver, reason=14 0000000068775082
[  555.271556] dasd-eckd 0.0.ca8d: ERP 00000000cdf4fb4e has run out of 
retries and failed
[  555.271786] dasd(eckd): I/O status report for device 0.0.ca8d:
                dasd(eckd): in req: 00000000cdf4fb4e CC:00 FC:00 AC:00 
SC:00 DS:00 CS:00 RC:-16
                dasd(eckd): device 0.0.ca8d: Failing CCW:           (null)
                dasd(eckd): SORRY - NO VALID SENSE AVAILABLE
[  555.272214] dasd(eckd): Related CP in req: 00000000cdf4fb4e
                dasd(eckd): CCW 000000006434c30f: 03400000 00000000 DAT:
                dasd(eckd): CCW 000000007a65f7e0: 08000000 70E5B700 DAT:
[  555.272508] dasd(eckd):......


 From the associated I/O, I think this is fixed by a series I am nearly 
ready to send for review.  I'll try again with those fixes on top of the 
two series here, and report back.

> [Thread 0x3ff75890910 (LWP 43803) exited]/7932KB/0KB /s] [1930/7932/0 iops] [eta 26m:33s]
> [Thread 0x3ff6b7b7910 (LWP 43800) exited]/8030KB/0KB /s] [2031/8030/0 iops] [eta 26m:32s]
> dasd-eckd 0.0.1234: An error occurred in the DASD device driver, reason=14 00000000caa27abe
> INFO: task kworker/u6:1:26 blocked for more than 122 seconds.ps] [eta 23m:26s]eta 23m:25s]
>        Not tainted 5.1.0-rc3-00217-g6ab18dc #598
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u6:1    D    0    26      2 0x00000000
> Workqueue: writeback wb_workfn (flush-94:0)
> Call Trace:
> ([<0000000000ae23f2>] __schedule+0x4fa/0xc98)
>   [<0000000000ae2bda>] schedule+0x4a/0xb0
>   [<00000000001b30ec>] io_schedule+0x2c/0x50
>   [<000000000071cc9c>] blk_mq_get_tag+0x1bc/0x310
>   [<000000000071571c>] blk_mq_get_request+0x1a4/0x4a8
>   [<0000000000719d38>] blk_mq_make_request+0x100/0x728
>   [<000000000070aa0a>] generic_make_request+0x26a/0x478
>   [<000000000070ac76>] submit_bio+0x5e/0x178
>   [<00000000004cfa2c>] ext4_io_submit+0x74/0x88
>   [<00000000004cfd32>] ext4_bio_write_page+0x2d2/0x4c8
>   [<00000000004aa5b4>] mpage_submit_page+0x74/0xa8
>   [<00000000004aa676>] mpage_process_page_bufs+0x8e/0x1b8
>   [<00000000004aa9bc>] mpage_prepare_extent_to_map+0x21c/0x390
>   [<00000000004b063c>] ext4_writepages+0x4bc/0x11a0
>   [<000000000032ef7a>] do_writepages+0x3a/0xf0
>   [<0000000000416226>] __writeback_single_inode+0x86/0x7a0
>   [<0000000000417154>] writeback_sb_inodes+0x2cc/0x550
>   [<0000000000417476>] __writeback_inodes_wb+0x9e/0xe8
>   [<00000000004179e0>] wb_writeback+0x468/0x598
>   [<0000000000418780>] wb_workfn+0x3b8/0x710
>   [<0000000000199322>] process_one_work+0x25a/0x668
>   [<000000000019977a>] worker_thread+0x4a/0x428
>   [<00000000001a1ae8>] kthread+0x150/0x170
>   [<0000000000aeadda>] kernel_thread_starter+0x6/0xc
>   [<0000000000aeadd4>] kernel_thread_starter+0x0/0xc
> 4 locks held by kworker/u6:1/26:
>   #0: 00000000792cf224 ((wq_completion)writeback){+.+.}, at: process_one_work+0x19c/0x668
>   #1: 000000009888c0e5 ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: process_one_work+0x19c/0x668
>   #2: 000000002bfb76f0 (&type->s_umount_key#29){++++}, at: trylock_super+0x2e/0xa8
>   #3: 00000000ff47fe1d (&sbi->s_journal_flag_rwsem){.+.+}, at: do_writepages+0x3a/0xf0
> 
> 
> Since I haven't had the time to keep up lately, I will just trust Eric
> and Farhan on whether this should be merged or not. From a quick look at
> the code, and a quick stroll through my remaining memories, I think, there
> are a couple of things, that I myself would try to solve differently. But
> that is not a valid reason to hold this up.
> 
> I would like to spare the hustle of revisiting my old comments for everyone.
>  From the stability and utility perspective I'm pretty convinced we are
> better off than without the patches in question.

I agree, both series are an improvement.  I'll focus on both tomorrow.

  - Eric

> 
> TLDR:
> If it is good enough for Eric and Farhan, I have no objections against merging.
> 
> Regards,
> Halil
>
Halil Pasic April 11, 2019, 3:58 p.m. UTC | #7
On Wed, 10 Apr 2019 22:59:41 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> r the next release :) It would be one thing less on my plate...
> >>  
> > 
> > Sorry I was not able to spend any significant amount of time on this
> > lately.
> > 
> > Gave the combined set (this + Farhans fio-ccw fixes for kernel
> > stacktraces v2) it a bit of smoke testing after some minor adjustments
> > to make it compile:
> > 
> > --- a/drivers/s390/cio/vfio_ccw_ops.c
> > +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > @@ -13,6 +13,7 @@
> >   #include <linux/vfio.h>
> >   #include <linux/mdev.h>
> >   #include <linux/nospec.h>
> > +#include <linux/slab.h>
> >   
> >   #include "vfio_ccw_private.h"
> > 
> >   
> 
> Hrm...  Taking today's master, and the two series you mention (slight 
> adjustment to apply patch 3 of Connie's series, because part of it was 
> split out a few weeks ago), and I don't encounter this.  Tried switching 
> between SLUB/SLAB, but still compiles fine.

Let me guess: you have CONFIG_PCI in our .config, right?

In arch/s390/include/asm/pci_io.h we have

#ifndef _ASM_S390_PCI_IO_H
#define _ASM_S390_PCI_IO_H

#ifdef CONFIG_PCI

#include <linux/kernel.h>
#include <linux/slab.h>

and pci_io.h comes in via 

include/linux/vfio.h
include/linux/iommu.h
include/linux/scatterlist.h
arch/s390/include/asm/io.h
arch/s390/include/asm/pci_io.h


Figured it out with help from Farhan. Took me quite some time.

Regards,
Halil
Eric Farman April 11, 2019, 4:25 p.m. UTC | #8
On 4/11/19 11:58 AM, Halil Pasic wrote:
> On Wed, 10 Apr 2019 22:59:41 -0400
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> r the next release :) It would be one thing less on my plate...
>>>>   
>>>
>>> Sorry I was not able to spend any significant amount of time on this
>>> lately.
>>>
>>> Gave the combined set (this + Farhans fio-ccw fixes for kernel
>>> stacktraces v2) it a bit of smoke testing after some minor adjustments
>>> to make it compile:
>>>
>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>>> @@ -13,6 +13,7 @@
>>>    #include <linux/vfio.h>
>>>    #include <linux/mdev.h>
>>>    #include <linux/nospec.h>
>>> +#include <linux/slab.h>
>>>    
>>>    #include "vfio_ccw_private.h"
>>>
>>>    
>>
>> Hrm...  Taking today's master, and the two series you mention (slight
>> adjustment to apply patch 3 of Connie's series, because part of it was
>> split out a few weeks ago), and I don't encounter this.  Tried switching
>> between SLUB/SLAB, but still compiles fine.
> 
> Let me guess: you have CONFIG_PCI in our .config, right?
> 
> In arch/s390/include/asm/pci_io.h we have
> 
> #ifndef _ASM_S390_PCI_IO_H
> #define _ASM_S390_PCI_IO_H
> 
> #ifdef CONFIG_PCI
> 
> #include <linux/kernel.h>
> #include <linux/slab.h>
> 
> and pci_io.h comes in via
> 
> include/linux/vfio.h
> include/linux/iommu.h
> include/linux/scatterlist.h
> arch/s390/include/asm/io.h
> arch/s390/include/asm/pci_io.h
> 
> 
> Figured it out with help from Farhan. Took me quite some time.

That would have been useful information up front.

> 
> Regards,
> Halil
> 
>
Cornelia Huck April 11, 2019, 4:36 p.m. UTC | #9
On Thu, 11 Apr 2019 12:25:38 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 4/11/19 11:58 AM, Halil Pasic wrote:
> > On Wed, 10 Apr 2019 22:59:41 -0400
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> >> r the next release :) It would be one thing less on my plate...  
> >>>>     
> >>>
> >>> Sorry I was not able to spend any significant amount of time on this
> >>> lately.
> >>>
> >>> Gave the combined set (this + Farhans fio-ccw fixes for kernel
> >>> stacktraces v2) it a bit of smoke testing after some minor adjustments
> >>> to make it compile:
> >>>
> >>> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >>> @@ -13,6 +13,7 @@
> >>>    #include <linux/vfio.h>
> >>>    #include <linux/mdev.h>
> >>>    #include <linux/nospec.h>
> >>> +#include <linux/slab.h>
> >>>    
> >>>    #include "vfio_ccw_private.h"
> >>>
> >>>      
> >>
> >> Hrm...  Taking today's master, and the two series you mention (slight
> >> adjustment to apply patch 3 of Connie's series, because part of it was
> >> split out a few weeks ago), and I don't encounter this.  Tried switching
> >> between SLUB/SLAB, but still compiles fine.  
> > 
> > Let me guess: you have CONFIG_PCI in our .config, right?
> > 
> > In arch/s390/include/asm/pci_io.h we have
> > 
> > #ifndef _ASM_S390_PCI_IO_H
> > #define _ASM_S390_PCI_IO_H
> > 
> > #ifdef CONFIG_PCI
> > 
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > 
> > and pci_io.h comes in via
> > 
> > include/linux/vfio.h
> > include/linux/iommu.h
> > include/linux/scatterlist.h
> > arch/s390/include/asm/io.h
> > arch/s390/include/asm/pci_io.h
> > 
> > 
> > Figured it out with help from Farhan. Took me quite some time.  
> 
> That would have been useful information up front.

Indeed. It's trivial to fold that change in, though :) (Should be in
patch 4, if I see it correctly.)
Halil Pasic April 11, 2019, 6:07 p.m. UTC | #10
On Thu, 11 Apr 2019 18:36:48 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 11 Apr 2019 12:25:38 -0400
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > On 4/11/19 11:58 AM, Halil Pasic wrote:
> > > On Wed, 10 Apr 2019 22:59:41 -0400
> > > Eric Farman <farman@linux.ibm.com> wrote:
> > >   
> > >> r the next release :) It would be one thing less on my plate...  
> > >>>>     
> > >>>
> > >>> Sorry I was not able to spend any significant amount of time on this
> > >>> lately.
> > >>>
> > >>> Gave the combined set (this + Farhans fio-ccw fixes for kernel
> > >>> stacktraces v2) it a bit of smoke testing after some minor adjustments
> > >>> to make it compile:
> > >>>
> > >>> --- a/drivers/s390/cio/vfio_ccw_ops.c
> > >>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > >>> @@ -13,6 +13,7 @@
> > >>>    #include <linux/vfio.h>
> > >>>    #include <linux/mdev.h>
> > >>>    #include <linux/nospec.h>
> > >>> +#include <linux/slab.h>
> > >>>    
> > >>>    #include "vfio_ccw_private.h"
> > >>>
> > >>>      
> > >>
> > >> Hrm...  Taking today's master, and the two series you mention (slight
> > >> adjustment to apply patch 3 of Connie's series, because part of it was
> > >> split out a few weeks ago), and I don't encounter this.  Tried switching
> > >> between SLUB/SLAB, but still compiles fine.  
> > > 
> > > Let me guess: you have CONFIG_PCI in our .config, right?
> > > 
> > > In arch/s390/include/asm/pci_io.h we have
> > > 
> > > #ifndef _ASM_S390_PCI_IO_H
> > > #define _ASM_S390_PCI_IO_H
> > > 
> > > #ifdef CONFIG_PCI
> > > 
> > > #include <linux/kernel.h>
> > > #include <linux/slab.h>
> > > 
> > > and pci_io.h comes in via
> > > 
> > > include/linux/vfio.h
> > > include/linux/iommu.h
> > > include/linux/scatterlist.h
> > > arch/s390/include/asm/io.h
> > > arch/s390/include/asm/pci_io.h
> > > 
> > > 
> > > Figured it out with help from Farhan. Took me quite some time.  
> > 
> > That would have been useful information up front.
> 
> Indeed. It's trivial to fold that change in, though :) (Should be in
> patch 4, if I see it correctly.)
> 

#4 vfio-ccw: add capabilities chain it is!

Cheers,
Halil
Eric Farman April 11, 2019, 9:27 p.m. UTC | #11
On 4/10/19 10:59 PM, Eric Farman wrote:
> 
> 
> On 4/9/19 7:34 PM, Halil Pasic wrote:
>> On Mon, 8 Apr 2019 19:07:47 +0200

...snip...

>> I'm just running fio on a pass-through DASD and on some virto-blk disks
>> in parallel. My QEMU is today's vfio-ccw-caps from your repo.
>>
>> I see stuff like this:
>> qemu-git: vfio-ccw: wirte I/O region failed with errno=16[1811/7332/0 
>> iops] [eta 26m:34s]
> 
> Without knowing what the I/O was that failed, this is a guessing game. 
> But I encountered something similar just now running fio.
> 
> qemu:
> 2019-04-11T02:06:09.524838Z qemu-system-s390x: vfio-ccw: wirte I/O 
> region failed with errno=16

...snip...

>  From the associated I/O, I think this is fixed by a series I am nearly 
> ready to send for review.  I'll try again with those fixes on top of the 
> two series here, and report back.

So, I've run enough combinations to feel comfortable saying that the 
error (EBUSY) I saw last night (and presumably the one Halil saw) exists 
in today's code and is not introduced by this series.  It also appears 
to be addressed by one of the patches in a series I'm working on, but 
which that series still has some further problems.  Sigh, there are too 
many branches and too many interrupts.

  - Eric
Cornelia Huck April 12, 2019, 8:14 a.m. UTC | #12
On Thu, 11 Apr 2019 17:27:42 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 4/10/19 10:59 PM, Eric Farman wrote:
> > 
> > 
> > On 4/9/19 7:34 PM, Halil Pasic wrote:  
> >> On Mon, 8 Apr 2019 19:07:47 +0200  
> 
> ...snip...
> 
> >> I'm just running fio on a pass-through DASD and on some virto-blk disks
> >> in parallel. My QEMU is today's vfio-ccw-caps from your repo.
> >>
> >> I see stuff like this:
> >> qemu-git: vfio-ccw: wirte I/O region failed with errno=16[1811/7332/0 
> >> iops] [eta 26m:34s]  
> > 
> > Without knowing what the I/O was that failed, this is a guessing game. 
> > But I encountered something similar just now running fio.
> > 
> > qemu:
> > 2019-04-11T02:06:09.524838Z qemu-system-s390x: vfio-ccw: wirte I/O 
> > region failed with errno=16  
> 
> ...snip...
> 
> >  From the associated I/O, I think this is fixed by a series I am nearly 
> > ready to send for review.  I'll try again with those fixes on top of the 
> > two series here, and report back.  
> 
> So, I've run enough combinations to feel comfortable saying that the 
> error (EBUSY) I saw last night (and presumably the one Halil saw) exists 
> in today's code and is not introduced by this series.  It also appears 
> to be addressed by one of the patches in a series I'm working on, but 
> which that series still has some further problems.  Sigh, there are too 
> many branches and too many interrupts.

Great, thanks for checking! I know that feeling of being tangled in too
many branches...
diff mbox series

Patch

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 384b3987eeb4..0e79799e9a71 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -362,6 +362,7 @@  static void cp_unpin_free(struct channel_program *cp)
 	struct ccwchain *chain, *temp;
 	int i;
 
+	cp->initialized = false;
 	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
 		for (i = 0; i < chain->ch_len; i++) {
 			pfn_array_table_unpin_free(chain->ch_pat + i,
@@ -732,6 +733,9 @@  int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	 */
 	cp->orb.cmd.c64 = 1;
 
+	if (!ret)
+		cp->initialized = true;
+
 	return ret;
 }
 
@@ -746,7 +750,8 @@  int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
  */
 void cp_free(struct channel_program *cp)
 {
-	cp_unpin_free(cp);
+	if (cp->initialized)
+		cp_unpin_free(cp);
 }
 
 /**
@@ -791,6 +796,10 @@  int cp_prefetch(struct channel_program *cp)
 	struct ccwchain *chain;
 	int len, idx, ret;
 
+	/* this is an error in the caller */
+	if (!cp->initialized)
+		return -EINVAL;
+
 	list_for_each_entry(chain, &cp->ccwchain_list, next) {
 		len = chain->ch_len;
 		for (idx = 0; idx < len; idx++) {
@@ -826,6 +835,10 @@  union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm)
 	struct ccwchain *chain;
 	struct ccw1 *cpa;
 
+	/* this is an error in the caller */
+	if (!cp->initialized)
+		return NULL;
+
 	orb = &cp->orb;
 
 	orb->cmd.intparm = intparm;
@@ -862,6 +875,9 @@  void cp_update_scsw(struct channel_program *cp, union scsw *scsw)
 	u32 cpa = scsw->cmd.cpa;
 	u32 ccw_head;
 
+	if (!cp->initialized)
+		return;
+
 	/*
 	 * LATER:
 	 * For now, only update the cmd.cpa part. We may need to deal with
@@ -898,6 +914,9 @@  bool cp_iova_pinned(struct channel_program *cp, u64 iova)
 	struct ccwchain *chain;
 	int i;
 
+	if (!cp->initialized)
+		return false;
+
 	list_for_each_entry(chain, &cp->ccwchain_list, next) {
 		for (i = 0; i < chain->ch_len; i++)
 			if (pfn_array_table_iova_pinned(chain->ch_pat + i,
diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
index a4b74fb1aa57..3c20cd208da5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.h
+++ b/drivers/s390/cio/vfio_ccw_cp.h
@@ -21,6 +21,7 @@ 
  * @ccwchain_list: list head of ccwchains
  * @orb: orb for the currently processed ssch request
  * @mdev: the mediated device to perform page pinning/unpinning
+ * @initialized: whether this instance is actually initialized
  *
  * @ccwchain_list is the head of a ccwchain list, that contents the
  * translated result of the guest channel program that pointed out by
@@ -30,6 +31,7 @@  struct channel_program {
 	struct list_head ccwchain_list;
 	union orb orb;
 	struct device *mdev;
+	bool initialized;
 };
 
 extern int cp_init(struct channel_program *cp, struct device *mdev,
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index cab17865aafe..e7c9877c9f1e 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -31,6 +31,10 @@  static int fsm_io_helper(struct vfio_ccw_private *private)
 	private->state = VFIO_CCW_STATE_BUSY;
 
 	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
+	if (!orb) {
+		ret = -EIO;
+		goto out;
+	}
 
 	/* Issue "Start Subchannel" */
 	ccode = ssch(sch->schid, orb);
@@ -64,6 +68,7 @@  static int fsm_io_helper(struct vfio_ccw_private *private)
 	default:
 		ret = ccode;
 	}
+out:
 	spin_unlock_irqrestore(sch->lock, flags);
 	return ret;
 }