diff mbox series

[v3,2/3] vfio/pci: Support 8-byte PCI loads and stores

Message ID 20240425165604.899447-3-gbayer@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series vfio/pci: Support 8-byte PCI loads and stores | expand

Commit Message

Gerd Bayer April 25, 2024, 4:56 p.m. UTC
From: Ben Segal <bpsegal@us.ibm.com>

Many PCI adapters can benefit or even require full 64bit read
and write access to their registers. In order to enable work on
user-space drivers for these devices add two new variations
vfio_pci_core_io{read|write}64 of the existing access methods
when the architecture supports 64-bit ioreads and iowrites.

Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++++++++++++++++
 include/linux/vfio_pci_core.h    |  3 +++
 2 files changed, 19 insertions(+)

Comments

Alex Williamson April 29, 2024, 4:31 p.m. UTC | #1
On Thu, 25 Apr 2024 18:56:03 +0200
Gerd Bayer <gbayer@linux.ibm.com> wrote:

> From: Ben Segal <bpsegal@us.ibm.com>
> 
> Many PCI adapters can benefit or even require full 64bit read
> and write access to their registers. In order to enable work on
> user-space drivers for these devices add two new variations
> vfio_pci_core_io{read|write}64 of the existing access methods
> when the architecture supports 64-bit ioreads and iowrites.
> 
> Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++++++++++++++++
>  include/linux/vfio_pci_core.h    |  3 +++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 3335f1b868b1..8ed06edaee23 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -89,6 +89,9 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
>  VFIO_IOREAD(8)
>  VFIO_IOREAD(16)
>  VFIO_IOREAD(32)
> +#ifdef ioread64
> +VFIO_IOREAD(64)
> +#endif
>  
>  #define VFIO_IORDWR(size)						\
>  static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
> @@ -124,6 +127,10 @@ static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
>  VFIO_IORDWR(8)
>  VFIO_IORDWR(16)
>  VFIO_IORDWR(32)
> +#if defined(ioread64) && defined(iowrite64)
> +VFIO_IORDWR(64)
> +#endif
> +
>  /*
>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>   * range which is inaccessible.  The excluded range drops writes and fills
> @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>  		else
>  			fillable = 0;
>  
> +#if defined(ioread64) && defined(iowrite64)

Nit, #ifdef vfio_pci_core_iordwr64

> +		if (fillable >= 8 && !(off % 8)) {
> +			ret = vfio_pci_core_iordwr64(vdev, iswrite, test_mem,
> +						     io, buf, off, &filled);
> +			if (ret)
> +				return ret;
> +
> +		} else
> +#endif /* defined(ioread64) && defined(iowrite64) */

AFAIK, the comment appended to the #endif is really only suggested when
the code block is too long to reasonable fit in a terminal.  That's no
longer the case with the new helper.  Thanks,

Alex

>  		if (fillable >= 4 && !(off % 4)) {
>  			ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
>  						     io, buf, off, &filled);
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index a2c8b8bba711..f4cf5fd2350c 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -157,5 +157,8 @@ int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
>  VFIO_IOREAD_DECLATION(8)
>  VFIO_IOREAD_DECLATION(16)
>  VFIO_IOREAD_DECLATION(32)
> +#ifdef ioread64
> +VFIO_IOREAD_DECLATION(64)
> +#endif
>  
>  #endif /* VFIO_PCI_CORE_H */
Ramesh Thomas May 17, 2024, 10:29 a.m. UTC | #2
On 4/25/2024 9:56 AM, Gerd Bayer wrote:
> From: Ben Segal <bpsegal@us.ibm.com>
> 
> Many PCI adapters can benefit or even require full 64bit read
> and write access to their registers. In order to enable work on
> user-space drivers for these devices add two new variations
> vfio_pci_core_io{read|write}64 of the existing access methods
> when the architecture supports 64-bit ioreads and iowrites.

This is indeed necessary as back to back 32 bit may not be optimal in 
some devices.

> 
> Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
>   drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++++++++++++++++
>   include/linux/vfio_pci_core.h    |  3 +++
>   2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 3335f1b868b1..8ed06edaee23 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -89,6 +89,9 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
>   VFIO_IOREAD(8)
>   VFIO_IOREAD(16)
>   VFIO_IOREAD(32)
> +#ifdef ioread64
> +VFIO_IOREAD(64)
> +#endif
>   
>   #define VFIO_IORDWR(size)						\
>   static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
> @@ -124,6 +127,10 @@ static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
>   VFIO_IORDWR(8)
>   VFIO_IORDWR(16)
>   VFIO_IORDWR(32)
> +#if defined(ioread64) && defined(iowrite64)
> +VFIO_IORDWR(64)
> +#endif
> +
>   /*
>    * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>    * range which is inaccessible.  The excluded range drops writes and fills
> @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>   		else
>   			fillable = 0;
>   
> +#if defined(ioread64) && defined(iowrite64)

Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and 
iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is 
defined and this check always fails. In include/asm-generic/io.h, 
asm-generic/iomap.h gets included which declares them as extern functions.

One more thing to consider io-64-nonatomic-hi-lo.h and 
io-64-nonatomic-lo-hi.h, if included would define it as a macro that 
calls a function that rw 32 bits back to back.

> +		if (fillable >= 8 && !(off % 8)) {
> +			ret = vfio_pci_core_iordwr64(vdev, iswrite, test_mem,
> +						     io, buf, off, &filled);
> +			if (ret)
> +				return ret;
> +
> +		} else
> +#endif /* defined(ioread64) && defined(iowrite64) */
>   		if (fillable >= 4 && !(off % 4)) {
>   			ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
>   						     io, buf, off, &filled);
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index a2c8b8bba711..f4cf5fd2350c 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -157,5 +157,8 @@ int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
>   VFIO_IOREAD_DECLATION(8)
>   VFIO_IOREAD_DECLATION(16)
>   VFIO_IOREAD_DECLATION(32)
> +#ifdef ioread64
> +VFIO_IOREAD_DECLATION(64)
nit: This macro is referenced only in this file. Can the typo be 
corrected (_DECLARATION)?

> +#endif
>   
>   #endif /* VFIO_PCI_CORE_H */
Tian, Kevin May 20, 2024, 9:02 a.m. UTC | #3
> From: Ramesh Thomas <ramesh.thomas@intel.com>
> Sent: Friday, May 17, 2024 6:30 PM
> 
> On 4/25/2024 9:56 AM, Gerd Bayer wrote:
> > From: Ben Segal <bpsegal@us.ibm.com>
> >
> > @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
> vfio_pci_core_device *vdev, bool test_mem,
> >   		else
> >   			fillable = 0;
> >	
> > +#if defined(ioread64) && defined(iowrite64)
> 
> Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and
> iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is
> defined and this check always fails. In include/asm-generic/io.h,
> asm-generic/iomap.h gets included which declares them as extern functions.
> 
> One more thing to consider io-64-nonatomic-hi-lo.h and
> io-64-nonatomic-lo-hi.h, if included would define it as a macro that
> calls a function that rw 32 bits back to back.

I don't see the problem here. when the defined check fails it falls
back to back-to-back vfio_pci_core_iordwr32(). there is no need to
do it in an indirect way via including io-64-nonatomic-hi-lo.h.
Gerd Bayer May 21, 2024, 3:50 p.m. UTC | #4
On Mon, 2024-04-29 at 10:31 -0600, Alex Williamson wrote:
> On Thu, 25 Apr 2024 18:56:03 +0200
> Gerd Bayer <gbayer@linux.ibm.com> wrote:
> 
> > From: Ben Segal <bpsegal@us.ibm.com>
> > 
> > Many PCI adapters can benefit or even require full 64bit read
> > and write access to their registers. In order to enable work on
> > user-space drivers for these devices add two new variations
> > vfio_pci_core_io{read|write}64 of the existing access methods
> > when the architecture supports 64-bit ioreads and iowrites.
> > 
> > Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> > Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++++++++++++++++
> >  include/linux/vfio_pci_core.h    |  3 +++
> >  2 files changed, 19 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c
> > b/drivers/vfio/pci/vfio_pci_rdwr.c
> > index 3335f1b868b1..8ed06edaee23 100644
> > --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> > @@ -89,6 +89,9 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
> >  VFIO_IOREAD(8)
> >  VFIO_IOREAD(16)
> >  VFIO_IOREAD(32)
> > +#ifdef ioread64
> > +VFIO_IOREAD(64)
> > +#endif
> >  
> >  #define
> > VFIO_IORDWR(size)						\
> >  static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device
> > *vdev,\
> > @@ -124,6 +127,10 @@ static int vfio_pci_core_iordwr##size(struct
> > vfio_pci_core_device *vdev,\
> >  VFIO_IORDWR(8)
> >  VFIO_IORDWR(16)
> >  VFIO_IORDWR(32)
> > +#if defined(ioread64) && defined(iowrite64)
> > +VFIO_IORDWR(64)
> > +#endif
> > +
> >  /*
> >   * Read or write from an __iomem region (MMIO or I/O port) with an
> > excluded
> >   * range which is inaccessible.  The excluded range drops writes
> > and fills
> > @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
> > vfio_pci_core_device *vdev, bool test_mem,
> >  		else
> >  			fillable = 0;
> >  
> > +#if defined(ioread64) && defined(iowrite64)
> 
> Nit, #ifdef vfio_pci_core_iordwr64

I'm sorry, but I'm not sure how I should check for the expanded symbol
here. I think I'll have to stick to checking the same condition as for
whether VFIO_IORDWR(64) should be expanded.

> > +		if (fillable >= 8 && !(off % 8)) {
> > +			ret = vfio_pci_core_iordwr64(vdev,
> > iswrite, test_mem,
> > +						     io, buf, off,
> > &filled);
> > +			if (ret)
> > +				return ret;
> > +
> > +		} else
> > +#endif /* defined(ioread64) && defined(iowrite64) */
> 
> AFAIK, the comment appended to the #endif is really only suggested
> when the code block is too long to reasonable fit in a terminal. 
> That's no longer the case with the new helper.

Yes, I'll change that.

> Thanks,
> 
> Alex
> 
> 

Thanks, Gerd
Gerd Bayer May 21, 2024, 4:40 p.m. UTC | #5
On Fri, 2024-05-17 at 03:29 -0700, Ramesh Thomas wrote:
> On 4/25/2024 9:56 AM, Gerd Bayer wrote:
> > From: Ben Segal <bpsegal@us.ibm.com>
> > 
> > Many PCI adapters can benefit or even require full 64bit read
> > and write access to their registers. In order to enable work on
> > user-space drivers for these devices add two new variations
> > vfio_pci_core_io{read|write}64 of the existing access methods
> > when the architecture supports 64-bit ioreads and iowrites.
> 
> This is indeed necessary as back to back 32 bit may not be optimal in
> some devices.
> 
> > 
> > Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
> > Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > ---
> >   drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++++++++++++++++
> >   include/linux/vfio_pci_core.h    |  3 +++
> >   2 files changed, 19 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c
> > b/drivers/vfio/pci/vfio_pci_rdwr.c
> > index 3335f1b868b1..8ed06edaee23 100644
> > --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> > @@ -89,6 +89,9 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
> >   VFIO_IOREAD(8)
> >   VFIO_IOREAD(16)
> >   VFIO_IOREAD(32)
> > +#ifdef ioread64
> > +VFIO_IOREAD(64)
> > +#endif
> >   
> >   #define
> > VFIO_IORDWR(size)						\
> >   static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device
> > *vdev,\
> > @@ -124,6 +127,10 @@ static int vfio_pci_core_iordwr##size(struct
> > vfio_pci_core_device *vdev,\
> >   VFIO_IORDWR(8)
> >   VFIO_IORDWR(16)
> >   VFIO_IORDWR(32)
> > +#if defined(ioread64) && defined(iowrite64)
> > +VFIO_IORDWR(64)
> > +#endif
> > +
> >   /*
> >    * Read or write from an __iomem region (MMIO or I/O port) with
> > an excluded
> >    * range which is inaccessible.  The excluded range drops writes
> > and fills
> > @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
> > vfio_pci_core_device *vdev, bool test_mem,
> >   		else
> >   			fillable = 0;
> >   
> > +#if defined(ioread64) && defined(iowrite64)
> 
> Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and 
> iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is
> defined and this check always fails. In include/asm-generic/io.h, 
> asm-generic/iomap.h gets included which declares them as extern
> functions.

I thinks that should be possible - since ioread64/iowrite64 depend on
CONFIG_64BIT themselves.

> One more thing to consider io-64-nonatomic-hi-lo.h and 
> io-64-nonatomic-lo-hi.h, if included would define it as a macro that 
> calls a function that rw 32 bits back to back.

Even today, vfio_pci_core_do_io_rw() makes no guarantees to its users
that register accesses will be done in the granularity they've thought
to use. The vfs layer may coalesce the accesses and this code will then
read/write the largest naturally aligned chunks. I've witnessed in my
testing that one device driver was doing 8-byte writes through the 8-
byte capable vfio layer all of a sudden when run in a KVM guest.

So higher-level code needs to consider how to split register accesses
appropriately to get the intended side-effects. Thus, I'd rather stay
away from splitting 64bit accesses into two 32bit accesses - and decide
if high or low order values should be written first.


> > +		if (fillable >= 8 && !(off % 8)) {
> > +			ret = vfio_pci_core_iordwr64(vdev,
> > iswrite, test_mem,
> > +						     io, buf, off,
> > &filled);
> > +			if (ret)
> > +				return ret;
> > +
> > +		} else
> > +#endif /* defined(ioread64) && defined(iowrite64) */
> >   		if (fillable >= 4 && !(off % 4)) {
> >   			ret = vfio_pci_core_iordwr32(vdev,
> > iswrite, test_mem,
> >   						     io, buf, off,
> > &filled);
> > diff --git a/include/linux/vfio_pci_core.h
> > b/include/linux/vfio_pci_core.h
> > index a2c8b8bba711..f4cf5fd2350c 100644
> > --- a/include/linux/vfio_pci_core.h
> > +++ b/include/linux/vfio_pci_core.h
> > @@ -157,5 +157,8 @@ int vfio_pci_core_ioread##size(struct
> > vfio_pci_core_device *vdev,	\
> >   VFIO_IOREAD_DECLATION(8)
> >   VFIO_IOREAD_DECLATION(16)
> >   VFIO_IOREAD_DECLATION(32)
> > +#ifdef ioread64
> > +VFIO_IOREAD_DECLATION(64)
> nit: This macro is referenced only in this file. Can the typo be 
> corrected (_DECLARATION)?

Sure thanks for pointing this out!
I'll single this editorial change out into a separate patch of the
series, though.

> 
> > +#endif
> >   
> >   #endif /* VFIO_PCI_CORE_H */
> 
> 

Thanks, Gerd
Jason Gunthorpe May 22, 2024, 1:48 p.m. UTC | #6
On Tue, May 21, 2024 at 06:40:13PM +0200, Gerd Bayer wrote:
> > > @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
> > > vfio_pci_core_device *vdev, bool test_mem,
> > >   		else
> > >   			fillable = 0;
> > >   
> > > +#if defined(ioread64) && defined(iowrite64)
> > 
> > Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and 
> > iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is
> > defined and this check always fails. In include/asm-generic/io.h, 
> > asm-generic/iomap.h gets included which declares them as extern
> > functions.
> 
> I thinks that should be possible - since ioread64/iowrite64 depend on
> CONFIG_64BIT themselves.
> 
> > One more thing to consider io-64-nonatomic-hi-lo.h and 
> > io-64-nonatomic-lo-hi.h, if included would define it as a macro that 
> > calls a function that rw 32 bits back to back.

This might be a better way to go than trying to have vfio provide its
own emulation.

> Even today, vfio_pci_core_do_io_rw() makes no guarantees to its users
> that register accesses will be done in the granularity they've thought
> to use. The vfs layer may coalesce the accesses and this code will then
> read/write the largest naturally aligned chunks. I've witnessed in my
> testing that one device driver was doing 8-byte writes through the 8-
> byte capable vfio layer all of a sudden when run in a KVM guest.

Sure, KVM has emulation for various byte sizes, and does invoke vfio
with the raw size it got from guest, including larger than 8 sizes
from things like SSE instructions. This has nothing to do with the VFS
layer.

> So higher-level code needs to consider how to split register accesses
> appropriately to get the intended side-effects. Thus, I'd rather stay
> away from splitting 64bit accesses into two 32bit accesses - and decide
> if high or low order values should be written first.

The VFIO API is a byte for byte memcpy. VFIO should try to do the
largest single instruction accesses it knows how to do because some HW
is sensitive to that. Otherwise it does a memcpy loop.

Jason
Ramesh Thomas May 22, 2024, 11:57 p.m. UTC | #7
On 5/21/2024 9:40 AM, Gerd Bayer wrote:
> On Fri, 2024-05-17 at 03:29 -0700, Ramesh Thomas wrote:
>> On 4/25/2024 9:56 AM, Gerd Bayer wrote:
>>> From: Ben Segal <bpsegal@us.ibm.com>
>>>
>>> Many PCI adapters can benefit or even require full 64bit read
>>> and write access to their registers. In order to enable work on
>>> user-space drivers for these devices add two new variations
>>> vfio_pci_core_io{read|write}64 of the existing access methods
>>> when the architecture supports 64-bit ioreads and iowrites.
>>
>> This is indeed necessary as back to back 32 bit may not be optimal in
>> some devices.
>>
>>>
>>> Signed-off-by: Ben Segal <bpsegal@us.ibm.com>
>>> Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com>
>>> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>>> ---
>>>    drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++++++++++++++++
>>>    include/linux/vfio_pci_core.h    |  3 +++
>>>    2 files changed, 19 insertions(+)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> index 3335f1b868b1..8ed06edaee23 100644
>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> @@ -89,6 +89,9 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
>>>    VFIO_IOREAD(8)
>>>    VFIO_IOREAD(16)
>>>    VFIO_IOREAD(32)
>>> +#ifdef ioread64
>>> +VFIO_IOREAD(64)
>>> +#endif
>>>    
>>>    #define
>>> VFIO_IORDWR(size)						\
>>>    static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device
>>> *vdev,\
>>> @@ -124,6 +127,10 @@ static int vfio_pci_core_iordwr##size(struct
>>> vfio_pci_core_device *vdev,\
>>>    VFIO_IORDWR(8)
>>>    VFIO_IORDWR(16)
>>>    VFIO_IORDWR(32)
>>> +#if defined(ioread64) && defined(iowrite64)
>>> +VFIO_IORDWR(64)
>>> +#endif
>>> +
>>>    /*
>>>     * Read or write from an __iomem region (MMIO or I/O port) with
>>> an excluded
>>>     * range which is inaccessible.  The excluded range drops writes
>>> and fills
>>> @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
>>> vfio_pci_core_device *vdev, bool test_mem,
>>>    		else
>>>    			fillable = 0;
>>>    
>>> +#if defined(ioread64) && defined(iowrite64)
>>
>> Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and
>> iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is
>> defined and this check always fails. In include/asm-generic/io.h,
>> asm-generic/iomap.h gets included which declares them as extern
>> functions.
> 
> I thinks that should be possible - since ioread64/iowrite64 depend on
> CONFIG_64BIT themselves.
> 
>> One more thing to consider io-64-nonatomic-hi-lo.h and
>> io-64-nonatomic-lo-hi.h, if included would define it as a macro that
>> calls a function that rw 32 bits back to back.
> 
> Even today, vfio_pci_core_do_io_rw() makes no guarantees to its users
> that register accesses will be done in the granularity they've thought
> to use. The vfs layer may coalesce the accesses and this code will then
> read/write the largest naturally aligned chunks. I've witnessed in my
> testing that one device driver was doing 8-byte writes through the 8-
> byte capable vfio layer all of a sudden when run in a KVM guest.
> 
> So higher-level code needs to consider how to split register accesses
> appropriately to get the intended side-effects. Thus, I'd rather stay
> away from splitting 64bit accesses into two 32bit accesses - and decide
> if high or low order values should be written first.

Sorry, I was not clear. The main reason to include 
io-64-nonatomic-hi-lo.h or io-64-nonatomic-lo-hi.h is to get the 
iowrite64 and ioread64 macros defined mapping to functions that 
implement them. They define and map them to generic functions in 
lib/iomap.c The 64 bit rw functions (readq/writeq) get called if 
present. If any architecture has not implemented them, then it maps them 
to functions that do 32 bit back to back.

> 
> 
>>> +		if (fillable >= 8 && !(off % 8)) {
>>> +			ret = vfio_pci_core_iordwr64(vdev,
>>> iswrite, test_mem,
>>> +						     io, buf, off,
>>> &filled);
>>> +			if (ret)
>>> +				return ret;
>>> +
>>> +		} else
>>> +#endif /* defined(ioread64) && defined(iowrite64) */
>>>    		if (fillable >= 4 && !(off % 4)) {
>>>    			ret = vfio_pci_core_iordwr32(vdev,
>>> iswrite, test_mem,
>>>    						     io, buf, off,
>>> &filled);
>>> diff --git a/include/linux/vfio_pci_core.h
>>> b/include/linux/vfio_pci_core.h
>>> index a2c8b8bba711..f4cf5fd2350c 100644
>>> --- a/include/linux/vfio_pci_core.h
>>> +++ b/include/linux/vfio_pci_core.h
>>> @@ -157,5 +157,8 @@ int vfio_pci_core_ioread##size(struct
>>> vfio_pci_core_device *vdev,	\
>>>    VFIO_IOREAD_DECLATION(8)
>>>    VFIO_IOREAD_DECLATION(16)
>>>    VFIO_IOREAD_DECLATION(32)
>>> +#ifdef ioread64
>>> +VFIO_IOREAD_DECLATION(64)
>> nit: This macro is referenced only in this file. Can the typo be
>> corrected (_DECLARATION)?
> 
> Sure thanks for pointing this out!
> I'll single this editorial change out into a separate patch of the
> series, though.
> 
>>
>>> +#endif
>>>    
>>>    #endif /* VFIO_PCI_CORE_H */
>>
>>
> 
> Thanks, Gerd
Ramesh Thomas May 23, 2024, 12:11 a.m. UTC | #8
On 5/20/2024 2:02 AM, Tian, Kevin wrote:
>> From: Ramesh Thomas <ramesh.thomas@intel.com>
>> Sent: Friday, May 17, 2024 6:30 PM
>>
>> On 4/25/2024 9:56 AM, Gerd Bayer wrote:
>>> From: Ben Segal <bpsegal@us.ibm.com>
>>>
>>> @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
>> vfio_pci_core_device *vdev, bool test_mem,
>>>    		else
>>>    			fillable = 0;
>>> 	
>>> +#if defined(ioread64) && defined(iowrite64)
>>
>> Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and
>> iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is
>> defined and this check always fails. In include/asm-generic/io.h,
>> asm-generic/iomap.h gets included which declares them as extern functions.
>>
>> One more thing to consider io-64-nonatomic-hi-lo.h and
>> io-64-nonatomic-lo-hi.h, if included would define it as a macro that
>> calls a function that rw 32 bits back to back.
> 
> I don't see the problem here. when the defined check fails it falls
> back to back-to-back vfio_pci_core_iordwr32(). there is no need to
> do it in an indirect way via including io-64-nonatomic-hi-lo.h.

The issue is iowrite64 and iowrite64 was not getting defined when 
CONFIG_GENERIC_IOMAP was not defined, even though the architecture 
implemented the 64 bit rw functions readq and writeq. 
io-64-nonatomic-hi-lo.h and io-64-nonatomic-lo-hi.h define them and map 
them to generic implementations in lib/iomap.c. The implementation calls 
the 64 bit rw functions if present, otherwise does 32 bit back to back 
rw. Besides it also has sanity checks for port numbers in the iomap 
path. I think it is better to rely on this existing generic method than 
implementing the checks at places where iowrite64 and ioread64 get 
called, at least in the IOMAP path.
Ramesh Thomas May 23, 2024, 9:52 p.m. UTC | #9
On 5/22/2024 5:11 PM, Ramesh Thomas wrote:
> On 5/20/2024 2:02 AM, Tian, Kevin wrote:
>>> From: Ramesh Thomas <ramesh.thomas@intel.com>
>>> Sent: Friday, May 17, 2024 6:30 PM
>>>
>>> On 4/25/2024 9:56 AM, Gerd Bayer wrote:
>>>> From: Ben Segal <bpsegal@us.ibm.com>
>>>>
>>>> @@ -148,6 +155,15 @@ ssize_t vfio_pci_core_do_io_rw(struct
>>> vfio_pci_core_device *vdev, bool test_mem,
>>>>            else
>>>>                fillable = 0;
>>>>
>>>> +#if defined(ioread64) && defined(iowrite64)
>>>
>>> Can we check for #ifdef CONFIG_64BIT instead? In x86, ioread64 and
>>> iowrite64 get declared as extern functions if CONFIG_GENERIC_IOMAP is
>>> defined and this check always fails. In include/asm-generic/io.h,
>>> asm-generic/iomap.h gets included which declares them as extern 
>>> functions.
>>>
>>> One more thing to consider io-64-nonatomic-hi-lo.h and
>>> io-64-nonatomic-lo-hi.h, if included would define it as a macro that
>>> calls a function that rw 32 bits back to back.
>>
>> I don't see the problem here. when the defined check fails it falls
>> back to back-to-back vfio_pci_core_iordwr32(). there is no need to
>> do it in an indirect way via including io-64-nonatomic-hi-lo.h.
> 
> The issue is iowrite64 and iowrite64 was not getting defined when 
> CONFIG_GENERIC_IOMAP was not defined, even though the architecture 

Sorry, I meant they were not getting defined when CONFIG_GENERIC_IOMAP 
*was defined*. The only definitions of ioread64 and iowrite64 in the 
code path are in asm/io.h where they are surrounded by #ifndef 
CONFIG_GENERIC_IOMAP

> implemented the 64 bit rw functions readq and writeq. 
> io-64-nonatomic-hi-lo.h and io-64-nonatomic-lo-hi.h define them and map 
> them to generic implementations in lib/iomap.c. The implementation calls 
> the 64 bit rw functions if present, otherwise does 32 bit back to back 
> rw. Besides it also has sanity checks for port numbers in the iomap 
> path. I think it is better to rely on this existing generic method than 
> implementing the checks at places where iowrite64 and ioread64 get 
> called, at least in the IOMAP path.
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 3335f1b868b1..8ed06edaee23 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -89,6 +89,9 @@  EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
 VFIO_IOREAD(32)
+#ifdef ioread64
+VFIO_IOREAD(64)
+#endif
 
 #define VFIO_IORDWR(size)						\
 static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
@@ -124,6 +127,10 @@  static int vfio_pci_core_iordwr##size(struct vfio_pci_core_device *vdev,\
 VFIO_IORDWR(8)
 VFIO_IORDWR(16)
 VFIO_IORDWR(32)
+#if defined(ioread64) && defined(iowrite64)
+VFIO_IORDWR(64)
+#endif
+
 /*
  * Read or write from an __iomem region (MMIO or I/O port) with an excluded
  * range which is inaccessible.  The excluded range drops writes and fills
@@ -148,6 +155,15 @@  ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 		else
 			fillable = 0;
 
+#if defined(ioread64) && defined(iowrite64)
+		if (fillable >= 8 && !(off % 8)) {
+			ret = vfio_pci_core_iordwr64(vdev, iswrite, test_mem,
+						     io, buf, off, &filled);
+			if (ret)
+				return ret;
+
+		} else
+#endif /* defined(ioread64) && defined(iowrite64) */
 		if (fillable >= 4 && !(off % 4)) {
 			ret = vfio_pci_core_iordwr32(vdev, iswrite, test_mem,
 						     io, buf, off, &filled);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a2c8b8bba711..f4cf5fd2350c 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -157,5 +157,8 @@  int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
 VFIO_IOREAD_DECLATION(8)
 VFIO_IOREAD_DECLATION(16)
 VFIO_IOREAD_DECLATION(32)
+#ifdef ioread64
+VFIO_IOREAD_DECLATION(64)
+#endif
 
 #endif /* VFIO_PCI_CORE_H */