diff mbox series

[v4] vfio/pci: Add support for opregion v2.1+

Message ID 20210302130220.9349-1-fred.gao@intel.com (mailing list archive)
State New, archived
Headers show
Series [v4] vfio/pci: Add support for opregion v2.1+ | expand

Commit Message

Gao, Fred March 2, 2021, 1:02 p.m. UTC
Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
However, When VBT data exceeds 6KB size and cannot be within mailbox #4
starting from opregion v2.0+, Extended VBT region, next to opregion, is
used to hold the VBT data, so the total size will be opregion size plus
extended VBT region size.

since opregion v2.0 with physical host VBT address should not be
practically available for end user, it is not supported.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
---
 drivers/vfio/pci/vfio_pci_igd.c | 49 +++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

Comments

Alex Williamson March 19, 2021, 7:26 p.m. UTC | #1
On Tue,  2 Mar 2021 21:02:20 +0800
Fred Gao <fred.gao@intel.com> wrote:

> Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
> However, When VBT data exceeds 6KB size and cannot be within mailbox #4
> starting from opregion v2.0+, Extended VBT region, next to opregion, is
> used to hold the VBT data, so the total size will be opregion size plus
> extended VBT region size.
> 
> since opregion v2.0 with physical host VBT address should not be
> practically available for end user, it is not supported.
> 
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
> Signed-off-by: Fred Gao <fred.gao@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_igd.c | 49 +++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
> index 53d97f459252..4edb8afcdbfc 100644
> --- a/drivers/vfio/pci/vfio_pci_igd.c
> +++ b/drivers/vfio/pci/vfio_pci_igd.c
> @@ -21,6 +21,10 @@
>  #define OPREGION_SIZE		(8 * 1024)
>  #define OPREGION_PCI_ADDR	0xfc
>  
> +#define OPREGION_RVDA		0x3ba
> +#define OPREGION_RVDS		0x3c2
> +#define OPREGION_VERSION	0x16
> +
>  static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
>  			      size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -58,6 +62,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>  	u32 addr, size;
>  	void *base;
>  	int ret;
> +	u16 version;
>  
>  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
>  	if (ret)
> @@ -83,6 +88,50 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>  
>  	size *= 1024; /* In KB */
>  
> +	/*
> +	 * Support opregion v2.1+
> +	 * When VBT data exceeds 6KB size and cannot be within mailbox #4

s/#4/#4, then the/

> +	 * Extended VBT region, next to opregion, is used to hold the VBT data.
> +	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
> +	 * (VBT Data Size) from opregion structure member are used to hold the
> +	 * address from region base and size of VBT data while RVDA/RVDS
> +	 * are not defined before opregion 2.0.
> +	 *
> +	 * opregion 2.0: rvda is the physical VBT address.

Let's expand the comment to include why this is a problem to support
(virtualization of this register would be required in userspace) and why
we're choosing not to manipulate this into a 2.1+ table, which I think
is both the practical lack of v2.0 tables in use and any implicit
dependencies software may have on the OpRegion version.

> +	 *
> +	 * opregion 2.1+: rvda is unsigned, relative offset from
> +	 * opregion base, and should never point within opregion.

And for our purposes must exactly follow the base opregion to avoid
exposing unknown host memory to userspace, ie. provide a more
descriptive justification for the 2nd error condition below.

> +	 */
> +	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
> +	if (version >= 0x0200) {
> +		u64 rvda;
> +		u32 rvds;
> +
> +		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
> +		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
> +		if (rvda && rvds) {
> +			/* no support for opregion v2.0 with physical VBT address */
> +			if (version == 0x0200) {
> +				memunmap(base);
> +				pci_err(vdev->pdev,
> +					"IGD passthrough does not support opregion\n"
> +					"version 0x%x with physical rvda 0x%llx\n", version, rvda);


Why do we need a new line midway through this log message?

s/passthrough/assignment/

In testing the version you include the leading zero, do you also want
that leading zero in the printed version, ie. %04x?

If we get to this code, we already know that both rvda and rvds are
non-zero, why is it useful to print the rvda value in this error
message?  For example, we could print:

 "IGD assignment does not support opregion version 0x%04x with an extended VBT region"

> +				return -EINVAL;
> +			}
> +
> +			if ((u32)rvda != size) {

What allows us to assume rvda is a 32bit value given that it's a 64bit
register?  It seems safer not to include this cast.

> +				memunmap(base);
> +				pci_err(vdev->pdev,
> +					"Extended VBT does not follow opregion !\n"
> +					"opregion version 0x%x:rvda 0x%llx\n", version, rvda);

Again I'm not sure about the usefulness of printing the rvda value on
its own.  Without knowing the size value it seems meaningless.  Like
above, get rid of the mid-error new line and random space if you keep
the exclamation point.

> +				return -EINVAL;
> +			}
> +
> +			/* region size for opregion v2.0+: opregion and VBT size */
> +			size += rvds;

RVDS is defined as size in bytes, not in kilobytes like the base
opregion size, right?  Let's include that clarification in the comment
since the spec is private.  Thanks,

Alex


> +		}
> +	}
> +
>  	if (size != OPREGION_SIZE) {
>  		memunmap(base);
>  		base = memremap(addr, size, MEMREMAP_WB);
Gao, Fred March 25, 2021, 8:50 a.m. UTC | #2
Thank you for offering your valuable advice.
Will send the updated version soon.

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, March 20, 2021 3:27 AM
> To: Gao, Fred <fred.gao@intel.com>
> Cc: kvm@vger.kernel.org; intel-gfx@lists.freedesktop.org; Zhenyu Wang
> <zhenyuw@linux.intel.com>; Fonn, Swee Yee <swee.yee.fonn@intel.com>
> Subject: Re: [PATCH v4] vfio/pci: Add support for opregion v2.1+
> 
> On Tue,  2 Mar 2021 21:02:20 +0800
> Fred Gao <fred.gao@intel.com> wrote:
> 
> > Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
> > However, When VBT data exceeds 6KB size and cannot be within mailbox
> > #4 starting from opregion v2.0+, Extended VBT region, next to
> > opregion, is used to hold the VBT data, so the total size will be
> > opregion size plus extended VBT region size.
> >
> > since opregion v2.0 with physical host VBT address should not be
> > practically available for end user, it is not supported.
> >
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
> > Signed-off-by: Fred Gao <fred.gao@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_igd.c | 49
> > +++++++++++++++++++++++++++++++++
> >  1 file changed, 49 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_igd.c
> > b/drivers/vfio/pci/vfio_pci_igd.c index 53d97f459252..4edb8afcdbfc
> > 100644
> > --- a/drivers/vfio/pci/vfio_pci_igd.c
> > +++ b/drivers/vfio/pci/vfio_pci_igd.c
> > @@ -21,6 +21,10 @@
> >  #define OPREGION_SIZE		(8 * 1024)
> >  #define OPREGION_PCI_ADDR	0xfc
> >
> > +#define OPREGION_RVDA		0x3ba
> > +#define OPREGION_RVDS		0x3c2
> > +#define OPREGION_VERSION	0x16
> > +
> >  static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user
> *buf,
> >  			      size_t count, loff_t *ppos, bool iswrite)  { @@ -
> 58,6 +62,7
> > @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
> >  	u32 addr, size;
> >  	void *base;
> >  	int ret;
> > +	u16 version;
> >
> >  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR,
> &addr);
> >  	if (ret)
> > @@ -83,6 +88,50 @@ static int vfio_pci_igd_opregion_init(struct
> > vfio_pci_device *vdev)
> >
> >  	size *= 1024; /* In KB */
> >
> > +	/*
> > +	 * Support opregion v2.1+
> > +	 * When VBT data exceeds 6KB size and cannot be within mailbox #4
> 
> s/#4/#4, then the/
> 
> > +	 * Extended VBT region, next to opregion, is used to hold the VBT
> data.
> > +	 * RVDA (Relative Address of VBT Data from Opregion Base) and
> RVDS
> > +	 * (VBT Data Size) from opregion structure member are used to hold
> the
> > +	 * address from region base and size of VBT data while RVDA/RVDS
> > +	 * are not defined before opregion 2.0.
> > +	 *
> > +	 * opregion 2.0: rvda is the physical VBT address.
> 
> Let's expand the comment to include why this is a problem to support
> (virtualization of this register would be required in userspace) and why we're
> choosing not to manipulate this into a 2.1+ table, which I think is both the
> practical lack of v2.0 tables in use and any implicit dependencies software
> may have on the OpRegion version.
> 
> > +	 *
> > +	 * opregion 2.1+: rvda is unsigned, relative offset from
> > +	 * opregion base, and should never point within opregion.
> 
> And for our purposes must exactly follow the base opregion to avoid
> exposing unknown host memory to userspace, ie. provide a more descriptive
> justification for the 2nd error condition below.
> 
> > +	 */
> > +	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
> > +	if (version >= 0x0200) {
> > +		u64 rvda;
> > +		u32 rvds;
> > +
> > +		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
> > +		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
> > +		if (rvda && rvds) {
> > +			/* no support for opregion v2.0 with physical VBT
> address */
> > +			if (version == 0x0200) {
> > +				memunmap(base);
> > +				pci_err(vdev->pdev,
> > +					"IGD passthrough does not support
> opregion\n"
> > +					"version 0x%x with physical rvda
> 0x%llx\n", version, rvda);
> 
> 
> Why do we need a new line midway through this log message?
> 
> s/passthrough/assignment/
> 
> In testing the version you include the leading zero, do you also want that
> leading zero in the printed version, ie. %04x?
> 
> If we get to this code, we already know that both rvda and rvds are non-zero,
> why is it useful to print the rvda value in this error message?  For example,
> we could print:
> 
>  "IGD assignment does not support opregion version 0x%04x with an
> extended VBT region"
> 
> > +				return -EINVAL;
> > +			}
> > +
> > +			if ((u32)rvda != size) {
> 
> What allows us to assume rvda is a 32bit value given that it's a 64bit register?
> It seems safer not to include this cast.
> 
> > +				memunmap(base);
> > +				pci_err(vdev->pdev,
> > +					"Extended VBT does not follow
> opregion !\n"
> > +					"opregion version 0x%x:rvda
> 0x%llx\n", version, rvda);
> 
> Again I'm not sure about the usefulness of printing the rvda value on its own.
> Without knowing the size value it seems meaningless.  Like above, get rid of
> the mid-error new line and random space if you keep the exclamation point.
> 
> > +				return -EINVAL;
> > +			}
> > +
> > +			/* region size for opregion v2.0+: opregion and VBT
> size */
> > +			size += rvds;
> 
> RVDS is defined as size in bytes, not in kilobytes like the base opregion size,
> right?  Let's include that clarification in the comment since the spec is private.
> Thanks,
> 
> Alex
> 
> 
> > +		}
> > +	}
> > +
> >  	if (size != OPREGION_SIZE) {
> >  		memunmap(base);
> >  		base = memremap(addr, size, MEMREMAP_WB);
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 53d97f459252..4edb8afcdbfc 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -21,6 +21,10 @@ 
 #define OPREGION_SIZE		(8 * 1024)
 #define OPREGION_PCI_ADDR	0xfc
 
+#define OPREGION_RVDA		0x3ba
+#define OPREGION_RVDS		0x3c2
+#define OPREGION_VERSION	0x16
+
 static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
 			      size_t count, loff_t *ppos, bool iswrite)
 {
@@ -58,6 +62,7 @@  static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 	u32 addr, size;
 	void *base;
 	int ret;
+	u16 version;
 
 	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
 	if (ret)
@@ -83,6 +88,50 @@  static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 
 	size *= 1024; /* In KB */
 
+	/*
+	 * Support opregion v2.1+
+	 * When VBT data exceeds 6KB size and cannot be within mailbox #4
+	 * Extended VBT region, next to opregion, is used to hold the VBT data.
+	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
+	 * (VBT Data Size) from opregion structure member are used to hold the
+	 * address from region base and size of VBT data while RVDA/RVDS
+	 * are not defined before opregion 2.0.
+	 *
+	 * opregion 2.0: rvda is the physical VBT address.
+	 *
+	 * opregion 2.1+: rvda is unsigned, relative offset from
+	 * opregion base, and should never point within opregion.
+	 */
+	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
+	if (version >= 0x0200) {
+		u64 rvda;
+		u32 rvds;
+
+		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
+		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
+		if (rvda && rvds) {
+			/* no support for opregion v2.0 with physical VBT address */
+			if (version == 0x0200) {
+				memunmap(base);
+				pci_err(vdev->pdev,
+					"IGD passthrough does not support opregion\n"
+					"version 0x%x with physical rvda 0x%llx\n", version, rvda);
+				return -EINVAL;
+			}
+
+			if ((u32)rvda != size) {
+				memunmap(base);
+				pci_err(vdev->pdev,
+					"Extended VBT does not follow opregion !\n"
+					"opregion version 0x%x:rvda 0x%llx\n", version, rvda);
+				return -EINVAL;
+			}
+
+			/* region size for opregion v2.0+: opregion and VBT size */
+			size += rvds;
+		}
+	}
+
 	if (size != OPREGION_SIZE) {
 		memunmap(base);
 		base = memremap(addr, size, MEMREMAP_WB);