diff mbox series

[v2] vfio/pci: Add support for opregion v2.0+

Message ID 20210118123834.5991-1-fred.gao@intel.com (mailing list archive)
State New, archived
Headers show
Series [v2] vfio/pci: Add support for opregion v2.0+ | expand

Commit Message

Gao, Fred Jan. 18, 2021, 12:38 p.m. UTC
Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
However, When VBT data exceeds 6KB size and cannot be within mailbox #4
starting from opregion v2.0+, Extended VBT region, next to opregion, is
used to hold the VBT data, so the total size will be opregion size plus
extended VBT region size.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
---
 drivers/vfio/pci/vfio_pci_igd.c | 59 +++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

Comments

Alex Williamson Jan. 21, 2021, 8:33 p.m. UTC | #1
On Mon, 18 Jan 2021 20:38:34 +0800
Fred Gao <fred.gao@intel.com> wrote:

> Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
> However, When VBT data exceeds 6KB size and cannot be within mailbox #4
> starting from opregion v2.0+, Extended VBT region, next to opregion, is
> used to hold the VBT data, so the total size will be opregion size plus
> extended VBT region size.
> 
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
> Signed-off-by: Fred Gao <fred.gao@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_igd.c | 59 +++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
> index 53d97f459252..fc470278a492 100644
> --- a/drivers/vfio/pci/vfio_pci_igd.c
> +++ b/drivers/vfio/pci/vfio_pci_igd.c
> @@ -21,6 +21,10 @@
>  #define OPREGION_SIZE		(8 * 1024)
>  #define OPREGION_PCI_ADDR	0xfc
>  
> +#define OPREGION_RVDA		0x3ba
> +#define OPREGION_RVDS		0x3c2
> +#define OPREGION_VERSION	0x16
> +
>  static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
>  			      size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -58,6 +62,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>  	u32 addr, size;
>  	void *base;
>  	int ret;
> +	u16 version;
>  
>  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
>  	if (ret)
> @@ -83,6 +88,60 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>  
>  	size *= 1024; /* In KB */
>  
> +	/*
> +	 * Support opregion v2.0+
> +	 * When VBT data exceeds 6KB size and cannot be within mailbox #4
> +	 * Extended VBT region, next to opregion, is used to hold the VBT data.
> +	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
> +	 * (VBT Data Size) from opregion structure member are used to hold the
> +	 * address from region base and size of VBT data while RVDA/RVDS
> +	 * are not defined before opregion 2.0.
> +	 *
> +	 * opregion 2.0: rvda is the physical VBT address.
> +	 *
> +	 * opregion 2.1+: rvda is unsigned, relative offset from
> +	 * opregion base, and should never point within opregion.
> +	 */
> +	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
> +	if (version >= 0x0200) {
> +		u64 rvda;
> +		u32 rvds;
> +
> +		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
> +		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
> +		if (rvda && rvds) {
> +			u32 offset;
> +
> +			if (version == 0x0200)
> +				offset = rvda - (u64)addr;
> +			else
> +				offset = rvda;
> +
> +			if (offset != size) {
> +				pci_err(vdev->pdev,
> +				"Extended VBT does not follow opregion !\n"
> +				"opregion version 0x%x:offset 0x%x\n", version, offset);
> +				return -EINVAL;
> +			}
> +
> +			/*
> +			 * the only difference between opregion 2.0 and 2.1 is
> +			 * rvda addressing mode. since rvda is physical host
> +			 * VBT address and cannot be directly used in guest,
> +			 * faked into opregion 2.1's relative offset.
> +			 */
> +			if (version == 0x0200) {
> +				*(__le16 *)(base + OPREGION_VERSION) =
> +					cpu_to_le16(0x0201);
> +				(*(__le64 *)(base + OPREGION_RVDA)) =
> +					cpu_to_le64((rvda - (u64)addr));
> +			}

There's a much better description of the fields and logic here, thanks
for that.  I also see we've closed the gap to require the extended
region to immediately follow the opregion table.  The code
immediately above still makes me nervous as even if this is the only
difference between the specs, code might make some differentiation
based on the spec version, which we're changing in host memory for all
subsequent drivers until the host is rebooted.  Could we use a pci_dbg()
in this branch to flag that event in dmesg for debugging?  Thanks,

Alex

> +
> +			/* region size for opregion v2.0+: opregion and VBT size */
> +			size = offset + rvds;
> +		}
> +	}
> +
>  	if (size != OPREGION_SIZE) {
>  		memunmap(base);
>  		base = memremap(addr, size, MEMREMAP_WB);
Zhenyu Wang Feb. 2, 2021, 5:09 a.m. UTC | #2
On 2021.01.21 13:33:18 -0700, Alex Williamson wrote:
> On Mon, 18 Jan 2021 20:38:34 +0800
> Fred Gao <fred.gao@intel.com> wrote:
> 
> > Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
> > However, When VBT data exceeds 6KB size and cannot be within mailbox #4
> > starting from opregion v2.0+, Extended VBT region, next to opregion, is
> > used to hold the VBT data, so the total size will be opregion size plus
> > extended VBT region size.
> > 
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
> > Signed-off-by: Fred Gao <fred.gao@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_igd.c | 59 +++++++++++++++++++++++++++++++++
> >  1 file changed, 59 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
> > index 53d97f459252..fc470278a492 100644
> > --- a/drivers/vfio/pci/vfio_pci_igd.c
> > +++ b/drivers/vfio/pci/vfio_pci_igd.c
> > @@ -21,6 +21,10 @@
> >  #define OPREGION_SIZE		(8 * 1024)
> >  #define OPREGION_PCI_ADDR	0xfc
> >  
> > +#define OPREGION_RVDA		0x3ba
> > +#define OPREGION_RVDS		0x3c2
> > +#define OPREGION_VERSION	0x16
> > +
> >  static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
> >  			      size_t count, loff_t *ppos, bool iswrite)
> >  {
> > @@ -58,6 +62,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
> >  	u32 addr, size;
> >  	void *base;
> >  	int ret;
> > +	u16 version;
> >  
> >  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
> >  	if (ret)
> > @@ -83,6 +88,60 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
> >  
> >  	size *= 1024; /* In KB */
> >  
> > +	/*
> > +	 * Support opregion v2.0+
> > +	 * When VBT data exceeds 6KB size and cannot be within mailbox #4
> > +	 * Extended VBT region, next to opregion, is used to hold the VBT data.
> > +	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
> > +	 * (VBT Data Size) from opregion structure member are used to hold the
> > +	 * address from region base and size of VBT data while RVDA/RVDS
> > +	 * are not defined before opregion 2.0.
> > +	 *
> > +	 * opregion 2.0: rvda is the physical VBT address.
> > +	 *
> > +	 * opregion 2.1+: rvda is unsigned, relative offset from
> > +	 * opregion base, and should never point within opregion.
> > +	 */
> > +	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
> > +	if (version >= 0x0200) {
> > +		u64 rvda;
> > +		u32 rvds;
> > +
> > +		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
> > +		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
> > +		if (rvda && rvds) {
> > +			u32 offset;
> > +
> > +			if (version == 0x0200)
> > +				offset = rvda - (u64)addr;
> > +			else
> > +				offset = rvda;
> > +
> > +			if (offset != size) {
> > +				pci_err(vdev->pdev,
> > +				"Extended VBT does not follow opregion !\n"
> > +				"opregion version 0x%x:offset 0x%x\n", version, offset);
> > +				return -EINVAL;
> > +			}
> > +
> > +			/*
> > +			 * the only difference between opregion 2.0 and 2.1 is
> > +			 * rvda addressing mode. since rvda is physical host
> > +			 * VBT address and cannot be directly used in guest,
> > +			 * faked into opregion 2.1's relative offset.
> > +			 */
> > +			if (version == 0x0200) {
> > +				*(__le16 *)(base + OPREGION_VERSION) =
> > +					cpu_to_le16(0x0201);
> > +				(*(__le64 *)(base + OPREGION_RVDA)) =
> > +					cpu_to_le64((rvda - (u64)addr));
> > +			}
> 
> There's a much better description of the fields and logic here, thanks
> for that.  I also see we've closed the gap to require the extended
> region to immediately follow the opregion table.  The code
> immediately above still makes me nervous as even if this is the only
> difference between the specs, code might make some differentiation
> based on the spec version, which we're changing in host memory for all
> subsequent drivers until the host is rebooted.  Could we use a pci_dbg()
> in this branch to flag that event in dmesg for debugging?  Thanks,
> 

Alex, that's really valid concern, even we thought it should be no problem,
we asked firmware team again, it looks for opregion 2.0 with VBT >6k case (RVDA != 0)
is not practically available for end user. So I think we may just not support
for that. For opregion 2.1+, just extend the size properly.

Thanks

> 
> > +
> > +			/* region size for opregion v2.0+: opregion and VBT size */
> > +			size = offset + rvds;
> > +		}
> > +	}
> > +
> >  	if (size != OPREGION_SIZE) {
> >  		memunmap(base);
> >  		base = memremap(addr, size, MEMREMAP_WB);
>
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 53d97f459252..fc470278a492 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -21,6 +21,10 @@ 
 #define OPREGION_SIZE		(8 * 1024)
 #define OPREGION_PCI_ADDR	0xfc
 
+#define OPREGION_RVDA		0x3ba
+#define OPREGION_RVDS		0x3c2
+#define OPREGION_VERSION	0x16
+
 static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
 			      size_t count, loff_t *ppos, bool iswrite)
 {
@@ -58,6 +62,7 @@  static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 	u32 addr, size;
 	void *base;
 	int ret;
+	u16 version;
 
 	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
 	if (ret)
@@ -83,6 +88,60 @@  static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 
 	size *= 1024; /* In KB */
 
+	/*
+	 * Support opregion v2.0+
+	 * When VBT data exceeds 6KB size and cannot be within mailbox #4
+	 * Extended VBT region, next to opregion, is used to hold the VBT data.
+	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
+	 * (VBT Data Size) from opregion structure member are used to hold the
+	 * address from region base and size of VBT data while RVDA/RVDS
+	 * are not defined before opregion 2.0.
+	 *
+	 * opregion 2.0: rvda is the physical VBT address.
+	 *
+	 * opregion 2.1+: rvda is unsigned, relative offset from
+	 * opregion base, and should never point within opregion.
+	 */
+	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
+	if (version >= 0x0200) {
+		u64 rvda;
+		u32 rvds;
+
+		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
+		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
+		if (rvda && rvds) {
+			u32 offset;
+
+			if (version == 0x0200)
+				offset = rvda - (u64)addr;
+			else
+				offset = rvda;
+
+			if (offset != size) {
+				pci_err(vdev->pdev,
+				"Extended VBT does not follow opregion !\n"
+				"opregion version 0x%x:offset 0x%x\n", version, offset);
+				return -EINVAL;
+			}
+
+			/*
+			 * the only difference between opregion 2.0 and 2.1 is
+			 * rvda addressing mode. since rvda is physical host
+			 * VBT address and cannot be directly used in guest,
+			 * faked into opregion 2.1's relative offset.
+			 */
+			if (version == 0x0200) {
+				*(__le16 *)(base + OPREGION_VERSION) =
+					cpu_to_le16(0x0201);
+				(*(__le64 *)(base + OPREGION_RVDA)) =
+					cpu_to_le64((rvda - (u64)addr));
+			}
+
+			/* region size for opregion v2.0+: opregion and VBT size */
+			size = offset + rvds;
+		}
+	}
+
 	if (size != OPREGION_SIZE) {
 		memunmap(base);
 		base = memremap(addr, size, MEMREMAP_WB);