Message ID | b6e8edb21e6cc5f616383cb733c2d5f75d89953e.1548266832.git-series.knut.omang@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | pcie: Add simple ACS "support" to the generic PCIe root port | expand |
On Wed, 23 Jan 2019 19:27:59 +0100 Knut Omang <knut.omang@oracle.com> wrote: > Add a helper function to add PCIe capability for Access Control Services (ACS) > ACS support in the associated root port is a prerequisite to be able to do useful > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. Define "useful". We can certainly still assign single function PFs to an L2 guest, or multi-function so long as all the functions are assigned. I won't deny that it's problematic, but it's a virtual topology that can be adjusted, so I think this is overstating things a bit. > Signed-off-by: Knut Omang <knut.omang@oracle.com> > --- > hw/pci/pcie.c | 14 ++++++++++++++ > include/hw/pci/pcie.h | 1 + > include/hw/pci/pcie_regs.h | 4 ++++ > 3 files changed, 19 insertions(+) > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > index 230478f..18feff5 100644 > --- a/hw/pci/pcie.c > +++ b/hw/pci/pcie.c > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > } > + > +/* Add an ACS (Access Control Services) capability */ > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > +{ > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > + offset, PCI_ACS_SIZEOF + ectrl_words); The egress control vector is only valid if the egress control capability is enabled, which is not set below, so this just seems to waste config space and introduces a meaningless function arg. > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); Some of these bits are only valid for downstream ports, it would violate the spec to set them on and endpoint. > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); The default values of the control register bits is zero, so we shouldn't be setting it here and we should have a reset hook to clear it. > + /* Make CTRL register writable */ > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); > +} > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > index 5b82a0d..c2da148 100644 > --- a/include/hw/pci/pcie.h > +++ b/include/hw/pci/pcie.h > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > void pcie_sync_bridge_lnk(PCIDevice *dev); > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > index ad4e780..5e7409c 100644 > --- a/include/hw/pci/pcie_regs.h > +++ b/include/hw/pci/pcie_regs.h > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > PCI_ERR_COR_INTERNAL | \ > PCI_ERR_COR_HL_OVERFLOW) > > +/* ACS */ > +#define PCI_ACS_VER 0x2 > +#define PCI_ACS_SIZEOF 8 > + > #endif /* QEMU_PCIE_REGS_H */
On Wed, 2019-01-23 at 12:04 -0700, Alex Williamson wrote: > On Wed, 23 Jan 2019 19:27:59 +0100 > Knut Omang <knut.omang@oracle.com> wrote: > > > Add a helper function to add PCIe capability for Access Control Services (ACS) > > ACS support in the associated root port is a prerequisite to be able to do useful > > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. > > Define "useful". Hmm - just that without the patches, the root port itself also gets assigned to the same group, which seemed problematic to me (without any further testing than just binding/unbinding to VFIO) Knut > We can certainly still assign single function PFs to > an L2 guest, or multi-function so long as all the functions are > assigned. I won't deny that it's problematic, but it's a virtual > topology that can be adjusted, so I think this is overstating things a > bit. > > > Signed-off-by: Knut Omang <knut.omang@oracle.com> > > --- > > hw/pci/pcie.c | 14 ++++++++++++++ > > include/hw/pci/pcie.h | 1 + > > include/hw/pci/pcie_regs.h | 4 ++++ > > 3 files changed, 19 insertions(+) > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > index 230478f..18feff5 100644 > > --- a/hw/pci/pcie.c > > +++ b/hw/pci/pcie.c > > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > > } > > + > > +/* Add an ACS (Access Control Services) capability */ > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > > +{ > > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > > + offset, PCI_ACS_SIZEOF + ectrl_words); > > The egress control vector is only valid if the egress control > capability is enabled, which is not set below, so this just seems to > waste config space and introduces a meaningless function arg. > > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > Some of these bits are only valid for downstream ports, it would > violate the spec to set them on and endpoint. > > > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > The default values of the control register bits is zero, so we > shouldn't be setting it here and we should have a reset hook to clear > it. > > > + /* Make CTRL register writable */ > > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); > > +} > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > index 5b82a0d..c2da148 100644 > > --- a/include/hw/pci/pcie.h > > +++ b/include/hw/pci/pcie.h > > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > > void pcie_sync_bridge_lnk(PCIDevice *dev); > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > index ad4e780..5e7409c 100644 > > --- a/include/hw/pci/pcie_regs.h > > +++ b/include/hw/pci/pcie_regs.h > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > > PCI_ERR_COR_INTERNAL | \ > > PCI_ERR_COR_HL_OVERFLOW) > > > > +/* ACS */ > > +#define PCI_ACS_VER 0x2 > > +#define PCI_ACS_SIZEOF 8 > > + > > #endif /* QEMU_PCIE_REGS_H */ >
On Wed, 23 Jan 2019 20:14:07 +0100 Knut Omang <knut.omang@oracle.com> wrote: > On Wed, 2019-01-23 at 12:04 -0700, Alex Williamson wrote: > > On Wed, 23 Jan 2019 19:27:59 +0100 > > Knut Omang <knut.omang@oracle.com> wrote: > > > > > Add a helper function to add PCIe capability for Access Control Services (ACS) > > > ACS support in the associated root port is a prerequisite to be able to do useful > > > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. > > > > Define "useful". > > Hmm - just that without the patches, the root port itself > also gets assigned to the same group, which seemed problematic to me > (without any further testing than just binding/unbinding to VFIO) vfio-pci binding rules only apply to endpoints. A root port lacking ACS will include all devices downstream of it in the IOMMU group, and potentially sibling functions, and devices downstream of those, but it doesn't absolutely preclude L2 assignment, or L1 userspace usage, which is already widely used. It simply means that all the endpoints within that group need to be bound to vfio-pci and can only have a single owner. Thanks, Alex > > We can certainly still assign single function PFs to > > an L2 guest, or multi-function so long as all the functions are > > assigned. I won't deny that it's problematic, but it's a virtual > > topology that can be adjusted, so I think this is overstating things a > > bit. > > > > > Signed-off-by: Knut Omang <knut.omang@oracle.com> > > > --- > > > hw/pci/pcie.c | 14 ++++++++++++++ > > > include/hw/pci/pcie.h | 1 + > > > include/hw/pci/pcie_regs.h | 4 ++++ > > > 3 files changed, 19 insertions(+) > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > index 230478f..18feff5 100644 > > > --- a/hw/pci/pcie.c > > > +++ b/hw/pci/pcie.c > > > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > > > } > > > + > > > +/* Add an ACS (Access Control Services) capability */ > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > > > +{ > > > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > > > + offset, PCI_ACS_SIZEOF + ectrl_words); > > > > The egress control vector is only valid if the egress control > > capability is enabled, which is not set below, so this just seems to > > waste config space and introduces a meaningless function arg. > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > > > Some of these bits are only valid for downstream ports, it would > > violate the spec to set them on and endpoint. > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > > > The default values of the control register bits is zero, so we > > shouldn't be setting it here and we should have a reset hook to clear > > it. > > > > > + /* Make CTRL register writable */ > > > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); > > > +} > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > > index 5b82a0d..c2da148 100644 > > > --- a/include/hw/pci/pcie.h > > > +++ b/include/hw/pci/pcie.h > > > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > > > void pcie_sync_bridge_lnk(PCIDevice *dev); > > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > > > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > > index ad4e780..5e7409c 100644 > > > --- a/include/hw/pci/pcie_regs.h > > > +++ b/include/hw/pci/pcie_regs.h > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > > > PCI_ERR_COR_INTERNAL | \ > > > PCI_ERR_COR_HL_OVERFLOW) > > > > > > +/* ACS */ > > > +#define PCI_ACS_VER 0x2 > > > +#define PCI_ACS_SIZEOF 8 > > > + > > > #endif /* QEMU_PCIE_REGS_H */ > > >
On Wed, 2019-01-23 at 12:32 -0700, Alex Williamson wrote: > On Wed, 23 Jan 2019 20:14:07 +0100 > Knut Omang <knut.omang@oracle.com> wrote: > > > On Wed, 2019-01-23 at 12:04 -0700, Alex Williamson wrote: > > > On Wed, 23 Jan 2019 19:27:59 +0100 > > > Knut Omang <knut.omang@oracle.com> wrote: > > > > > > > Add a helper function to add PCIe capability for Access Control Services (ACS) > > > > ACS support in the associated root port is a prerequisite to be able to do useful > > > > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. > > > > > > Define "useful". > > > > Hmm - just that without the patches, the root port itself > > also gets assigned to the same group, which seemed problematic to me > > (without any further testing than just binding/unbinding to VFIO) > > vfio-pci binding rules only apply to endpoints. A root port lacking > ACS will include all devices downstream of it in the IOMMU group, and > potentially sibling functions, and devices downstream of those, but it > doesn't absolutely preclude L2 assignment, or L1 userspace usage, > which is already widely used. It simply means that all the endpoints > within that group need to be bound to vfio-pci and can only have a > single owner. Thanks, I see, that makes sense - I'll moderate my language! Thanks, Knut > > Alex > > > > We can certainly still assign single function PFs to > > > an L2 guest, or multi-function so long as all the functions are > > > assigned. I won't deny that it's problematic, but it's a virtual > > > topology that can be adjusted, so I think this is overstating things a > > > bit. > > > > > > > Signed-off-by: Knut Omang <knut.omang@oracle.com> > > > > --- > > > > hw/pci/pcie.c | 14 ++++++++++++++ > > > > include/hw/pci/pcie.h | 1 + > > > > include/hw/pci/pcie_regs.h | 4 ++++ > > > > 3 files changed, 19 insertions(+) > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > index 230478f..18feff5 100644 > > > > --- a/hw/pci/pcie.c > > > > +++ b/hw/pci/pcie.c > > > > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > > > > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > > > > } > > > > + > > > > +/* Add an ACS (Access Control Services) capability */ > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > > > > +{ > > > > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > > > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > > > > + offset, PCI_ACS_SIZEOF + ectrl_words); > > > > > > The egress control vector is only valid if the egress control > > > capability is enabled, which is not set below, so this just seems to > > > waste config space and introduces a meaningless function arg. > > > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | > PCI_ACS_UF); > > > > > > Some of these bits are only valid for downstream ports, it would > > > violate the spec to set them on and endpoint. > > > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | > PCI_ACS_UF); > > > > > > The default values of the control register bits is zero, so we > > > shouldn't be setting it here and we should have a reset hook to clear > > > it. > > > > > > > + /* Make CTRL register writable */ > > > > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); > > > > +} > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > > > index 5b82a0d..c2da148 100644 > > > > --- a/include/hw/pci/pcie.h > > > > +++ b/include/hw/pci/pcie.h > > > > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > > > > void pcie_sync_bridge_lnk(PCIDevice *dev); > > > > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > > > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > > > > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > > > > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > > > index ad4e780..5e7409c 100644 > > > > --- a/include/hw/pci/pcie_regs.h > > > > +++ b/include/hw/pci/pcie_regs.h > > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > > > > PCI_ERR_COR_INTERNAL | \ > > > > PCI_ERR_COR_HL_OVERFLOW) > > > > > > > > +/* ACS */ > > > > +#define PCI_ACS_VER 0x2 > > > > +#define PCI_ACS_SIZEOF 8 > > > > + > > > > #endif /* QEMU_PCIE_REGS_H */ > > > > > >
On Wed, 2019-01-23 at 12:04 -0700, Alex Williamson wrote: > On Wed, 23 Jan 2019 19:27:59 +0100 > Knut Omang <knut.omang@oracle.com> wrote: > > > Add a helper function to add PCIe capability for Access Control Services (ACS) > > ACS support in the associated root port is a prerequisite to be able to do useful > > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. > > Define "useful". We can certainly still assign single function PFs to > an L2 guest, or multi-function so long as all the functions are > assigned. I won't deny that it's problematic, but it's a virtual > topology that can be adjusted, so I think this is overstating things a > bit. > > > Signed-off-by: Knut Omang <knut.omang@oracle.com> > > --- > > hw/pci/pcie.c | 14 ++++++++++++++ > > include/hw/pci/pcie.h | 1 + > > include/hw/pci/pcie_regs.h | 4 ++++ > > 3 files changed, 19 insertions(+) > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > index 230478f..18feff5 100644 > > --- a/hw/pci/pcie.c > > +++ b/hw/pci/pcie.c > > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > > } > > + > > +/* Add an ACS (Access Control Services) capability */ > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > > +{ > > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > > + offset, PCI_ACS_SIZEOF + ectrl_words); > > The egress control vector is only valid if the egress control > capability is enabled, which is not set below, so this just seems to > waste config space and introduces a meaningless function arg. I think my intention way back when I implemented this was to provide it as a skeleton for further detailing later, I use it with ectrl_words = 0 in the root port. > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > Some of these bits are only valid for downstream ports, it would > violate the spec to set them on and endpoint. I must admit I haven't really dived deep here - I just set the cap bits that one of my servers sets for it's root ports - this is the one I used as model: 00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22) (prog-if 00 [Normal decode]) 00:02.0 0604: 8086:3409 (rev 22) > > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > The default values of the control register bits is zero, so we > shouldn't be setting it here and we should have a reset hook to clear > it. I agree - I'll implement it, thanks! Knut > > > + /* Make CTRL register writable */ > > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); > > +} > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > index 5b82a0d..c2da148 100644 > > --- a/include/hw/pci/pcie.h > > +++ b/include/hw/pci/pcie.h > > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > > void pcie_sync_bridge_lnk(PCIDevice *dev); > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > index ad4e780..5e7409c 100644 > > --- a/include/hw/pci/pcie_regs.h > > +++ b/include/hw/pci/pcie_regs.h > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > > PCI_ERR_COR_INTERNAL | \ > > PCI_ERR_COR_HL_OVERFLOW) > > > > +/* ACS */ > > +#define PCI_ACS_VER 0x2 > > +#define PCI_ACS_SIZEOF 8 > > + > > #endif /* QEMU_PCIE_REGS_H */ >
On Wed, 23 Jan 2019 20:46:14 +0100 Knut Omang <knut.omang@oracle.com> wrote: > On Wed, 2019-01-23 at 12:04 -0700, Alex Williamson wrote: > > On Wed, 23 Jan 2019 19:27:59 +0100 > > Knut Omang <knut.omang@oracle.com> wrote: > > > > > Add a helper function to add PCIe capability for Access Control Services (ACS) > > > ACS support in the associated root port is a prerequisite to be able to do useful > > > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. > > > > Define "useful". We can certainly still assign single function PFs to > > an L2 guest, or multi-function so long as all the functions are > > assigned. I won't deny that it's problematic, but it's a virtual > > topology that can be adjusted, so I think this is overstating things a > > bit. > > > > > Signed-off-by: Knut Omang <knut.omang@oracle.com> > > > --- > > > hw/pci/pcie.c | 14 ++++++++++++++ > > > include/hw/pci/pcie.h | 1 + > > > include/hw/pci/pcie_regs.h | 4 ++++ > > > 3 files changed, 19 insertions(+) > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > index 230478f..18feff5 100644 > > > --- a/hw/pci/pcie.c > > > +++ b/hw/pci/pcie.c > > > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > > > } > > > + > > > +/* Add an ACS (Access Control Services) capability */ > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > > > +{ > > > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > > > + offset, PCI_ACS_SIZEOF + ectrl_words); > > > > The egress control vector is only valid if the egress control > > capability is enabled, which is not set below, so this just seems to > > waste config space and introduces a meaningless function arg. > > I think my intention way back when I implemented this was to provide it as a skeleton > for further detailing later, I use it with ectrl_words = 0 in the root port. > > > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > > > Some of these bits are only valid for downstream ports, it would > > violate the spec to set them on and endpoint. > > I must admit I haven't really dived deep here - I just set the cap bits that one of my > servers sets for it's root ports - this is the one I used as model: > > 00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev > 22) (prog-if 00 [Normal decode]) > 00:02.0 0604: 8086:3409 (rev 22) > > > > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); > > > > The default values of the control register bits is zero, so we > > shouldn't be setting it here and we should have a reset hook to clear > > it. > > I agree - I'll implement it, > thanks! > > Knut > > > > > > + /* Make CTRL register writable */ > > > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); While you're at it, it doesn't make sense to set unimplemented control bits as writable, this should match the bits set in the capabilities register. Thanks, Alex > > > +} > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > > index 5b82a0d..c2da148 100644 > > > --- a/include/hw/pci/pcie.h > > > +++ b/include/hw/pci/pcie.h > > > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > > > void pcie_sync_bridge_lnk(PCIDevice *dev); > > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > > > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > > index ad4e780..5e7409c 100644 > > > --- a/include/hw/pci/pcie_regs.h > > > +++ b/include/hw/pci/pcie_regs.h > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > > > PCI_ERR_COR_INTERNAL | \ > > > PCI_ERR_COR_HL_OVERFLOW) > > > > > > +/* ACS */ > > > +#define PCI_ACS_VER 0x2 > > > +#define PCI_ACS_SIZEOF 8 > > > + > > > #endif /* QEMU_PCIE_REGS_H */ > > > >
On Wed, 2019-01-23 at 12:56 -0700, Alex Williamson wrote: > On Wed, 23 Jan 2019 20:46:14 +0100 > Knut Omang <knut.omang@oracle.com> wrote: > > > On Wed, 2019-01-23 at 12:04 -0700, Alex Williamson wrote: > > > On Wed, 23 Jan 2019 19:27:59 +0100 > > > Knut Omang <knut.omang@oracle.com> wrote: > > > > > > > Add a helper function to add PCIe capability for Access Control Services (ACS) > > > > ACS support in the associated root port is a prerequisite to be able to do useful > > > > passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. > > > > > > Define "useful". We can certainly still assign single function PFs to > > > an L2 guest, or multi-function so long as all the functions are > > > assigned. I won't deny that it's problematic, but it's a virtual > > > topology that can be adjusted, so I think this is overstating things a > > > bit. > > > > > > > Signed-off-by: Knut Omang <knut.omang@oracle.com> > > > > --- > > > > hw/pci/pcie.c | 14 ++++++++++++++ > > > > include/hw/pci/pcie.h | 1 + > > > > include/hw/pci/pcie_regs.h | 4 ++++ > > > > 3 files changed, 19 insertions(+) > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > index 230478f..18feff5 100644 > > > > --- a/hw/pci/pcie.c > > > > +++ b/hw/pci/pcie.c > > > > @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) > > > > > > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); > > > > } > > > > + > > > > +/* Add an ACS (Access Control Services) capability */ > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) > > > > +{ > > > > + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; > > > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, > > > > + offset, PCI_ACS_SIZEOF + ectrl_words); > > > > > > The egress control vector is only valid if the egress control > > > capability is enabled, which is not set below, so this just seems to > > > waste config space and introduces a meaningless function arg. > > > > I think my intention way back when I implemented this was to provide it as a skeleton > > for further detailing later, I use it with ectrl_words = 0 in the root port. > > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CAP, > > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | > PCI_ACS_UF); > > > > > > Some of these bits are only valid for downstream ports, it would > > > violate the spec to set them on and endpoint. > > > > I must admit I haven't really dived deep here - I just set the cap bits that one of my > > servers sets for it's root ports - this is the one I used as model: > > > > 00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 > (rev > > 22) (prog-if 00 [Normal decode]) > > 00:02.0 0604: 8086:3409 (rev 22) > > > > > > + pci_set_word(dev->config + offset + PCI_ACS_CTRL, > > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | > PCI_ACS_UF); > > > > > > The default values of the control register bits is zero, so we > > > shouldn't be setting it here and we should have a reset hook to clear > > > it. > > > > I agree - I'll implement it, > > thanks! > > > > Knut > > > > > > > > > + /* Make CTRL register writable */ > > > > + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); > > While you're at it, it doesn't make sense to set unimplemented control > bits as writable, this should match the bits set in the capabilities > register. Thanks, Good point, I'll fix that as well, Thanks! Knut > > Alex > > > > > +} > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > > > index 5b82a0d..c2da148 100644 > > > > --- a/include/hw/pci/pcie.h > > > > +++ b/include/hw/pci/pcie.h > > > > @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, > > > > void pcie_sync_bridge_lnk(PCIDevice *dev); > > > > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); > > > > void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); > > > > void pcie_ats_init(PCIDevice *dev, uint16_t offset); > > > > > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > > > index ad4e780..5e7409c 100644 > > > > --- a/include/hw/pci/pcie_regs.h > > > > +++ b/include/hw/pci/pcie_regs.h > > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { > > > > PCI_ERR_COR_INTERNAL | \ > > > > PCI_ERR_COR_HL_OVERFLOW) > > > > > > > > +/* ACS */ > > > > +#define PCI_ACS_VER 0x2 > > > > +#define PCI_ACS_SIZEOF 8 > > > > + > > > > #endif /* QEMU_PCIE_REGS_H */ > > > > > > > >
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 230478f..18feff5 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -906,3 +906,17 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset) pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f); } + +/* Add an ACS (Access Control Services) capability */ +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz) +{ + int ectrl_words = (egress_ctrl_vec_sz + 31) & ~31; + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, + offset, PCI_ACS_SIZEOF + ectrl_words); + pci_set_word(dev->config + offset + PCI_ACS_CAP, + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); + pci_set_word(dev->config + offset + PCI_ACS_CTRL, + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); + /* Make CTRL register writable */ + memset(dev->wmask + offset + PCI_ACS_CTRL, 0xff, 2); +} diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h index 5b82a0d..c2da148 100644 --- a/include/hw/pci/pcie.h +++ b/include/hw/pci/pcie.h @@ -129,6 +129,7 @@ void pcie_add_capability(PCIDevice *dev, void pcie_sync_bridge_lnk(PCIDevice *dev); void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn); +void pcie_acs_init(PCIDevice *dev, uint16_t offset, uint8_t egress_ctrl_vec_sz); void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num); void pcie_ats_init(PCIDevice *dev, uint16_t offset); diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h index ad4e780..5e7409c 100644 --- a/include/hw/pci/pcie_regs.h +++ b/include/hw/pci/pcie_regs.h @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth { PCI_ERR_COR_INTERNAL | \ PCI_ERR_COR_HL_OVERFLOW) +/* ACS */ +#define PCI_ACS_VER 0x2 +#define PCI_ACS_SIZEOF 8 + #endif /* QEMU_PCIE_REGS_H */
Add a helper function to add PCIe capability for Access Control Services (ACS) ACS support in the associated root port is a prerequisite to be able to do useful passthrough with VFIO without Alex Williamson's pcie_acs_override kernel patch. Signed-off-by: Knut Omang <knut.omang@oracle.com> --- hw/pci/pcie.c | 14 ++++++++++++++ include/hw/pci/pcie.h | 1 + include/hw/pci/pcie_regs.h | 4 ++++ 3 files changed, 19 insertions(+)