Message ID | 7294a70c-0089-e375-bb5a-bf9544d4f251@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RFC,v2] vPCI: account for hidden devices | expand |
On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > Hidden devices (e.g. an add-in PCI serial card used for Xen's serial > console) are associated with DomXEN, not Dom0. This means that while > looking for overlapping BARs such devices cannot be found on Dom0's list > of devices; DomXEN's list also needs to be scanned. > > Suppress vPCI init altogether for r/o devices (which constitute a subset > of hidden ones). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > RFC: The modify_bars() change is intentionally mis-formatted, as the > necessary re-indentation would make the diff difficult to read. At > this point I'd merely like to gather input towards possible better > approaches to solve the issue (not the least because quite possibly > there are further places needing changing). I think we should also handle the case of !pdev->vpci for the hardware doamin, as it's allowed for the vpci_add_handlers() call in setup_one_hwdom_device() to fail and the device wold still be assigned to the hardware domain. I can submit that as a separate bugfix, as it's already an issue without taking r/o or hidden devices into account. > > RFC: Whether to actually suppress vPCI init is up for debate; adding the > extra logic is following Roger's suggestion (I'm not convinced it is > useful to have). If we want to keep that, a 2nd question would be > whether to keep it in vpci_add_handlers(): Both of its callers (can) > have a struct pci_seg readily available, so checking ->ro_map at the > call sites would be easier. But that would duplicate the logic into the callers, which doesn't seem very nice to me, and makes it easier to make mistakes if further callers are added and r/o is not checked there. > RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > modify_bars() to consistently respect BARs of hidden devices while > setting up "normal" ones (i.e. to avoid as much as possible the > "continue" path introduced here), setting up of the former may want > doing first. But BARs of hidden devices should be mapped into dom0 physmap? And hence doing those before or after normal devices will lead to the same result. The loop in modify_bars() is there to avoid attempting to map the same range twice, or to unmap a range while there are devices still using it, but the unmap is never done during initial device setup. > > RFC: vpci_write()'s check of the r/o map may want moving out of mainline > code, into the case dealing with !pdev->vpci. Indeed. > --- > v2: Extend existing comment. Relax assertion. Don't initialize vPCI for > r/o devices. > > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > struct vpci_header *header = &pdev->vpci->header; > struct rangeset *mem = rangeset_new(NULL, NULL, 0); > struct pci_dev *tmp, *dev = NULL; > + const struct domain *d; > const struct vpci_msix *msix = pdev->vpci->msix; > unsigned int i; > int rc; > @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > > /* > * Check for overlaps with other BARs. Note that only BARs that are > - * currently mapped (enabled) are checked for overlaps. > + * currently mapped (enabled) are checked for overlaps. Note also that > + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > */ > - for_each_pdev ( pdev->domain, tmp ) > +for ( d = pdev->domain; ; d = dom_xen ) {//todo > + for_each_pdev ( d, tmp ) > { > if ( tmp == pdev ) > { > @@ -304,6 +307,7 @@ static int modify_bars(const struct pci_ > */ > continue; > } > +if ( !tmp->vpci ) { ASSERT(d == dom_xen); continue; }//todo > > for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) > { > @@ -330,6 +334,7 @@ static int modify_bars(const struct pci_ > } > } > } > +if ( !is_hardware_domain(d) ) break; }//todo AFAICT in order to support hidden devices there's one bit missing in vpci_{write,read}(): the call to pci_get_pdev() should be also done against dom_xen when handling accesses originated from the hardware domain. One further question is whether we want to map BARs of r/o devices into the hardware domain physmap. Not sure that's very helpful, as dom0 won't be allowed to modify any of the config space bits of those devices, so even attempts to size the BARs will fail. I wonder what kind of issues this can cause to dom0 OSes. Thanks, Roger.
On 24.05.2023 16:22, Roger Pau Monné wrote: > On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >> Hidden devices (e.g. an add-in PCI serial card used for Xen's serial >> console) are associated with DomXEN, not Dom0. This means that while >> looking for overlapping BARs such devices cannot be found on Dom0's list >> of devices; DomXEN's list also needs to be scanned. >> >> Suppress vPCI init altogether for r/o devices (which constitute a subset >> of hidden ones). >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> --- >> RFC: The modify_bars() change is intentionally mis-formatted, as the >> necessary re-indentation would make the diff difficult to read. At >> this point I'd merely like to gather input towards possible better >> approaches to solve the issue (not the least because quite possibly >> there are further places needing changing). > > I think we should also handle the case of !pdev->vpci for the hardware > doamin, as it's allowed for the vpci_add_handlers() call in > setup_one_hwdom_device() to fail and the device wold still be assigned > to the hardware domain. > > I can submit that as a separate bugfix, as it's already an issue > without taking r/o or hidden devices into account. Yeah, I think that wants dealing with separately. I'm not actually sure though that "is allowed to fail" is proper behavior ... But anyway - I take this as you agreeing to go that route, which is the prereq for me to actually make a well-formed patch. Please shout soon if that's a misunderstanding of mine. >> RFC: Whether to actually suppress vPCI init is up for debate; adding the >> extra logic is following Roger's suggestion (I'm not convinced it is >> useful to have). If we want to keep that, a 2nd question would be >> whether to keep it in vpci_add_handlers(): Both of its callers (can) >> have a struct pci_seg readily available, so checking ->ro_map at the >> call sites would be easier. > > But that would duplicate the logic into the callers, which doesn't > seem very nice to me, and makes it easier to make mistakes if further > callers are added and r/o is not checked there. Right, hence why I didn't do it the alternative way from the beginning. Both approaches have a pro and a con. But prior to answering the 2nd question, what about the 1st one? Is it really worth to have the extra logic? >> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For >> modify_bars() to consistently respect BARs of hidden devices while >> setting up "normal" ones (i.e. to avoid as much as possible the >> "continue" path introduced here), setting up of the former may want >> doing first. > > But BARs of hidden devices should be mapped into dom0 physmap? Yes. > And > hence doing those before or after normal devices will lead to the same > result. The loop in modify_bars() is there to avoid attempting to map > the same range twice, or to unmap a range while there are devices > still using it, but the unmap is never done during initial device > setup. Okay, so maybe indeed there's no effect on the final result. Yet still the anomaly bothered me when seeing it in the logs (it actually mislead me initially in my conclusions as to what was actually going on), when I added a printk() to that new "continue" path. We would skip hidden devices up until them getting initialized themselves. There would be less skipping if all (there aren't going to be many) DomXEN devices were initialized first. >> RFC: vpci_write()'s check of the r/o map may want moving out of mainline >> code, into the case dealing with !pdev->vpci. > > Indeed. Will extend the patch accordingly then >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ >> struct vpci_header *header = &pdev->vpci->header; >> struct rangeset *mem = rangeset_new(NULL, NULL, 0); >> struct pci_dev *tmp, *dev = NULL; >> + const struct domain *d; >> const struct vpci_msix *msix = pdev->vpci->msix; >> unsigned int i; >> int rc; >> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ >> >> /* >> * Check for overlaps with other BARs. Note that only BARs that are >> - * currently mapped (enabled) are checked for overlaps. >> + * currently mapped (enabled) are checked for overlaps. Note also that >> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. >> */ >> - for_each_pdev ( pdev->domain, tmp ) >> +for ( d = pdev->domain; ; d = dom_xen ) {//todo >> + for_each_pdev ( d, tmp ) >> { >> if ( tmp == pdev ) >> { >> @@ -304,6 +307,7 @@ static int modify_bars(const struct pci_ >> */ >> continue; >> } >> +if ( !tmp->vpci ) { ASSERT(d == dom_xen); continue; }//todo >> >> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) >> { >> @@ -330,6 +334,7 @@ static int modify_bars(const struct pci_ >> } >> } >> } >> +if ( !is_hardware_domain(d) ) break; }//todo > > AFAICT in order to support hidden devices there's one bit missing in > vpci_{write,read}(): the call to pci_get_pdev() should be also done > against dom_xen when handling accesses originated from the hardware > domain. Hmm, right. Without that we'd always take the vpci_{read,write}_hw() path. > One further question is whether we want to map BARs of r/o devices > into the hardware domain physmap. Not sure that's very helpful, as > dom0 won't be allowed to modify any of the config space bits of those > devices, so even attempts to size the BARs will fail. I wonder what > kind of issues this can cause to dom0 OSes. This is what Linux (6.3) says: pci 0000:02:00.1: [Firmware Bug]: reg 0x10: invalid BAR (can't size) pci 0000:02:00.1: [Firmware Bug]: reg 0x14: invalid BAR (can't size) pci 0000:02:00.1: [Firmware Bug]: reg 0x24: invalid BAR (can't size) Jan
On Wed, May 24, 2023 at 04:44:49PM +0200, Jan Beulich wrote: > On 24.05.2023 16:22, Roger Pau Monné wrote: > > On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > >> Hidden devices (e.g. an add-in PCI serial card used for Xen's serial > >> console) are associated with DomXEN, not Dom0. This means that while > >> looking for overlapping BARs such devices cannot be found on Dom0's list > >> of devices; DomXEN's list also needs to be scanned. > >> > >> Suppress vPCI init altogether for r/o devices (which constitute a subset > >> of hidden ones). > >> > >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > >> --- > >> RFC: The modify_bars() change is intentionally mis-formatted, as the > >> necessary re-indentation would make the diff difficult to read. At > >> this point I'd merely like to gather input towards possible better > >> approaches to solve the issue (not the least because quite possibly > >> there are further places needing changing). > > > > I think we should also handle the case of !pdev->vpci for the hardware > > doamin, as it's allowed for the vpci_add_handlers() call in > > setup_one_hwdom_device() to fail and the device wold still be assigned > > to the hardware domain. > > > > I can submit that as a separate bugfix, as it's already an issue > > without taking r/o or hidden devices into account. > > Yeah, I think that wants dealing with separately. I'm not actually sure > though that "is allowed to fail" is proper behavior ... One better option would be to mark the device r/o if the vpci_add_handlers() call fails in setup_one_hwdom_device(), as that would prevent dom0 from accessing native MSI(-X) capabilities. > But anyway - I take this as you agreeing to go that route, which is the > prereq for me to actually make a well-formed patch. Please shout soon > if that's a misunderstanding of mine. Sure, will send the fix later today or tomorrow so that you can rebase. > >> RFC: Whether to actually suppress vPCI init is up for debate; adding the > >> extra logic is following Roger's suggestion (I'm not convinced it is > >> useful to have). If we want to keep that, a 2nd question would be > >> whether to keep it in vpci_add_handlers(): Both of its callers (can) > >> have a struct pci_seg readily available, so checking ->ro_map at the > >> call sites would be easier. > > > > But that would duplicate the logic into the callers, which doesn't > > seem very nice to me, and makes it easier to make mistakes if further > > callers are added and r/o is not checked there. > > Right, hence why I didn't do it the alternative way from the beginning. > Both approaches have a pro and a con. > > But prior to answering the 2nd question, what about the 1st one? Is it > really worth to have the extra logic? Why would we want to do all the vPCI initialization for r/o devices? None of the handlers setup will get called, so I see it the other way around: not short-circuiting vpci_add_handlers() for r/o devices is a waste of time and resources because none of the setup state would be used anyway. > > And > > hence doing those before or after normal devices will lead to the same > > result. The loop in modify_bars() is there to avoid attempting to map > > the same range twice, or to unmap a range while there are devices > > still using it, but the unmap is never done during initial device > > setup. > > Okay, so maybe indeed there's no effect on the final result. Yet still > the anomaly bothered me when seeing it in the logs (it actually mislead > me initially in my conclusions as to what was actually going on), when > I added a printk() to that new "continue" path. We would skip hidden > devices up until them getting initialized themselves. There would be > less skipping if all (there aren't going to be many) DomXEN devices > were initialized first. I think that just makes the logic more complicated for no reason, the only reason you don't see this with devices assigned to dom0 is because device addition is interleaved with calls to vpci_add_handlers(). However it would also be valid to add all devices to dom0 and then call vpci_add_handlers() for each one of them. > > One further question is whether we want to map BARs of r/o devices > > into the hardware domain physmap. Not sure that's very helpful, as > > dom0 won't be allowed to modify any of the config space bits of those > > devices, so even attempts to size the BARs will fail. I wonder what > > kind of issues this can cause to dom0 OSes. > > This is what Linux (6.3) says: > > pci 0000:02:00.1: [Firmware Bug]: reg 0x10: invalid BAR (can't size) > pci 0000:02:00.1: [Firmware Bug]: reg 0x14: invalid BAR (can't size) > pci 0000:02:00.1: [Firmware Bug]: reg 0x24: invalid BAR (can't size) OK, seems fine then. There's no point in mapping the BARs of r/o devices to the dom0 physmap, as the domain is unable to size them in the first place. Thanks, Roger.
On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > struct vpci_header *header = &pdev->vpci->header; > struct rangeset *mem = rangeset_new(NULL, NULL, 0); > struct pci_dev *tmp, *dev = NULL; > + const struct domain *d; > const struct vpci_msix *msix = pdev->vpci->msix; > unsigned int i; > int rc; > @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > > /* > * Check for overlaps with other BARs. Note that only BARs that are > - * currently mapped (enabled) are checked for overlaps. > + * currently mapped (enabled) are checked for overlaps. Note also that > + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > */ > - for_each_pdev ( pdev->domain, tmp ) > +for ( d = pdev->domain; ; d = dom_xen ) {//todo Looking at this again, I think this is slightly more complex, as during runtime dom0 will get here with pdev->domain == hardware_domain OR dom_xen, and hence you also need to account that devices that have pdev->domain == dom_xen need to iterate over devices that belong to the hardware_domain, ie: for ( d = pdev->domain; ; d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) And we likely want to limit this to devices that belong to the hardware_domain or to dom_xen (in preparation for vPCI being used for domUs). Thanks, Roger.
On Wed, 24 May 2023, Jan Beulich wrote: > >> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > >> modify_bars() to consistently respect BARs of hidden devices while > >> setting up "normal" ones (i.e. to avoid as much as possible the > >> "continue" path introduced here), setting up of the former may want > >> doing first. > > > > But BARs of hidden devices should be mapped into dom0 physmap? > > Yes. The BARs would be mapped read-only (not read-write), right? Otherwise we let dom0 access devices that belong to Xen, which doesn't seem like a good idea. But even if we map the BARs read-only, what is the benefit of mapping them to Dom0? If Dom0 loads a driver for it and the driver wants to initialize the device, the driver will crash because the MMIO region is read-only instead of read-write, right? How does this device hiding work for dom0? How does dom0 know not to access a device that is present on the PCI bus but is used by Xen? The reason why I was suggesting to hide the device completely in the past was that I assumed we had to hide the device (make it "disappear" on the PCI bus) otherwise Dom0 would access it. Is there another way to mark a PCI devices as "inaccessible" or "disabled"?
On Wed, 24 May 2023, Jan Beulich wrote: > Hidden devices (e.g. an add-in PCI serial card used for Xen's serial > console) are associated with DomXEN, not Dom0. This means that while > looking for overlapping BARs such devices cannot be found on Dom0's list > of devices; DomXEN's list also needs to be scanned. > > Suppress vPCI init altogether for r/o devices (which constitute a subset > of hidden ones). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> This works! Ship it! :-) https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4346896950 I understand this is an RFC and there are still open questions, but thank you for addressing the issue quickly. > --- > RFC: The modify_bars() change is intentionally mis-formatted, as the > necessary re-indentation would make the diff difficult to read. At > this point I'd merely like to gather input towards possible better > approaches to solve the issue (not the least because quite possibly > there are further places needing changing). > > RFC: Whether to actually suppress vPCI init is up for debate; adding the > extra logic is following Roger's suggestion (I'm not convinced it is > useful to have). If we want to keep that, a 2nd question would be > whether to keep it in vpci_add_handlers(): Both of its callers (can) > have a struct pci_seg readily available, so checking ->ro_map at the > call sites would be easier. > > RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > modify_bars() to consistently respect BARs of hidden devices while > setting up "normal" ones (i.e. to avoid as much as possible the > "continue" path introduced here), setting up of the former may want > doing first. > > RFC: vpci_write()'s check of the r/o map may want moving out of mainline > code, into the case dealing with !pdev->vpci. > --- > v2: Extend existing comment. Relax assertion. Don't initialize vPCI for > r/o devices. > > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > struct vpci_header *header = &pdev->vpci->header; > struct rangeset *mem = rangeset_new(NULL, NULL, 0); > struct pci_dev *tmp, *dev = NULL; > + const struct domain *d; > const struct vpci_msix *msix = pdev->vpci->msix; > unsigned int i; > int rc; > @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > > /* > * Check for overlaps with other BARs. Note that only BARs that are > - * currently mapped (enabled) are checked for overlaps. > + * currently mapped (enabled) are checked for overlaps. Note also that > + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > */ > - for_each_pdev ( pdev->domain, tmp ) > +for ( d = pdev->domain; ; d = dom_xen ) {//todo > + for_each_pdev ( d, tmp ) > { > if ( tmp == pdev ) > { > @@ -304,6 +307,7 @@ static int modify_bars(const struct pci_ > */ > continue; > } > +if ( !tmp->vpci ) { ASSERT(d == dom_xen); continue; }//todo > > for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) > { > @@ -330,6 +334,7 @@ static int modify_bars(const struct pci_ > } > } > } > +if ( !is_hardware_domain(d) ) break; }//todo > > ASSERT(dev); > > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -70,6 +70,7 @@ void vpci_remove_device(struct pci_dev * > int vpci_add_handlers(struct pci_dev *pdev) > { > unsigned int i; > + const unsigned long *ro_map; > int rc = 0; > > if ( !has_vpci(pdev->domain) ) > @@ -78,6 +79,11 @@ int vpci_add_handlers(struct pci_dev *pd > /* We should not get here twice for the same device. */ > ASSERT(!pdev->vpci); > > + /* No vPCI for r/o devices. */ > + ro_map = pci_get_ro_map(pdev->sbdf.seg); > + if ( ro_map && test_bit(pdev->sbdf.bdf, ro_map) ) > + return 0; > + > pdev->vpci = xzalloc(struct vpci); > if ( !pdev->vpci ) > return -ENOMEM; >
On 25.05.2023 01:37, Stefano Stabellini wrote: > On Wed, 24 May 2023, Jan Beulich wrote: >>>> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For >>>> modify_bars() to consistently respect BARs of hidden devices while >>>> setting up "normal" ones (i.e. to avoid as much as possible the >>>> "continue" path introduced here), setting up of the former may want >>>> doing first. >>> >>> But BARs of hidden devices should be mapped into dom0 physmap? >> >> Yes. > > The BARs would be mapped read-only (not read-write), right? Otherwise we > let dom0 access devices that belong to Xen, which doesn't seem like a > good idea. > > But even if we map the BARs read-only, what is the benefit of mapping > them to Dom0? If Dom0 loads a driver for it and the driver wants to > initialize the device, the driver will crash because the MMIO region is > read-only instead of read-write, right? > > How does this device hiding work for dom0? How does dom0 know not to > access a device that is present on the PCI bus but is used by Xen? None of these are new questions - this has all been this way for PV Dom0, and so far we've limped along quite okay. That's not to say that we shouldn't improve things if we can, but that first requires ideas as to how. > The reason why I was suggesting to hide the device completely in the > past was that I assumed we had to hide the device (make it "disappear" > on the PCI bus) otherwise Dom0 would access it. Is there another way to > mark a PCI devices as "inaccessible" or "disabled"? I'm no aware of any. Jan
On Wed, May 24, 2023 at 04:37:42PM -0700, Stefano Stabellini wrote: > On Wed, 24 May 2023, Jan Beulich wrote: > > >> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > > >> modify_bars() to consistently respect BARs of hidden devices while > > >> setting up "normal" ones (i.e. to avoid as much as possible the > > >> "continue" path introduced here), setting up of the former may want > > >> doing first. > > > > > > But BARs of hidden devices should be mapped into dom0 physmap? > > > > Yes. > > The BARs would be mapped read-only (not read-write), right? Otherwise we > let dom0 access devices that belong to Xen, which doesn't seem like a > good idea. It's my understanding that Xen will mark any regions of the BARs in-use by the hypervisor as read-only, but the rest will be writable. > But even if we map the BARs read-only, what is the benefit of mapping > them to Dom0? If Dom0 loads a driver for it and the driver wants to > initialize the device, the driver will crash because the MMIO region is > read-only instead of read-write, right? I think USB is a good example, Xen uses the debug functionality of EHCI/XHCI, but by marking the device as hidden it allows dom0 to also make use of any remaining USB ports. For r/o devices I don't see much point in mapping the BARs to dom0, as dom0 is not allowed to size them in the first place, so will be able to detect the BAR position, but not the BAR size. I guess this could cause issues in the future if dom0 starts repositioning BARs, but let's leave that aside. > How does this device hiding work for dom0? How does dom0 know not to > access a device that is present on the PCI bus but is used by Xen? I think the objective for hidden is to allow dom0 to use the device, but Xen should protect any MMIO region it doesn't want dom0 to modify. > The reason why I was suggesting to hide the device completely in the > past was that I assumed we had to hide the device (make it "disappear" > on the PCI bus) otherwise Dom0 would access it. Is there another way to > mark a PCI devices as "inaccessible" or "disabled"? I'm not aware of anything else, short of using stuff like the ACPI STAO and reporting disabled devices there in addition of marking their config space as r/o. Thanks, Roger.
On 24.05.2023 17:33, Roger Pau Monné wrote: > On Wed, May 24, 2023 at 04:44:49PM +0200, Jan Beulich wrote: >> On 24.05.2023 16:22, Roger Pau Monné wrote: >>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >>>> Hidden devices (e.g. an add-in PCI serial card used for Xen's serial >>>> console) are associated with DomXEN, not Dom0. This means that while >>>> looking for overlapping BARs such devices cannot be found on Dom0's list >>>> of devices; DomXEN's list also needs to be scanned. >>>> >>>> Suppress vPCI init altogether for r/o devices (which constitute a subset >>>> of hidden ones). >>>> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>> --- >>>> RFC: The modify_bars() change is intentionally mis-formatted, as the >>>> necessary re-indentation would make the diff difficult to read. At >>>> this point I'd merely like to gather input towards possible better >>>> approaches to solve the issue (not the least because quite possibly >>>> there are further places needing changing). >>> >>> I think we should also handle the case of !pdev->vpci for the hardware >>> doamin, as it's allowed for the vpci_add_handlers() call in >>> setup_one_hwdom_device() to fail and the device wold still be assigned >>> to the hardware domain. >>> >>> I can submit that as a separate bugfix, as it's already an issue >>> without taking r/o or hidden devices into account. >> >> Yeah, I think that wants dealing with separately. I'm not actually sure >> though that "is allowed to fail" is proper behavior ... > > One better option would be to mark the device r/o if the > vpci_add_handlers() call fails in setup_one_hwdom_device(), as that > would prevent dom0 from accessing native MSI(-X) capabilities. Perhaps, but again in a separate patch. >>>> RFC: Whether to actually suppress vPCI init is up for debate; adding the >>>> extra logic is following Roger's suggestion (I'm not convinced it is >>>> useful to have). If we want to keep that, a 2nd question would be >>>> whether to keep it in vpci_add_handlers(): Both of its callers (can) >>>> have a struct pci_seg readily available, so checking ->ro_map at the >>>> call sites would be easier. >>> >>> But that would duplicate the logic into the callers, which doesn't >>> seem very nice to me, and makes it easier to make mistakes if further >>> callers are added and r/o is not checked there. >> >> Right, hence why I didn't do it the alternative way from the beginning. >> Both approaches have a pro and a con. >> >> But prior to answering the 2nd question, what about the 1st one? Is it >> really worth to have the extra logic? > > Why would we want to do all the vPCI initialization for r/o devices? > None of the handlers setup will get called, so I see it the other way > around: not short-circuiting vpci_add_handlers() for r/o devices is a > waste of time and resources because none of the setup state would be > used anyway. Hmm, yes, that's a valid way of looking at (and justifying) it. >>> And >>> hence doing those before or after normal devices will lead to the same >>> result. The loop in modify_bars() is there to avoid attempting to map >>> the same range twice, or to unmap a range while there are devices >>> still using it, but the unmap is never done during initial device >>> setup. >> >> Okay, so maybe indeed there's no effect on the final result. Yet still >> the anomaly bothered me when seeing it in the logs (it actually mislead >> me initially in my conclusions as to what was actually going on), when >> I added a printk() to that new "continue" path. We would skip hidden >> devices up until them getting initialized themselves. There would be >> less skipping if all (there aren't going to be many) DomXEN devices >> were initialized first. > > I think that just makes the logic more complicated for no reason, the > only reason you don't see this with devices assigned to dom0 is > because device addition is interleaved with calls to > vpci_add_handlers(). However it would also be valid to add all > devices to dom0 and then call vpci_add_handlers() for each one of them. Okay, I'll leave that alone than, at least as long as we don't see any reason because of actual behavioral differences. Jan
On 24.05.2023 17:56, Roger Pau Monné wrote: > On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ >> struct vpci_header *header = &pdev->vpci->header; >> struct rangeset *mem = rangeset_new(NULL, NULL, 0); >> struct pci_dev *tmp, *dev = NULL; >> + const struct domain *d; >> const struct vpci_msix *msix = pdev->vpci->msix; >> unsigned int i; >> int rc; >> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ >> >> /* >> * Check for overlaps with other BARs. Note that only BARs that are >> - * currently mapped (enabled) are checked for overlaps. >> + * currently mapped (enabled) are checked for overlaps. Note also that >> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. >> */ >> - for_each_pdev ( pdev->domain, tmp ) >> +for ( d = pdev->domain; ; d = dom_xen ) {//todo > > Looking at this again, I think this is slightly more complex, as during > runtime dom0 will get here with pdev->domain == hardware_domain OR > dom_xen, and hence you also need to account that devices that have > pdev->domain == dom_xen need to iterate over devices that belong to > the hardware_domain, ie: > > for ( d = pdev->domain; ; > d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) Right, something along these lines. To keep loop continuation expression and exit condition simple, I'll probably prefer for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; ; d = dom_xen ) > And we likely want to limit this to devices that belong to the > hardware_domain or to dom_xen (in preparation for vPCI being used for > domUs). I'm afraid I don't understand this remark, though. Jan
On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: > On 24.05.2023 17:56, Roger Pau Monné wrote: > > On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > >> --- a/xen/drivers/vpci/header.c > >> +++ b/xen/drivers/vpci/header.c > >> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > >> struct vpci_header *header = &pdev->vpci->header; > >> struct rangeset *mem = rangeset_new(NULL, NULL, 0); > >> struct pci_dev *tmp, *dev = NULL; > >> + const struct domain *d; > >> const struct vpci_msix *msix = pdev->vpci->msix; > >> unsigned int i; > >> int rc; > >> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > >> > >> /* > >> * Check for overlaps with other BARs. Note that only BARs that are > >> - * currently mapped (enabled) are checked for overlaps. > >> + * currently mapped (enabled) are checked for overlaps. Note also that > >> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > >> */ > >> - for_each_pdev ( pdev->domain, tmp ) > >> +for ( d = pdev->domain; ; d = dom_xen ) {//todo > > > > Looking at this again, I think this is slightly more complex, as during > > runtime dom0 will get here with pdev->domain == hardware_domain OR > > dom_xen, and hence you also need to account that devices that have > > pdev->domain == dom_xen need to iterate over devices that belong to > > the hardware_domain, ie: > > > > for ( d = pdev->domain; ; > > d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) > > Right, something along these lines. To keep loop continuation expression > and exit condition simple, I'll probably prefer > > for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; > ; d = dom_xen ) LGTM. I would add parentheses around the pdev->domain != dom_xen condition, but that's just my personal taste. We might want to add an ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); here, just to remind that this chunk must be revisited when adding domU support (but you can also argue we haven't done this elsewhere), I just feel here it's not so obvious we don't want do to this for domUs. > > And we likely want to limit this to devices that belong to the > > hardware_domain or to dom_xen (in preparation for vPCI being used for > > domUs). > > I'm afraid I don't understand this remark, though. This was looking forward to domU support, so that you already cater for pdev->domain not being hardware_domain or dom_xen, but we might want to leave that for later, when domU support is actually introduced. Thanks, Roger.
On 25.05.2023 17:02, Roger Pau Monné wrote: > On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: >> On 24.05.2023 17:56, Roger Pau Monné wrote: >>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >>>> --- a/xen/drivers/vpci/header.c >>>> +++ b/xen/drivers/vpci/header.c >>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ >>>> struct vpci_header *header = &pdev->vpci->header; >>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); >>>> struct pci_dev *tmp, *dev = NULL; >>>> + const struct domain *d; >>>> const struct vpci_msix *msix = pdev->vpci->msix; >>>> unsigned int i; >>>> int rc; >>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ >>>> >>>> /* >>>> * Check for overlaps with other BARs. Note that only BARs that are >>>> - * currently mapped (enabled) are checked for overlaps. >>>> + * currently mapped (enabled) are checked for overlaps. Note also that >>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. >>>> */ >>>> - for_each_pdev ( pdev->domain, tmp ) >>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo >>> >>> Looking at this again, I think this is slightly more complex, as during >>> runtime dom0 will get here with pdev->domain == hardware_domain OR >>> dom_xen, and hence you also need to account that devices that have >>> pdev->domain == dom_xen need to iterate over devices that belong to >>> the hardware_domain, ie: >>> >>> for ( d = pdev->domain; ; >>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) >> >> Right, something along these lines. To keep loop continuation expression >> and exit condition simple, I'll probably prefer >> >> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; >> ; d = dom_xen ) > > LGTM. I would add parentheses around the pdev->domain != dom_xen > condition, but that's just my personal taste. > > We might want to add an > > ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); > > here, just to remind that this chunk must be revisited when adding > domU support (but you can also argue we haven't done this elsewhere), > I just feel here it's not so obvious we don't want do to this for > domUs. I could add such an assertion, if only ... >>> And we likely want to limit this to devices that belong to the >>> hardware_domain or to dom_xen (in preparation for vPCI being used for >>> domUs). >> >> I'm afraid I don't understand this remark, though. > > This was looking forward to domU support, so that you already cater > for pdev->domain not being hardware_domain or dom_xen, but we might > want to leave that for later, when domU support is actually > introduced. ... I understood why this checking doesn't apply to DomU-s as well, in your opinion. Jan
On 25.05.2023 17:30, Jan Beulich wrote: > On 25.05.2023 17:02, Roger Pau Monné wrote: >> On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: >>> On 24.05.2023 17:56, Roger Pau Monné wrote: >>>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >>>>> --- a/xen/drivers/vpci/header.c >>>>> +++ b/xen/drivers/vpci/header.c >>>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ >>>>> struct vpci_header *header = &pdev->vpci->header; >>>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); >>>>> struct pci_dev *tmp, *dev = NULL; >>>>> + const struct domain *d; >>>>> const struct vpci_msix *msix = pdev->vpci->msix; >>>>> unsigned int i; >>>>> int rc; >>>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ >>>>> >>>>> /* >>>>> * Check for overlaps with other BARs. Note that only BARs that are >>>>> - * currently mapped (enabled) are checked for overlaps. >>>>> + * currently mapped (enabled) are checked for overlaps. Note also that >>>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. >>>>> */ >>>>> - for_each_pdev ( pdev->domain, tmp ) >>>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo >>>> >>>> Looking at this again, I think this is slightly more complex, as during >>>> runtime dom0 will get here with pdev->domain == hardware_domain OR >>>> dom_xen, and hence you also need to account that devices that have >>>> pdev->domain == dom_xen need to iterate over devices that belong to >>>> the hardware_domain, ie: >>>> >>>> for ( d = pdev->domain; ; >>>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) >>> >>> Right, something along these lines. To keep loop continuation expression >>> and exit condition simple, I'll probably prefer >>> >>> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; >>> ; d = dom_xen ) >> >> LGTM. I would add parentheses around the pdev->domain != dom_xen >> condition, but that's just my personal taste. >> >> We might want to add an >> >> ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); >> >> here, just to remind that this chunk must be revisited when adding >> domU support (but you can also argue we haven't done this elsewhere), >> I just feel here it's not so obvious we don't want do to this for >> domUs. > > I could add such an assertion, if only ... > >>>> And we likely want to limit this to devices that belong to the >>>> hardware_domain or to dom_xen (in preparation for vPCI being used for >>>> domUs). >>> >>> I'm afraid I don't understand this remark, though. >> >> This was looking forward to domU support, so that you already cater >> for pdev->domain not being hardware_domain or dom_xen, but we might >> want to leave that for later, when domU support is actually >> introduced. > > ... I understood why this checking doesn't apply to DomU-s as well, > in your opinion. Or did you mean that to go inside the if() your patch adds (and hence my patch won't need to add anymore)? I didn't think you did, because then it would rather be ASSERT(d == hardware_domain || d == dom_xen) imo. Jan
On Thu, 25 May 2023, Jan Beulich wrote: > On 25.05.2023 01:37, Stefano Stabellini wrote: > > On Wed, 24 May 2023, Jan Beulich wrote: > >>>> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > >>>> modify_bars() to consistently respect BARs of hidden devices while > >>>> setting up "normal" ones (i.e. to avoid as much as possible the > >>>> "continue" path introduced here), setting up of the former may want > >>>> doing first. > >>> > >>> But BARs of hidden devices should be mapped into dom0 physmap? > >> > >> Yes. > > > > The BARs would be mapped read-only (not read-write), right? Otherwise we > > let dom0 access devices that belong to Xen, which doesn't seem like a > > good idea. > > > > But even if we map the BARs read-only, what is the benefit of mapping > > them to Dom0? If Dom0 loads a driver for it and the driver wants to > > initialize the device, the driver will crash because the MMIO region is > > read-only instead of read-write, right? > > > > How does this device hiding work for dom0? How does dom0 know not to > > access a device that is present on the PCI bus but is used by Xen? > > None of these are new questions - this has all been this way for PV Dom0, > and so far we've limped along quite okay. That's not to say that we > shouldn't improve things if we can, but that first requires ideas as to > how. For PV, that was OK because PV requires extensive guest modifications anyway. We only run Linux and few BSDs as Dom0. So, making the interface cleaner and reducing guest changes is nice-to-have but not critical. For PVH, this is different. One of the top reasons for AMD to work on PVH is to enable arbitrary non-Linux OSes as Dom0 (when paired with dom0less/hyperlaunch). It could be anything from Zephyr to a proprietary RTOS like VxWorks. Minimal guest changes for advanced features (e.g. Dom0 S3) might be OK but in general I think we should aim at (almost) zero guest changes. On ARM, it is already the case (with some non-upstream patches for dom0less PCI.) For this specific patch, which is necessary to enable PVH on AMD x86 in gitlab-ci, we can do anything we want to make it move faster. But medium/long term I think we should try to make non-Xen-aware PVH Dom0 possible. Cheers, Stefano
On Thu, 25 May 2023, Stefano Stabellini wrote: > On Thu, 25 May 2023, Jan Beulich wrote: > > On 25.05.2023 01:37, Stefano Stabellini wrote: > > > On Wed, 24 May 2023, Jan Beulich wrote: > > >>>> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > > >>>> modify_bars() to consistently respect BARs of hidden devices while > > >>>> setting up "normal" ones (i.e. to avoid as much as possible the > > >>>> "continue" path introduced here), setting up of the former may want > > >>>> doing first. > > >>> > > >>> But BARs of hidden devices should be mapped into dom0 physmap? > > >> > > >> Yes. > > > > > > The BARs would be mapped read-only (not read-write), right? Otherwise we > > > let dom0 access devices that belong to Xen, which doesn't seem like a > > > good idea. > > > > > > But even if we map the BARs read-only, what is the benefit of mapping > > > them to Dom0? If Dom0 loads a driver for it and the driver wants to > > > initialize the device, the driver will crash because the MMIO region is > > > read-only instead of read-write, right? > > > > > > How does this device hiding work for dom0? How does dom0 know not to > > > access a device that is present on the PCI bus but is used by Xen? > > > > None of these are new questions - this has all been this way for PV Dom0, > > and so far we've limped along quite okay. That's not to say that we > > shouldn't improve things if we can, but that first requires ideas as to > > how. > > For PV, that was OK because PV requires extensive guest modifications > anyway. We only run Linux and few BSDs as Dom0. So, making the interface > cleaner and reducing guest changes is nice-to-have but not critical. > > For PVH, this is different. One of the top reasons for AMD to work on > PVH is to enable arbitrary non-Linux OSes as Dom0 (when paired with > dom0less/hyperlaunch). It could be anything from Zephyr to a > proprietary RTOS like VxWorks. Minimal guest changes for advanced > features (e.g. Dom0 S3) might be OK but in general I think we should aim > at (almost) zero guest changes. On ARM, it is already the case (with some > non-upstream patches for dom0less PCI.) > > For this specific patch, which is necessary to enable PVH on AMD x86 in > gitlab-ci, we can do anything we want to make it move faster. But > medium/long term I think we should try to make non-Xen-aware PVH Dom0 > possible. Like I wrote, personally I am happy with whatever gets us to have the PVH test in gitlab-ci faster. However, on the specific problem of PCI devices used by Xen and how to deal with them for Dom0 PVH, I think they should be completely hidden. Hidden in the sense that they don't appear on the Dom0 PCI bus. If the hidden device is a function of a multi-function PCI device, then the entire multi-function PCI device should be hidden. I don't think this case is very important because devices used by Xen are timers, IOMMUs, UARTs, all devices that typically are not multi-function, so it is OK to be extra careful and remove the entire device from Dom0 in the odd case that the device is both multi-function and only partially used by Xen. This is what I would do for Xen on ARM too.
On 25.05.2023 21:24, Stefano Stabellini wrote: > On Thu, 25 May 2023, Jan Beulich wrote: >> On 25.05.2023 01:37, Stefano Stabellini wrote: >>> On Wed, 24 May 2023, Jan Beulich wrote: >>>>>> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For >>>>>> modify_bars() to consistently respect BARs of hidden devices while >>>>>> setting up "normal" ones (i.e. to avoid as much as possible the >>>>>> "continue" path introduced here), setting up of the former may want >>>>>> doing first. >>>>> >>>>> But BARs of hidden devices should be mapped into dom0 physmap? >>>> >>>> Yes. >>> >>> The BARs would be mapped read-only (not read-write), right? Otherwise we >>> let dom0 access devices that belong to Xen, which doesn't seem like a >>> good idea. >>> >>> But even if we map the BARs read-only, what is the benefit of mapping >>> them to Dom0? If Dom0 loads a driver for it and the driver wants to >>> initialize the device, the driver will crash because the MMIO region is >>> read-only instead of read-write, right? >>> >>> How does this device hiding work for dom0? How does dom0 know not to >>> access a device that is present on the PCI bus but is used by Xen? >> >> None of these are new questions - this has all been this way for PV Dom0, >> and so far we've limped along quite okay. That's not to say that we >> shouldn't improve things if we can, but that first requires ideas as to >> how. > > For PV, that was OK because PV requires extensive guest modifications > anyway. We only run Linux and few BSDs as Dom0. So, making the interface > cleaner and reducing guest changes is nice-to-have but not critical. > > For PVH, this is different. One of the top reasons for AMD to work on > PVH is to enable arbitrary non-Linux OSes as Dom0 (when paired with > dom0less/hyperlaunch). It could be anything from Zephyr to a > proprietary RTOS like VxWorks. Minimal guest changes for advanced > features (e.g. Dom0 S3) might be OK but in general I think we should aim > at (almost) zero guest changes. On ARM, it is already the case (with some > non-upstream patches for dom0less PCI.) > > For this specific patch, which is necessary to enable PVH on AMD x86 in > gitlab-ci, we can do anything we want to make it move faster. But > medium/long term I think we should try to make non-Xen-aware PVH Dom0 > possible. I don't think Linux could boot as PVH Dom0 without any awareness. Hence I guess it's not easy to see how other OSes might. What you're after looks rather like a HVM Dom0 to me, with it being unclear where the external emulator then would run (in a stubdom maybe, which might be possible to arrange for via the dom0less way of creating boot time DomU-s) and how it would get any necessary xenstore based information. Jan
On 25.05.2023 21:32, Stefano Stabellini wrote: > Like I wrote, personally I am happy with whatever gets us to have the PVH > test in gitlab-ci faster. > > However, on the specific problem of PCI devices used by Xen and how to > deal with them for Dom0 PVH, I think they should be completely hidden. > Hidden in the sense that they don't appear on the Dom0 PCI bus. If the > hidden device is a function of a multi-function PCI device, then the > entire multi-function PCI device should be hidden. > > I don't think this case is very important because devices used by Xen > are timers, IOMMUs, UARTs, ... USB debug ports (EHCI, XHCI), ... > all devices that typically are not multi-function, except for the ones added. Furthermore see video_endboot() for a case of also hiding the VGA device, which isn't unlikely to have secondary functions (sound controllers are not uncommon). Hence ... > so it is OK to be extra careful and remove the entire > device from Dom0 in the odd case that the device is both multi-function > and only partially used by Xen. This is what I would do for Xen on ARM > too. ... at best I would see this as a non-default mode of operation. Of course we could also play more funny games with vPCI, like surfacing a "stub" device in place of a hidden one, so the other functions can still be found. Jan
On Thu, May 25, 2023 at 05:30:54PM +0200, Jan Beulich wrote: > On 25.05.2023 17:02, Roger Pau Monné wrote: > > On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: > >> On 24.05.2023 17:56, Roger Pau Monné wrote: > >>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > >>>> --- a/xen/drivers/vpci/header.c > >>>> +++ b/xen/drivers/vpci/header.c > >>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > >>>> struct vpci_header *header = &pdev->vpci->header; > >>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); > >>>> struct pci_dev *tmp, *dev = NULL; > >>>> + const struct domain *d; > >>>> const struct vpci_msix *msix = pdev->vpci->msix; > >>>> unsigned int i; > >>>> int rc; > >>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > >>>> > >>>> /* > >>>> * Check for overlaps with other BARs. Note that only BARs that are > >>>> - * currently mapped (enabled) are checked for overlaps. > >>>> + * currently mapped (enabled) are checked for overlaps. Note also that > >>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > >>>> */ > >>>> - for_each_pdev ( pdev->domain, tmp ) > >>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo > >>> > >>> Looking at this again, I think this is slightly more complex, as during > >>> runtime dom0 will get here with pdev->domain == hardware_domain OR > >>> dom_xen, and hence you also need to account that devices that have > >>> pdev->domain == dom_xen need to iterate over devices that belong to > >>> the hardware_domain, ie: > >>> > >>> for ( d = pdev->domain; ; > >>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) > >> > >> Right, something along these lines. To keep loop continuation expression > >> and exit condition simple, I'll probably prefer > >> > >> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; > >> ; d = dom_xen ) > > > > LGTM. I would add parentheses around the pdev->domain != dom_xen > > condition, but that's just my personal taste. > > > > We might want to add an > > > > ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); > > > > here, just to remind that this chunk must be revisited when adding > > domU support (but you can also argue we haven't done this elsewhere), > > I just feel here it's not so obvious we don't want do to this for > > domUs. > > I could add such an assertion, if only ... > > >>> And we likely want to limit this to devices that belong to the > >>> hardware_domain or to dom_xen (in preparation for vPCI being used for > >>> domUs). > >> > >> I'm afraid I don't understand this remark, though. > > > > This was looking forward to domU support, so that you already cater > > for pdev->domain not being hardware_domain or dom_xen, but we might > > want to leave that for later, when domU support is actually > > introduced. > > ... I understood why this checking doesn't apply to DomU-s as well, > in your opinion. It's my understanding that domUs can never get hidden or read-only devices assigned, and hence there no need to check for overlap with devices assigned to dom_xen, as those cannot have any BARs mapped in a domU physmap. So for domUs the overlap check only needs to be performed against devices assigned to pdev->domain. Thanks, Roger.
On 29.05.2023 10:08, Roger Pau Monné wrote: > On Thu, May 25, 2023 at 05:30:54PM +0200, Jan Beulich wrote: >> On 25.05.2023 17:02, Roger Pau Monné wrote: >>> On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: >>>> On 24.05.2023 17:56, Roger Pau Monné wrote: >>>>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >>>>>> --- a/xen/drivers/vpci/header.c >>>>>> +++ b/xen/drivers/vpci/header.c >>>>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ >>>>>> struct vpci_header *header = &pdev->vpci->header; >>>>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); >>>>>> struct pci_dev *tmp, *dev = NULL; >>>>>> + const struct domain *d; >>>>>> const struct vpci_msix *msix = pdev->vpci->msix; >>>>>> unsigned int i; >>>>>> int rc; >>>>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ >>>>>> >>>>>> /* >>>>>> * Check for overlaps with other BARs. Note that only BARs that are >>>>>> - * currently mapped (enabled) are checked for overlaps. >>>>>> + * currently mapped (enabled) are checked for overlaps. Note also that >>>>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. >>>>>> */ >>>>>> - for_each_pdev ( pdev->domain, tmp ) >>>>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo >>>>> >>>>> Looking at this again, I think this is slightly more complex, as during >>>>> runtime dom0 will get here with pdev->domain == hardware_domain OR >>>>> dom_xen, and hence you also need to account that devices that have >>>>> pdev->domain == dom_xen need to iterate over devices that belong to >>>>> the hardware_domain, ie: >>>>> >>>>> for ( d = pdev->domain; ; >>>>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) >>>> >>>> Right, something along these lines. To keep loop continuation expression >>>> and exit condition simple, I'll probably prefer >>>> >>>> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; >>>> ; d = dom_xen ) >>> >>> LGTM. I would add parentheses around the pdev->domain != dom_xen >>> condition, but that's just my personal taste. >>> >>> We might want to add an >>> >>> ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); >>> >>> here, just to remind that this chunk must be revisited when adding >>> domU support (but you can also argue we haven't done this elsewhere), >>> I just feel here it's not so obvious we don't want do to this for >>> domUs. >> >> I could add such an assertion, if only ... >> >>>>> And we likely want to limit this to devices that belong to the >>>>> hardware_domain or to dom_xen (in preparation for vPCI being used for >>>>> domUs). >>>> >>>> I'm afraid I don't understand this remark, though. >>> >>> This was looking forward to domU support, so that you already cater >>> for pdev->domain not being hardware_domain or dom_xen, but we might >>> want to leave that for later, when domU support is actually >>> introduced. >> >> ... I understood why this checking doesn't apply to DomU-s as well, >> in your opinion. > > It's my understanding that domUs can never get hidden or read-only > devices assigned, and hence there no need to check for overlap with > devices assigned to dom_xen, as those cannot have any BARs mapped in > a domU physmap. > > So for domUs the overlap check only needs to be performed against > devices assigned to pdev->domain. I fully agree, but the assertion you suggested doesn't express that. Or maybe I'm misunderstanding what you did suggest, and there was an implication of some further if() around it. Jan
On Tue, May 30, 2023 at 10:45:09AM +0200, Jan Beulich wrote: > On 29.05.2023 10:08, Roger Pau Monné wrote: > > On Thu, May 25, 2023 at 05:30:54PM +0200, Jan Beulich wrote: > >> On 25.05.2023 17:02, Roger Pau Monné wrote: > >>> On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: > >>>> On 24.05.2023 17:56, Roger Pau Monné wrote: > >>>>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > >>>>>> --- a/xen/drivers/vpci/header.c > >>>>>> +++ b/xen/drivers/vpci/header.c > >>>>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > >>>>>> struct vpci_header *header = &pdev->vpci->header; > >>>>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); > >>>>>> struct pci_dev *tmp, *dev = NULL; > >>>>>> + const struct domain *d; > >>>>>> const struct vpci_msix *msix = pdev->vpci->msix; > >>>>>> unsigned int i; > >>>>>> int rc; > >>>>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > >>>>>> > >>>>>> /* > >>>>>> * Check for overlaps with other BARs. Note that only BARs that are > >>>>>> - * currently mapped (enabled) are checked for overlaps. > >>>>>> + * currently mapped (enabled) are checked for overlaps. Note also that > >>>>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > >>>>>> */ > >>>>>> - for_each_pdev ( pdev->domain, tmp ) > >>>>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo > >>>>> > >>>>> Looking at this again, I think this is slightly more complex, as during > >>>>> runtime dom0 will get here with pdev->domain == hardware_domain OR > >>>>> dom_xen, and hence you also need to account that devices that have > >>>>> pdev->domain == dom_xen need to iterate over devices that belong to > >>>>> the hardware_domain, ie: > >>>>> > >>>>> for ( d = pdev->domain; ; > >>>>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) > >>>> > >>>> Right, something along these lines. To keep loop continuation expression > >>>> and exit condition simple, I'll probably prefer > >>>> > >>>> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; > >>>> ; d = dom_xen ) > >>> > >>> LGTM. I would add parentheses around the pdev->domain != dom_xen > >>> condition, but that's just my personal taste. > >>> > >>> We might want to add an > >>> > >>> ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); > >>> > >>> here, just to remind that this chunk must be revisited when adding > >>> domU support (but you can also argue we haven't done this elsewhere), > >>> I just feel here it's not so obvious we don't want do to this for > >>> domUs. > >> > >> I could add such an assertion, if only ... > >> > >>>>> And we likely want to limit this to devices that belong to the > >>>>> hardware_domain or to dom_xen (in preparation for vPCI being used for > >>>>> domUs). > >>>> > >>>> I'm afraid I don't understand this remark, though. > >>> > >>> This was looking forward to domU support, so that you already cater > >>> for pdev->domain not being hardware_domain or dom_xen, but we might > >>> want to leave that for later, when domU support is actually > >>> introduced. > >> > >> ... I understood why this checking doesn't apply to DomU-s as well, > >> in your opinion. > > > > It's my understanding that domUs can never get hidden or read-only > > devices assigned, and hence there no need to check for overlap with > > devices assigned to dom_xen, as those cannot have any BARs mapped in > > a domU physmap. > > > > So for domUs the overlap check only needs to be performed against > > devices assigned to pdev->domain. > > I fully agree, but the assertion you suggested doesn't express that. Or > maybe I'm misunderstanding what you did suggest, and there was an > implication of some further if() around it. Maybe I'm getting myself confused, but if you add something like: for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; ; d = dom_xen ) Such loop would need to be avoided for domUs, so my suggestion was to add the assert in order to remind us that the loop would need adjusting if we ever add domU support. But maybe you had already plans to restrict the loop to dom0 only. Thanks, Roger.
On 30.05.2023 11:12, Roger Pau Monné wrote: > On Tue, May 30, 2023 at 10:45:09AM +0200, Jan Beulich wrote: >> On 29.05.2023 10:08, Roger Pau Monné wrote: >>> On Thu, May 25, 2023 at 05:30:54PM +0200, Jan Beulich wrote: >>>> On 25.05.2023 17:02, Roger Pau Monné wrote: >>>>> On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: >>>>>> On 24.05.2023 17:56, Roger Pau Monné wrote: >>>>>>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: >>>>>>>> --- a/xen/drivers/vpci/header.c >>>>>>>> +++ b/xen/drivers/vpci/header.c >>>>>>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ >>>>>>>> struct vpci_header *header = &pdev->vpci->header; >>>>>>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); >>>>>>>> struct pci_dev *tmp, *dev = NULL; >>>>>>>> + const struct domain *d; >>>>>>>> const struct vpci_msix *msix = pdev->vpci->msix; >>>>>>>> unsigned int i; >>>>>>>> int rc; >>>>>>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ >>>>>>>> >>>>>>>> /* >>>>>>>> * Check for overlaps with other BARs. Note that only BARs that are >>>>>>>> - * currently mapped (enabled) are checked for overlaps. >>>>>>>> + * currently mapped (enabled) are checked for overlaps. Note also that >>>>>>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. >>>>>>>> */ >>>>>>>> - for_each_pdev ( pdev->domain, tmp ) >>>>>>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo >>>>>>> >>>>>>> Looking at this again, I think this is slightly more complex, as during >>>>>>> runtime dom0 will get here with pdev->domain == hardware_domain OR >>>>>>> dom_xen, and hence you also need to account that devices that have >>>>>>> pdev->domain == dom_xen need to iterate over devices that belong to >>>>>>> the hardware_domain, ie: >>>>>>> >>>>>>> for ( d = pdev->domain; ; >>>>>>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) >>>>>> >>>>>> Right, something along these lines. To keep loop continuation expression >>>>>> and exit condition simple, I'll probably prefer >>>>>> >>>>>> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; >>>>>> ; d = dom_xen ) >>>>> >>>>> LGTM. I would add parentheses around the pdev->domain != dom_xen >>>>> condition, but that's just my personal taste. >>>>> >>>>> We might want to add an >>>>> >>>>> ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); >>>>> >>>>> here, just to remind that this chunk must be revisited when adding >>>>> domU support (but you can also argue we haven't done this elsewhere), >>>>> I just feel here it's not so obvious we don't want do to this for >>>>> domUs. >>>> >>>> I could add such an assertion, if only ... >>>> >>>>>>> And we likely want to limit this to devices that belong to the >>>>>>> hardware_domain or to dom_xen (in preparation for vPCI being used for >>>>>>> domUs). >>>>>> >>>>>> I'm afraid I don't understand this remark, though. >>>>> >>>>> This was looking forward to domU support, so that you already cater >>>>> for pdev->domain not being hardware_domain or dom_xen, but we might >>>>> want to leave that for later, when domU support is actually >>>>> introduced. >>>> >>>> ... I understood why this checking doesn't apply to DomU-s as well, >>>> in your opinion. >>> >>> It's my understanding that domUs can never get hidden or read-only >>> devices assigned, and hence there no need to check for overlap with >>> devices assigned to dom_xen, as those cannot have any BARs mapped in >>> a domU physmap. >>> >>> So for domUs the overlap check only needs to be performed against >>> devices assigned to pdev->domain. >> >> I fully agree, but the assertion you suggested doesn't express that. Or >> maybe I'm misunderstanding what you did suggest, and there was an >> implication of some further if() around it. > > Maybe I'm getting myself confused, but if you add something like: > > for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; > ; d = dom_xen ) > > Such loop would need to be avoided for domUs, so my suggestion was to > add the assert in order to remind us that the loop would need > adjusting if we ever add domU support. But maybe you had already > plans to restrict the loop to dom0 only. Not really, no, but at the bottom of the loop I also have if ( !is_hardware_domain(d) ) break; } (still mis-formatted in the v2 patch). I.e. restricting to Dom0 goes only as far as the 2nd loop iteration. Jan
On Tue, May 30, 2023 at 11:44:52AM +0200, Jan Beulich wrote: > On 30.05.2023 11:12, Roger Pau Monné wrote: > > On Tue, May 30, 2023 at 10:45:09AM +0200, Jan Beulich wrote: > >> On 29.05.2023 10:08, Roger Pau Monné wrote: > >>> On Thu, May 25, 2023 at 05:30:54PM +0200, Jan Beulich wrote: > >>>> On 25.05.2023 17:02, Roger Pau Monné wrote: > >>>>> On Thu, May 25, 2023 at 04:39:51PM +0200, Jan Beulich wrote: > >>>>>> On 24.05.2023 17:56, Roger Pau Monné wrote: > >>>>>>> On Wed, May 24, 2023 at 03:45:58PM +0200, Jan Beulich wrote: > >>>>>>>> --- a/xen/drivers/vpci/header.c > >>>>>>>> +++ b/xen/drivers/vpci/header.c > >>>>>>>> @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ > >>>>>>>> struct vpci_header *header = &pdev->vpci->header; > >>>>>>>> struct rangeset *mem = rangeset_new(NULL, NULL, 0); > >>>>>>>> struct pci_dev *tmp, *dev = NULL; > >>>>>>>> + const struct domain *d; > >>>>>>>> const struct vpci_msix *msix = pdev->vpci->msix; > >>>>>>>> unsigned int i; > >>>>>>>> int rc; > >>>>>>>> @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ > >>>>>>>> > >>>>>>>> /* > >>>>>>>> * Check for overlaps with other BARs. Note that only BARs that are > >>>>>>>> - * currently mapped (enabled) are checked for overlaps. > >>>>>>>> + * currently mapped (enabled) are checked for overlaps. Note also that > >>>>>>>> + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. > >>>>>>>> */ > >>>>>>>> - for_each_pdev ( pdev->domain, tmp ) > >>>>>>>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo > >>>>>>> > >>>>>>> Looking at this again, I think this is slightly more complex, as during > >>>>>>> runtime dom0 will get here with pdev->domain == hardware_domain OR > >>>>>>> dom_xen, and hence you also need to account that devices that have > >>>>>>> pdev->domain == dom_xen need to iterate over devices that belong to > >>>>>>> the hardware_domain, ie: > >>>>>>> > >>>>>>> for ( d = pdev->domain; ; > >>>>>>> d = (pdev->domain == dom_xen) ? hardware_domain : dom_xen ) > >>>>>> > >>>>>> Right, something along these lines. To keep loop continuation expression > >>>>>> and exit condition simple, I'll probably prefer > >>>>>> > >>>>>> for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; > >>>>>> ; d = dom_xen ) > >>>>> > >>>>> LGTM. I would add parentheses around the pdev->domain != dom_xen > >>>>> condition, but that's just my personal taste. > >>>>> > >>>>> We might want to add an > >>>>> > >>>>> ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_xen); > >>>>> > >>>>> here, just to remind that this chunk must be revisited when adding > >>>>> domU support (but you can also argue we haven't done this elsewhere), > >>>>> I just feel here it's not so obvious we don't want do to this for > >>>>> domUs. > >>>> > >>>> I could add such an assertion, if only ... > >>>> > >>>>>>> And we likely want to limit this to devices that belong to the > >>>>>>> hardware_domain or to dom_xen (in preparation for vPCI being used for > >>>>>>> domUs). > >>>>>> > >>>>>> I'm afraid I don't understand this remark, though. > >>>>> > >>>>> This was looking forward to domU support, so that you already cater > >>>>> for pdev->domain not being hardware_domain or dom_xen, but we might > >>>>> want to leave that for later, when domU support is actually > >>>>> introduced. > >>>> > >>>> ... I understood why this checking doesn't apply to DomU-s as well, > >>>> in your opinion. > >>> > >>> It's my understanding that domUs can never get hidden or read-only > >>> devices assigned, and hence there no need to check for overlap with > >>> devices assigned to dom_xen, as those cannot have any BARs mapped in > >>> a domU physmap. > >>> > >>> So for domUs the overlap check only needs to be performed against > >>> devices assigned to pdev->domain. > >> > >> I fully agree, but the assertion you suggested doesn't express that. Or > >> maybe I'm misunderstanding what you did suggest, and there was an > >> implication of some further if() around it. > > > > Maybe I'm getting myself confused, but if you add something like: > > > > for ( d = pdev->domain != dom_xen ? pdev->domain : hardware_domain; > > ; d = dom_xen ) > > > > Such loop would need to be avoided for domUs, so my suggestion was to > > add the assert in order to remind us that the loop would need > > adjusting if we ever add domU support. But maybe you had already > > plans to restrict the loop to dom0 only. > > Not really, no, but at the bottom of the loop I also have > > if ( !is_hardware_domain(d) ) > break; > } > > (still mis-formatted in the v2 patch). I.e. restricting to Dom0 goes > only as far as the 2nd loop iteration. Oh, right, and that would also exit the loop on the first iteration if the device is assigned to a domU, so it's all fine. Sorry for the noise then. Thanks, Roger.
On Fri, 26 May 2023, Jan Beulich wrote: > On 25.05.2023 21:24, Stefano Stabellini wrote: > > On Thu, 25 May 2023, Jan Beulich wrote: > >> On 25.05.2023 01:37, Stefano Stabellini wrote: > >>> On Wed, 24 May 2023, Jan Beulich wrote: > >>>>>> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For > >>>>>> modify_bars() to consistently respect BARs of hidden devices while > >>>>>> setting up "normal" ones (i.e. to avoid as much as possible the > >>>>>> "continue" path introduced here), setting up of the former may want > >>>>>> doing first. > >>>>> > >>>>> But BARs of hidden devices should be mapped into dom0 physmap? > >>>> > >>>> Yes. > >>> > >>> The BARs would be mapped read-only (not read-write), right? Otherwise we > >>> let dom0 access devices that belong to Xen, which doesn't seem like a > >>> good idea. > >>> > >>> But even if we map the BARs read-only, what is the benefit of mapping > >>> them to Dom0? If Dom0 loads a driver for it and the driver wants to > >>> initialize the device, the driver will crash because the MMIO region is > >>> read-only instead of read-write, right? > >>> > >>> How does this device hiding work for dom0? How does dom0 know not to > >>> access a device that is present on the PCI bus but is used by Xen? > >> > >> None of these are new questions - this has all been this way for PV Dom0, > >> and so far we've limped along quite okay. That's not to say that we > >> shouldn't improve things if we can, but that first requires ideas as to > >> how. > > > > For PV, that was OK because PV requires extensive guest modifications > > anyway. We only run Linux and few BSDs as Dom0. So, making the interface > > cleaner and reducing guest changes is nice-to-have but not critical. > > > > For PVH, this is different. One of the top reasons for AMD to work on > > PVH is to enable arbitrary non-Linux OSes as Dom0 (when paired with > > dom0less/hyperlaunch). It could be anything from Zephyr to a > > proprietary RTOS like VxWorks. Minimal guest changes for advanced > > features (e.g. Dom0 S3) might be OK but in general I think we should aim > > at (almost) zero guest changes. On ARM, it is already the case (with some > > non-upstream patches for dom0less PCI.) > > > > For this specific patch, which is necessary to enable PVH on AMD x86 in > > gitlab-ci, we can do anything we want to make it move faster. But > > medium/long term I think we should try to make non-Xen-aware PVH Dom0 > > possible. > > I don't think Linux could boot as PVH Dom0 without any awareness. Hence > I guess it's not easy to see how other OSes might. What you're after > looks rather like a HVM Dom0 to me, with it being unclear where the > external emulator then would run (in a stubdom maybe, which might be > possible to arrange for via the dom0less way of creating boot time > DomU-s) and how it would get any necessary xenstore based information. I know that Linux has lots of Xen awareness scattered everywhere so it is difficult to tell what's what. Leaving the PVH entry point aside for this discussion, what else is really needed for a Linux without CONFIG_XEN to boot as PVH Dom0? Same question from a different angle: let's say that we boot Zephyr or another RTOS as HVM Dom0, what is really required for the emulator to emulate? I am hoping that the answer is "nothing" except for maybe a UART. It comes down to how much legacy stuff the guest OS expects to find. Legacy stuff that would normally be emulated by QEMU. I am counting on the fact that a modern OS doesn't expect any of the legacy stuff (e.g. PIIX3/Q35/E1000) if it is not advertised in the firmware tables. If there is no need for QEMU, I don't know if I would call it PVH or HVM but either way we are good. Same for xenstore: there should be no need for xenstore without CONFIG_XEN.
On Fri, 26 May 2023, Jan Beulich wrote: > On 25.05.2023 21:32, Stefano Stabellini wrote: > > Like I wrote, personally I am happy with whatever gets us to have the PVH > > test in gitlab-ci faster. > > > > However, on the specific problem of PCI devices used by Xen and how to > > deal with them for Dom0 PVH, I think they should be completely hidden. > > Hidden in the sense that they don't appear on the Dom0 PCI bus. If the > > hidden device is a function of a multi-function PCI device, then the > > entire multi-function PCI device should be hidden. > > > > I don't think this case is very important because devices used by Xen > > are timers, IOMMUs, UARTs, > > ... USB debug ports (EHCI, XHCI), ... > > > all devices that typically are not multi-function, > > except for the ones added. Furthermore see video_endboot() for a case > of also hiding the VGA device, which isn't unlikely to have secondary > functions (sound controllers are not uncommon). Hence ... > > > so it is OK to be extra careful and remove the entire > > device from Dom0 in the odd case that the device is both multi-function > > and only partially used by Xen. This is what I would do for Xen on ARM > > too. > > ... at best I would see this as a non-default mode of operation. Of > course we could also play more funny games with vPCI, like surfacing > a "stub" device in place of a hidden one, so the other functions can > still be found. From our use-case point of view, non-default is OK. The PCI stub idea is a cool trick that might work but hopefully we won't need it at least initially. But it is something to consider if it turns out there is an important multi-function device with one of the devices hidden.
On 31.05.2023 00:38, Stefano Stabellini wrote: > On Fri, 26 May 2023, Jan Beulich wrote: >> On 25.05.2023 21:24, Stefano Stabellini wrote: >>> On Thu, 25 May 2023, Jan Beulich wrote: >>>> On 25.05.2023 01:37, Stefano Stabellini wrote: >>>>> On Wed, 24 May 2023, Jan Beulich wrote: >>>>>>>> RFC: _setup_hwdom_pci_devices()' loop may want splitting: For >>>>>>>> modify_bars() to consistently respect BARs of hidden devices while >>>>>>>> setting up "normal" ones (i.e. to avoid as much as possible the >>>>>>>> "continue" path introduced here), setting up of the former may want >>>>>>>> doing first. >>>>>>> >>>>>>> But BARs of hidden devices should be mapped into dom0 physmap? >>>>>> >>>>>> Yes. >>>>> >>>>> The BARs would be mapped read-only (not read-write), right? Otherwise we >>>>> let dom0 access devices that belong to Xen, which doesn't seem like a >>>>> good idea. >>>>> >>>>> But even if we map the BARs read-only, what is the benefit of mapping >>>>> them to Dom0? If Dom0 loads a driver for it and the driver wants to >>>>> initialize the device, the driver will crash because the MMIO region is >>>>> read-only instead of read-write, right? >>>>> >>>>> How does this device hiding work for dom0? How does dom0 know not to >>>>> access a device that is present on the PCI bus but is used by Xen? >>>> >>>> None of these are new questions - this has all been this way for PV Dom0, >>>> and so far we've limped along quite okay. That's not to say that we >>>> shouldn't improve things if we can, but that first requires ideas as to >>>> how. >>> >>> For PV, that was OK because PV requires extensive guest modifications >>> anyway. We only run Linux and few BSDs as Dom0. So, making the interface >>> cleaner and reducing guest changes is nice-to-have but not critical. >>> >>> For PVH, this is different. One of the top reasons for AMD to work on >>> PVH is to enable arbitrary non-Linux OSes as Dom0 (when paired with >>> dom0less/hyperlaunch). It could be anything from Zephyr to a >>> proprietary RTOS like VxWorks. Minimal guest changes for advanced >>> features (e.g. Dom0 S3) might be OK but in general I think we should aim >>> at (almost) zero guest changes. On ARM, it is already the case (with some >>> non-upstream patches for dom0less PCI.) >>> >>> For this specific patch, which is necessary to enable PVH on AMD x86 in >>> gitlab-ci, we can do anything we want to make it move faster. But >>> medium/long term I think we should try to make non-Xen-aware PVH Dom0 >>> possible. >> >> I don't think Linux could boot as PVH Dom0 without any awareness. Hence >> I guess it's not easy to see how other OSes might. What you're after >> looks rather like a HVM Dom0 to me, with it being unclear where the >> external emulator then would run (in a stubdom maybe, which might be >> possible to arrange for via the dom0less way of creating boot time >> DomU-s) and how it would get any necessary xenstore based information. > > I know that Linux has lots of Xen awareness scattered everywhere so it > is difficult to tell what's what. Leaving the PVH entry point aside for > this discussion, what else is really needed for a Linux without > CONFIG_XEN to boot as PVH Dom0? > > Same question from a different angle: let's say that we boot Zephyr or > another RTOS as HVM Dom0, what is really required for the emulator to > emulate? I am hoping that the answer is "nothing" except for maybe a > UART. > > It comes down to how much legacy stuff the guest OS expects to find. > Legacy stuff that would normally be emulated by QEMU. I am counting on > the fact that a modern OS doesn't expect any of the legacy stuff (e.g. > PIIX3/Q35/E1000) if it is not advertised in the firmware tables. And that's where I expect the problems start: We don't really alter things like the DSDT and SSDTs, and we also don't parse them. So we won't know what firmware describes there. Hence we have to expect that any legacy device might be present in the underlying platform, and hence would also need offering either by passing through or by emulation. Yet we can't sensibly emulate everything in Xen itself. > If > there is no need for QEMU, I don't know if I would call it PVH or HVM > but either way we are good. > > Same for xenstore: there should be no need for xenstore without > CONFIG_XEN. Right, it may be possible to get away without xenstore. Jan
--- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -218,6 +218,7 @@ static int modify_bars(const struct pci_ struct vpci_header *header = &pdev->vpci->header; struct rangeset *mem = rangeset_new(NULL, NULL, 0); struct pci_dev *tmp, *dev = NULL; + const struct domain *d; const struct vpci_msix *msix = pdev->vpci->msix; unsigned int i; int rc; @@ -285,9 +286,11 @@ static int modify_bars(const struct pci_ /* * Check for overlaps with other BARs. Note that only BARs that are - * currently mapped (enabled) are checked for overlaps. + * currently mapped (enabled) are checked for overlaps. Note also that + * for Dom0 we also need to include hidden, i.e. DomXEN's, devices. */ - for_each_pdev ( pdev->domain, tmp ) +for ( d = pdev->domain; ; d = dom_xen ) {//todo + for_each_pdev ( d, tmp ) { if ( tmp == pdev ) { @@ -304,6 +307,7 @@ static int modify_bars(const struct pci_ */ continue; } +if ( !tmp->vpci ) { ASSERT(d == dom_xen); continue; }//todo for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) { @@ -330,6 +334,7 @@ static int modify_bars(const struct pci_ } } } +if ( !is_hardware_domain(d) ) break; }//todo ASSERT(dev); --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -70,6 +70,7 @@ void vpci_remove_device(struct pci_dev * int vpci_add_handlers(struct pci_dev *pdev) { unsigned int i; + const unsigned long *ro_map; int rc = 0; if ( !has_vpci(pdev->domain) ) @@ -78,6 +79,11 @@ int vpci_add_handlers(struct pci_dev *pd /* We should not get here twice for the same device. */ ASSERT(!pdev->vpci); + /* No vPCI for r/o devices. */ + ro_map = pci_get_ro_map(pdev->sbdf.seg); + if ( ro_map && test_bit(pdev->sbdf.bdf, ro_map) ) + return 0; + pdev->vpci = xzalloc(struct vpci); if ( !pdev->vpci ) return -ENOMEM;
Hidden devices (e.g. an add-in PCI serial card used for Xen's serial console) are associated with DomXEN, not Dom0. This means that while looking for overlapping BARs such devices cannot be found on Dom0's list of devices; DomXEN's list also needs to be scanned. Suppress vPCI init altogether for r/o devices (which constitute a subset of hidden ones). Signed-off-by: Jan Beulich <jbeulich@suse.com> --- RFC: The modify_bars() change is intentionally mis-formatted, as the necessary re-indentation would make the diff difficult to read. At this point I'd merely like to gather input towards possible better approaches to solve the issue (not the least because quite possibly there are further places needing changing). RFC: Whether to actually suppress vPCI init is up for debate; adding the extra logic is following Roger's suggestion (I'm not convinced it is useful to have). If we want to keep that, a 2nd question would be whether to keep it in vpci_add_handlers(): Both of its callers (can) have a struct pci_seg readily available, so checking ->ro_map at the call sites would be easier. RFC: _setup_hwdom_pci_devices()' loop may want splitting: For modify_bars() to consistently respect BARs of hidden devices while setting up "normal" ones (i.e. to avoid as much as possible the "continue" path introduced here), setting up of the former may want doing first. RFC: vpci_write()'s check of the r/o map may want moving out of mainline code, into the case dealing with !pdev->vpci. --- v2: Extend existing comment. Relax assertion. Don't initialize vPCI for r/o devices.