Message ID | 20220204063459.680961-7-andr2000@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | PCI devices passthrough on Arm, part 3 | expand |
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, > + unsigned int reg, void *data) > +{ > + return 0; > +} > + > +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, > + struct vpci_bar *bar) > +{ > + if ( is_hardware_domain(pdev->domain) ) > + return 0; > + > + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, > + reg, 4, bar); > +} For these two functions: I'm not sure "ignore" is an appropriate term here. unused_bar_read() and unused_bar() maybe? Or, considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm also not sure we really need the is_hardware_domain() check here: Returning 0 for Dom0 is going to be fine as well; there's no need to fetch the value from actual hardware. The one exception might be for devices with buggy BAR behavior ... > @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) > if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) > { > bars[i].type = VPCI_BAR_IO; > + > + rc = bar_ignore_access(pdev, reg, &bars[i]); > + if ( rc ) > + return rc; Elsewhere the command register is restored on error paths. Jan
On 07.02.22 19:06, Jan Beulich wrote: > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, >> + unsigned int reg, void *data) >> +{ >> + return 0; >> +} >> + >> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, >> + struct vpci_bar *bar) >> +{ >> + if ( is_hardware_domain(pdev->domain) ) >> + return 0; >> + >> + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, >> + reg, 4, bar); >> +} > For these two functions: I'm not sure "ignore" is an appropriate > term here. unused_bar_read() and unused_bar() maybe? Or, > considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm > also not sure we really need the is_hardware_domain() check here: > Returning 0 for Dom0 is going to be fine as well; there's no need > to fetch the value from actual hardware. The one exception might > be for devices with buggy BAR behavior ... Well, I think this should be ok, so then - s/guest_bar_ignore_read/empty_bar_read - s/bar_ignore_access/empty_bar - no is_hardware_domain check > >> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) >> if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) >> { >> bars[i].type = VPCI_BAR_IO; >> + >> + rc = bar_ignore_access(pdev, reg, &bars[i]); >> + if ( rc ) >> + return rc; > Elsewhere the command register is restored on error paths. Ok, I will restore > > Jan > Thank you, Oleksandr
On 08.02.2022 09:06, Oleksandr Andrushchenko wrote: > > > On 07.02.22 19:06, Jan Beulich wrote: >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, >>> + unsigned int reg, void *data) >>> +{ >>> + return 0; >>> +} >>> + >>> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, >>> + struct vpci_bar *bar) >>> +{ >>> + if ( is_hardware_domain(pdev->domain) ) >>> + return 0; >>> + >>> + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, >>> + reg, 4, bar); >>> +} >> For these two functions: I'm not sure "ignore" is an appropriate >> term here. unused_bar_read() and unused_bar() maybe? Or, >> considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm >> also not sure we really need the is_hardware_domain() check here: >> Returning 0 for Dom0 is going to be fine as well; there's no need >> to fetch the value from actual hardware. The one exception might >> be for devices with buggy BAR behavior ... > Well, I think this should be ok, so then > - s/guest_bar_ignore_read/empty_bar_read > - s/bar_ignore_access/empty_bar Hmm, seeing it, I don't think empty_bar() is a good function name. setup_empty_bar() or empty_bar_setup() would make more clear what the function's purpose is. > - no is_hardware_domain check Please wait a little to see whether Roger has any input on this aspect. Jan
On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote: > From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > > Add relevant vpci register handlers when assigning PCI device to a domain > and remove those when de-assigning. This allows having different > handlers for different domains, e.g. hwdom and other guests. > > Emulate guest BAR register values: this allows creating a guest view > of the registers and emulates size and properties probe as it is done > during PCI device enumeration by the guest. > > All empty, IO and ROM BARs for guests are emulated by returning 0 on > reads and ignoring writes: this BARs are special with this respect as > their lower bits have special meaning, so returning default ~0 on read > may confuse guest OS. > > Memory decoding is initially disabled when used by guests in order to > prevent the BAR being placed on top of a RAM region. > > Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > --- > Since v5: > - make sure that the guest set address has the same page offset > as the physical address on the host > - remove guest_rom_{read|write} as those just implement the default > behaviour of the registers not being handled > - adjusted comment for struct vpci.addr field > - add guest handlers for BARs which are not handled and will otherwise > return ~0 on read and ignore writes. The BARs are special with this > respect as their lower bits have special meaning, so returning ~0 > doesn't seem to be right > Since v4: > - updated commit message > - s/guest_addr/guest_reg > Since v3: > - squashed two patches: dynamic add/remove handlers and guest BAR > handler implementation > - fix guest BAR read of the high part of a 64bit BAR (Roger) > - add error handling to vpci_assign_device > - s/dom%pd/%pd > - blank line before return > Since v2: > - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code > has been eliminated from being built on x86 > Since v1: > - constify struct pci_dev where possible > - do not open code is_system_domain() > - simplify some code3. simplify > - use gdprintk + error code instead of gprintk > - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT, > so these do not get compiled for x86 > - removed unneeded is_system_domain check > - re-work guest read/write to be much simpler and do more work on write > than read which is expected to be called more frequently > - removed one too obvious comment > --- > xen/drivers/vpci/header.c | 131 +++++++++++++++++++++++++++++++++----- > xen/include/xen/vpci.h | 3 + > 2 files changed, 118 insertions(+), 16 deletions(-) > > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > index bd23c0274d48..2620a95ff35b 100644 > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg, > pci_conf_write32(pdev->sbdf, reg, val); > } > > +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg, > + uint32_t val, void *data) > +{ > + struct vpci_bar *bar = data; > + bool hi = false; > + uint64_t guest_reg = bar->guest_reg; > + > + if ( bar->type == VPCI_BAR_MEM64_HI ) > + { > + ASSERT(reg > PCI_BASE_ADDRESS_0); > + bar--; > + hi = true; > + } > + else > + { > + val &= PCI_BASE_ADDRESS_MEM_MASK; > + val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 > + : PCI_BASE_ADDRESS_MEM_TYPE_64; > + val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0; > + } > + > + guest_reg &= ~(0xffffffffull << (hi ? 32 : 0)); > + guest_reg |= (uint64_t)val << (hi ? 32 : 0); > + > + guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK; > + > + /* > + * Make sure that the guest set address has the same page offset > + * as the physical address on the host or otherwise things won't work as > + * expected. > + */ > + if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) != > + (bar->addr & ~PAGE_MASK) ) This is only required when !hi, but I'm fine with doing it unconditionally as it's clearer. > + { > + gprintk(XENLOG_WARNING, > + "%pp: ignored BAR %zu write with wrong page offset\n", "%pp: ignored BAR %zu write attempting to change page offset\n" > + &pdev->sbdf, bar - pdev->vpci->header.bars + hi); > + return; > + } > + > + bar->guest_reg = guest_reg; > +} > + > +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg, > + void *data) > +{ > + const struct vpci_bar *bar = data; > + bool hi = false; > + > + if ( bar->type == VPCI_BAR_MEM64_HI ) > + { > + ASSERT(reg > PCI_BASE_ADDRESS_0); > + bar--; > + hi = true; > + } > + > + return bar->guest_reg >> (hi ? 32 : 0); > +} > + > +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, > + unsigned int reg, void *data) > +{ > + return 0; > +} > + > +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, > + struct vpci_bar *bar) > +{ > + if ( is_hardware_domain(pdev->domain) ) > + return 0; > + > + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, > + reg, 4, bar); > +} > + > static void rom_write(const struct pci_dev *pdev, unsigned int reg, > uint32_t val, void *data) > { > @@ -462,6 +537,7 @@ static int init_bars(struct pci_dev *pdev) > struct vpci_header *header = &pdev->vpci->header; > struct vpci_bar *bars = header->bars; > int rc; > + bool is_hwdom = is_hardware_domain(pdev->domain); > > switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f ) > { > @@ -501,8 +577,10 @@ static int init_bars(struct pci_dev *pdev) > if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO ) > { > bars[i].type = VPCI_BAR_MEM64_HI; > - rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, > - 4, &bars[i]); > + rc = vpci_add_register(pdev->vpci, > + is_hwdom ? vpci_hw_read32 : guest_bar_read, > + is_hwdom ? bar_write : guest_bar_write, > + reg, 4, &bars[i]); > if ( rc ) > { > pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); > @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) > if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) > { > bars[i].type = VPCI_BAR_IO; > + > + rc = bar_ignore_access(pdev, reg, &bars[i]); This is wrong: you only want to ignore access to IO BARs for Arm, for x86 we should keep the previous behavior. Even more if you go with Jan's suggestions to make bar_ignore_access also applicable to dom0. > + if ( rc ) > + return rc; > + > continue; > } > if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == > @@ -535,6 +618,11 @@ static int init_bars(struct pci_dev *pdev) > if ( size == 0 ) > { > bars[i].type = VPCI_BAR_EMPTY; > + > + rc = bar_ignore_access(pdev, reg, &bars[i]); > + if ( rc ) > + return rc; I would be fine to just call vpci_add_register here, ie; if ( !is_hwdom ) { rc = vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, reg, 4, &bars[i]); if ( rc ) { ... } } Feel free to unify the writing of the PCI_COMMAND register on the error path into a label, as then the error case would simply be a `goto error;` Thanks, Roger.
On Tue, Feb 08, 2022 at 10:16:59AM +0100, Jan Beulich wrote: > On 08.02.2022 09:06, Oleksandr Andrushchenko wrote: > > > > > > On 07.02.22 19:06, Jan Beulich wrote: > >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > >>> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, > >>> + unsigned int reg, void *data) > >>> +{ > >>> + return 0; > >>> +} > >>> + > >>> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, > >>> + struct vpci_bar *bar) > >>> +{ > >>> + if ( is_hardware_domain(pdev->domain) ) > >>> + return 0; > >>> + > >>> + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, > >>> + reg, 4, bar); > >>> +} > >> For these two functions: I'm not sure "ignore" is an appropriate > >> term here. unused_bar_read() and unused_bar() maybe? Or, > >> considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm > >> also not sure we really need the is_hardware_domain() check here: > >> Returning 0 for Dom0 is going to be fine as well; there's no need > >> to fetch the value from actual hardware. The one exception might > >> be for devices with buggy BAR behavior ... > > Well, I think this should be ok, so then > > - s/guest_bar_ignore_read/empty_bar_read > > - s/bar_ignore_access/empty_bar > > Hmm, seeing it, I don't think empty_bar() is a good function name. > setup_empty_bar() or empty_bar_setup() would make more clear what > the function's purpose is. I don't think you require an empty_bar_setup helper, the code there is trivial can be open coded in init_bars directly IMO. > > > - no is_hardware_domain check > > Please wait a little to see whether Roger has any input on this aspect. I think for the hw domain we should allow access to the BAR even if Xen has found it empty. Adding the ignore handlers for dom0 shouldn't make any difference, but we never know whether some quirky hardware could make use of that. Thanks, Roger.
On 08.02.22 11:25, Roger Pau Monné wrote: > On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote: >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> >> >> Add relevant vpci register handlers when assigning PCI device to a domain >> and remove those when de-assigning. This allows having different >> handlers for different domains, e.g. hwdom and other guests. >> >> Emulate guest BAR register values: this allows creating a guest view >> of the registers and emulates size and properties probe as it is done >> during PCI device enumeration by the guest. >> >> All empty, IO and ROM BARs for guests are emulated by returning 0 on >> reads and ignoring writes: this BARs are special with this respect as >> their lower bits have special meaning, so returning default ~0 on read >> may confuse guest OS. >> >> Memory decoding is initially disabled when used by guests in order to >> prevent the BAR being placed on top of a RAM region. >> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> >> --- >> Since v5: >> - make sure that the guest set address has the same page offset >> as the physical address on the host >> - remove guest_rom_{read|write} as those just implement the default >> behaviour of the registers not being handled >> - adjusted comment for struct vpci.addr field >> - add guest handlers for BARs which are not handled and will otherwise >> return ~0 on read and ignore writes. The BARs are special with this >> respect as their lower bits have special meaning, so returning ~0 >> doesn't seem to be right >> Since v4: >> - updated commit message >> - s/guest_addr/guest_reg >> Since v3: >> - squashed two patches: dynamic add/remove handlers and guest BAR >> handler implementation >> - fix guest BAR read of the high part of a 64bit BAR (Roger) >> - add error handling to vpci_assign_device >> - s/dom%pd/%pd >> - blank line before return >> Since v2: >> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code >> has been eliminated from being built on x86 >> Since v1: >> - constify struct pci_dev where possible >> - do not open code is_system_domain() >> - simplify some code3. simplify >> - use gdprintk + error code instead of gprintk >> - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT, >> so these do not get compiled for x86 >> - removed unneeded is_system_domain check >> - re-work guest read/write to be much simpler and do more work on write >> than read which is expected to be called more frequently >> - removed one too obvious comment >> --- >> xen/drivers/vpci/header.c | 131 +++++++++++++++++++++++++++++++++----- >> xen/include/xen/vpci.h | 3 + >> 2 files changed, 118 insertions(+), 16 deletions(-) >> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >> index bd23c0274d48..2620a95ff35b 100644 >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg, >> pci_conf_write32(pdev->sbdf, reg, val); >> } >> >> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg, >> + uint32_t val, void *data) >> +{ >> + struct vpci_bar *bar = data; >> + bool hi = false; >> + uint64_t guest_reg = bar->guest_reg; >> + >> + if ( bar->type == VPCI_BAR_MEM64_HI ) >> + { >> + ASSERT(reg > PCI_BASE_ADDRESS_0); >> + bar--; >> + hi = true; >> + } >> + else >> + { >> + val &= PCI_BASE_ADDRESS_MEM_MASK; >> + val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 >> + : PCI_BASE_ADDRESS_MEM_TYPE_64; >> + val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0; >> + } >> + >> + guest_reg &= ~(0xffffffffull << (hi ? 32 : 0)); >> + guest_reg |= (uint64_t)val << (hi ? 32 : 0); >> + >> + guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK; >> + >> + /* >> + * Make sure that the guest set address has the same page offset >> + * as the physical address on the host or otherwise things won't work as >> + * expected. >> + */ >> + if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) != >> + (bar->addr & ~PAGE_MASK) ) > This is only required when !hi, but I'm fine with doing it > unconditionally as it's clearer. This is correct wrt hi > >> + { >> + gprintk(XENLOG_WARNING, >> + "%pp: ignored BAR %zu write with wrong page offset\n", > "%pp: ignored BAR %zu write attempting to change page offset\n" Ok > >> + &pdev->sbdf, bar - pdev->vpci->header.bars + hi); >> + return; >> + } >> + >> + bar->guest_reg = guest_reg; >> +} >> + >> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg, >> + void *data) >> +{ >> + const struct vpci_bar *bar = data; >> + bool hi = false; >> + >> + if ( bar->type == VPCI_BAR_MEM64_HI ) >> + { >> + ASSERT(reg > PCI_BASE_ADDRESS_0); >> + bar--; >> + hi = true; >> + } >> + >> + return bar->guest_reg >> (hi ? 32 : 0); >> +} >> + >> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, >> + unsigned int reg, void *data) >> +{ >> + return 0; >> +} >> + >> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, >> + struct vpci_bar *bar) >> +{ >> + if ( is_hardware_domain(pdev->domain) ) >> + return 0; >> + >> + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, >> + reg, 4, bar); >> +} >> + >> static void rom_write(const struct pci_dev *pdev, unsigned int reg, >> uint32_t val, void *data) >> { >> @@ -462,6 +537,7 @@ static int init_bars(struct pci_dev *pdev) >> struct vpci_header *header = &pdev->vpci->header; >> struct vpci_bar *bars = header->bars; >> int rc; >> + bool is_hwdom = is_hardware_domain(pdev->domain); >> >> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f ) >> { >> @@ -501,8 +577,10 @@ static int init_bars(struct pci_dev *pdev) >> if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO ) >> { >> bars[i].type = VPCI_BAR_MEM64_HI; >> - rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, >> - 4, &bars[i]); >> + rc = vpci_add_register(pdev->vpci, >> + is_hwdom ? vpci_hw_read32 : guest_bar_read, >> + is_hwdom ? bar_write : guest_bar_write, >> + reg, 4, &bars[i]); >> if ( rc ) >> { >> pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); >> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) >> if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) >> { >> bars[i].type = VPCI_BAR_IO; >> + >> + rc = bar_ignore_access(pdev, reg, &bars[i]); > This is wrong: you only want to ignore access to IO BARs for Arm, for > x86 we should keep the previous behavior. Even more if you go with > Jan's suggestions to make bar_ignore_access also applicable to dom0. How do we want this? #ifdef CONFIG_ARM? > >> + if ( rc ) >> + return rc; >> + >> continue; >> } >> if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == >> @@ -535,6 +618,11 @@ static int init_bars(struct pci_dev *pdev) >> if ( size == 0 ) >> { >> bars[i].type = VPCI_BAR_EMPTY; >> + >> + rc = bar_ignore_access(pdev, reg, &bars[i]); >> + if ( rc ) >> + return rc; > I would be fine to just call vpci_add_register here, ie; > > if ( !is_hwdom ) > { > rc = vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, > reg, 4, &bars[i]); > if ( rc ) > { > ... > } > } But we have 3 places where we do the same and also handle errors the same way. I was thinking having a helper will make the code clearer. Do you want to open code all the uses? > Feel free to unify the writing of the PCI_COMMAND register on the > error path into a label, as then the error case would simply be a > `goto error;` I was thinking about it. Will it be ok to make this change in this patch or you want a dedicated one for that? > Thanks, Roger. Thank you, Oleksandr
On 08.02.2022 10:31, Oleksandr Andrushchenko wrote: > On 08.02.22 11:25, Roger Pau Monné wrote: >> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote: >>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) >>> if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) >>> { >>> bars[i].type = VPCI_BAR_IO; >>> + >>> + rc = bar_ignore_access(pdev, reg, &bars[i]); >> This is wrong: you only want to ignore access to IO BARs for Arm, for >> x86 we should keep the previous behavior. Even more if you go with >> Jan's suggestions to make bar_ignore_access also applicable to dom0. > How do we want this? > #ifdef CONFIG_ARM? Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects but Arm doesn't. Unless we have one already, of course ... Jan
On 08.02.22 11:48, Jan Beulich wrote: > On 08.02.2022 10:31, Oleksandr Andrushchenko wrote: >> On 08.02.22 11:25, Roger Pau Monné wrote: >>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote: >>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) >>>> if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) >>>> { >>>> bars[i].type = VPCI_BAR_IO; >>>> + >>>> + rc = bar_ignore_access(pdev, reg, &bars[i]); >>> This is wrong: you only want to ignore access to IO BARs for Arm, for >>> x86 we should keep the previous behavior. Even more if you go with >>> Jan's suggestions to make bar_ignore_access also applicable to dom0. >> How do we want this? >> #ifdef CONFIG_ARM? > Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects > but Arm doesn't. Unless we have one already, of course ... Could you please be more specific on the name you see appropriate? And do you realize that this is going to be a single user of such a setting? > Jan > Thank you, Oleksandr
On 08.02.2022 10:57, Oleksandr Andrushchenko wrote: > On 08.02.22 11:48, Jan Beulich wrote: >> On 08.02.2022 10:31, Oleksandr Andrushchenko wrote: >>> On 08.02.22 11:25, Roger Pau Monné wrote: >>>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote: >>>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) >>>>> if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) >>>>> { >>>>> bars[i].type = VPCI_BAR_IO; >>>>> + >>>>> + rc = bar_ignore_access(pdev, reg, &bars[i]); >>>> This is wrong: you only want to ignore access to IO BARs for Arm, for >>>> x86 we should keep the previous behavior. Even more if you go with >>>> Jan's suggestions to make bar_ignore_access also applicable to dom0. >>> How do we want this? >>> #ifdef CONFIG_ARM? >> Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects >> but Arm doesn't. Unless we have one already, of course ... > Could you please be more specific on the name you see appropriate? I'm pretty sure Linux has something similar, so I'd like to ask that you go look there. I'm sorry to say this a little bluntly, but I'm really in need of doing something beyond answering your mails (and in part re-stating the same thing again and again). > And do you realize that this is going to be a single user of such a > setting? Yes, but I'm not sure this is going to remain just a single use. Furthermore every CONFIG_<arch> is problematic as soon as a new port is being worked on. If we wanted to go with a CONFIG_<arch> here, imo it ought to be CONFIG_X86, not CONFIG_ARM, as I/O ports are really an x86-specific thing (which has propagated into other architectures in more or less strange ways, but never as truly I/O ports). Jan
On 08.02.22 12:15, Jan Beulich wrote: > On 08.02.2022 10:57, Oleksandr Andrushchenko wrote: >> On 08.02.22 11:48, Jan Beulich wrote: >>> On 08.02.2022 10:31, Oleksandr Andrushchenko wrote: >>>> On 08.02.22 11:25, Roger Pau Monné wrote: >>>>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote: >>>>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) >>>>>> if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) >>>>>> { >>>>>> bars[i].type = VPCI_BAR_IO; >>>>>> + >>>>>> + rc = bar_ignore_access(pdev, reg, &bars[i]); >>>>> This is wrong: you only want to ignore access to IO BARs for Arm, for >>>>> x86 we should keep the previous behavior. Even more if you go with >>>>> Jan's suggestions to make bar_ignore_access also applicable to dom0. >>>> How do we want this? >>>> #ifdef CONFIG_ARM? >>> Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects >>> but Arm doesn't. Unless we have one already, of course ... >> Could you please be more specific on the name you see appropriate? > I'm pretty sure Linux has something similar, so I'd like to ask that > you go look there. Not sure, but I can have a look > I'm sorry to say this a little bluntly, but I'm > really in need of doing something beyond answering your mails Well, if answers were to be a bit more specific and not so general some time, this could definitely be helpful and save a lot of time trying to guess what other party has in their mind. > (and > in part re-stating the same thing again and again). I have no comments on this. > >> And do you realize that this is going to be a single user of such a >> setting? > Yes, but I'm not sure this is going to remain just a single use. > Furthermore every CONFIG_<arch> is problematic as soon as a new port > is being worked on. If we wanted to go with a CONFIG_<arch> here, imo > it ought to be CONFIG_X86, not CONFIG_ARM, as I/O ports are really an > x86-specific thing (which has propagated into other architectures in > more or less strange ways, but never as truly I/O ports). I am fine using CONFIG_X86 @Roger, are you ok with that? > > Jan > Thank you, Oleksandr
On Tue, Feb 08, 2022 at 10:29:22AM +0000, Oleksandr Andrushchenko wrote: > On 08.02.22 12:15, Jan Beulich wrote: > > Yes, but I'm not sure this is going to remain just a single use. > > Furthermore every CONFIG_<arch> is problematic as soon as a new port > > is being worked on. If we wanted to go with a CONFIG_<arch> here, imo > > it ought to be CONFIG_X86, not CONFIG_ARM, as I/O ports are really an > > x86-specific thing (which has propagated into other architectures in > > more or less strange ways, but never as truly I/O ports). > I am fine using CONFIG_X86 > @Roger, are you ok with that? I guess if that's the only instance of having diverging behavior because of the lack of IO ports I'm fine with using CONFIG_X86. Thanks, Roger.
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index bd23c0274d48..2620a95ff35b 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg, pci_conf_write32(pdev->sbdf, reg, val); } +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg, + uint32_t val, void *data) +{ + struct vpci_bar *bar = data; + bool hi = false; + uint64_t guest_reg = bar->guest_reg; + + if ( bar->type == VPCI_BAR_MEM64_HI ) + { + ASSERT(reg > PCI_BASE_ADDRESS_0); + bar--; + hi = true; + } + else + { + val &= PCI_BASE_ADDRESS_MEM_MASK; + val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 + : PCI_BASE_ADDRESS_MEM_TYPE_64; + val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0; + } + + guest_reg &= ~(0xffffffffull << (hi ? 32 : 0)); + guest_reg |= (uint64_t)val << (hi ? 32 : 0); + + guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK; + + /* + * Make sure that the guest set address has the same page offset + * as the physical address on the host or otherwise things won't work as + * expected. + */ + if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) != + (bar->addr & ~PAGE_MASK) ) + { + gprintk(XENLOG_WARNING, + "%pp: ignored BAR %zu write with wrong page offset\n", + &pdev->sbdf, bar - pdev->vpci->header.bars + hi); + return; + } + + bar->guest_reg = guest_reg; +} + +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg, + void *data) +{ + const struct vpci_bar *bar = data; + bool hi = false; + + if ( bar->type == VPCI_BAR_MEM64_HI ) + { + ASSERT(reg > PCI_BASE_ADDRESS_0); + bar--; + hi = true; + } + + return bar->guest_reg >> (hi ? 32 : 0); +} + +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, + unsigned int reg, void *data) +{ + return 0; +} + +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, + struct vpci_bar *bar) +{ + if ( is_hardware_domain(pdev->domain) ) + return 0; + + return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, + reg, 4, bar); +} + static void rom_write(const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data) { @@ -462,6 +537,7 @@ static int init_bars(struct pci_dev *pdev) struct vpci_header *header = &pdev->vpci->header; struct vpci_bar *bars = header->bars; int rc; + bool is_hwdom = is_hardware_domain(pdev->domain); switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f ) { @@ -501,8 +577,10 @@ static int init_bars(struct pci_dev *pdev) if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO ) { bars[i].type = VPCI_BAR_MEM64_HI; - rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, - 4, &bars[i]); + rc = vpci_add_register(pdev->vpci, + is_hwdom ? vpci_hw_read32 : guest_bar_read, + is_hwdom ? bar_write : guest_bar_write, + reg, 4, &bars[i]); if ( rc ) { pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev) if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO ) { bars[i].type = VPCI_BAR_IO; + + rc = bar_ignore_access(pdev, reg, &bars[i]); + if ( rc ) + return rc; + continue; } if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == @@ -535,6 +618,11 @@ static int init_bars(struct pci_dev *pdev) if ( size == 0 ) { bars[i].type = VPCI_BAR_EMPTY; + + rc = bar_ignore_access(pdev, reg, &bars[i]); + if ( rc ) + return rc; + continue; } @@ -542,8 +630,10 @@ static int init_bars(struct pci_dev *pdev) bars[i].size = size; bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH; - rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4, - &bars[i]); + rc = vpci_add_register(pdev->vpci, + is_hwdom ? vpci_hw_read32 : guest_bar_read, + is_hwdom ? bar_write : guest_bar_write, + reg, 4, &bars[i]); if ( rc ) { pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); @@ -551,22 +641,31 @@ static int init_bars(struct pci_dev *pdev) } } - /* Check expansion ROM. */ - rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM); - if ( rc > 0 && size ) + /* Check expansion ROM: we do not handle ROM for guests. */ + if ( is_hwdom ) { - struct vpci_bar *rom = &header->bars[num_bars]; + rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM); + if ( rc > 0 && size ) + { + struct vpci_bar *rom = &header->bars[num_bars]; - rom->type = VPCI_BAR_ROM; - rom->size = size; - rom->addr = addr; - header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) & - PCI_ROM_ADDRESS_ENABLE; + rom->type = VPCI_BAR_ROM; + rom->size = size; + rom->addr = addr; + header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) & + PCI_ROM_ADDRESS_ENABLE; - rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg, - 4, rom); + rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, + rom_reg, 4, rom); + if ( rc ) + rom->type = VPCI_BAR_EMPTY; + } + } + else + { + rc = bar_ignore_access(pdev, rom_reg, &header->bars[num_bars]); if ( rc ) - rom->type = VPCI_BAR_EMPTY; + return rc; } return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0; diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index 246307e6f5d5..270d22b85653 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -66,7 +66,10 @@ struct vpci { struct vpci_header { /* Information about the PCI BARs of this device. */ struct vpci_bar { + /* Physical (host) address. */ uint64_t addr; + /* Guest view of the BAR: address and lower bits. */ + uint64_t guest_reg; uint64_t size; enum { VPCI_BAR_EMPTY,