diff mbox series

[XEN] hvmloader: Enable MMIO and I/O decode, after all resource allocation

Message ID f7b9e16e394e7e94700ed690f0c9fbd7ce7b5c74.1586195196.git.havanur@amazon.com (mailing list archive)
State Superseded
Headers show
Series [XEN] hvmloader: Enable MMIO and I/O decode, after all resource allocation | expand

Commit Message

Shamsundara Havanur, Harsha April 6, 2020, 5:46 p.m. UTC
It was observed that PCI MMIO and/or IO BARs were programmed with
BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
register) enabled, during PCI setup phase. This resulted in
spurious and premature bus transactions as soon as the lower bar of
the 64 bit bar is programmed. It is highly recommended that BARs be
programmed whilst decode bits are cleared to avoid spurious bus
transactions.

This patch address the issue by deferring enablement of memory and
I/O decode in command register until all the resources, like interrupts
I/O and/or MMIO BARs for all the PCI device functions are programmed.
PCI bus memory and I/O space is enabled in command register after
all the resources like interrupts, I/O and/or MMIO BARs are
programmed for all valid device functions. PCI BUS MASTER is kept
disabled in the bootloader as this needs to be enabled by the guest
OS driver once it initializes and takes control of the device.

Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
Ack-by: Paul Durrant <pdurrant@amazon.com>
---
 tools/firmware/hvmloader/pci.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

Comments

Andrew Cooper April 6, 2020, 6:06 p.m. UTC | #1
On 06/04/2020 18:46, Harsha Shamsundara Havanur wrote:
> It was observed that PCI MMIO and/or IO BARs were programmed with
> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> register) enabled, during PCI setup phase. This resulted in
> spurious and premature bus transactions as soon as the lower bar of
> the 64 bit bar is programmed. It is highly recommended that BARs be
> programmed whilst decode bits are cleared to avoid spurious bus
> transactions.

What kinds of spurious transactions?

Keeping memory and I/O decoding disabled until the BARs are set up is a
no-brainer, but busmastering is a more complicated subject.  Therefore,
it would be helpful to know exactly what you've seen in the way of
spurious transactions.

>
> This patch address the issue by deferring enablement of memory and
> I/O decode in command register until all the resources, like interrupts
> I/O and/or MMIO BARs for all the PCI device functions are programmed.
> PCI bus memory and I/O space is enabled in command register after
> all the resources like interrupts, I/O and/or MMIO BARs are
> programmed for all valid device functions. PCI BUS MASTER is kept
> disabled in the bootloader as this needs to be enabled by the guest
> OS driver once it initializes and takes control of the device.

Has this been tested with an Intel integrated graphics card?  These have
a habit of hitting a platform reset line if busmaster is ever disabled.

A lot of this will depend on what Qemu does behind the scenes, and
whether disabling busmastering gets reflected in the real device.

>
> Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> Ack-by: Paul Durrant <pdurrant@amazon.com>

Acked-by

~Andrew
Durrant, Paul April 7, 2020, 7:33 a.m. UTC | #2
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Andrew Cooper
> Sent: 06 April 2020 19:07
> To: Harsha Shamsundara Havanur <havanur@amazon.com>; xen-devel@lists.xenproject.org
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>; Wei Liu <wl@xen.org>; Jan Beulich <jbeulich@suse.com>;
> Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> 
> On 06/04/2020 18:46, Harsha Shamsundara Havanur wrote:
> > It was observed that PCI MMIO and/or IO BARs were programmed with
> > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > register) enabled, during PCI setup phase. This resulted in
> > spurious and premature bus transactions as soon as the lower bar of
> > the 64 bit bar is programmed. It is highly recommended that BARs be
> > programmed whilst decode bits are cleared to avoid spurious bus
> > transactions.
> 
> What kinds of spurious transactions?
> 
> Keeping memory and I/O decoding disabled until the BARs are set up is a
> no-brainer, but busmastering is a more complicated subject.  Therefore,
> it would be helpful to know exactly what you've seen in the way of
> spurious transactions.
> 

I think you know of some GPU h/w that doesn't necessarily stop DMAing after an FLR. There is no reason why hvmloader, or anything else until the function driver loads, needs BME to be on. As you say mem and io decodes are no-brainers, yet without this patch hvmloader enables them after the first BAR on the device is programmed, thus causing much fun for device models when the subsequent BARs are programmed.

> >
> > This patch address the issue by deferring enablement of memory and
> > I/O decode in command register until all the resources, like interrupts
> > I/O and/or MMIO BARs for all the PCI device functions are programmed.
> > PCI bus memory and I/O space is enabled in command register after
> > all the resources like interrupts, I/O and/or MMIO BARs are
> > programmed for all valid device functions. PCI BUS MASTER is kept
> > disabled in the bootloader as this needs to be enabled by the guest
> > OS driver once it initializes and takes control of the device.
> 
> Has this been tested with an Intel integrated graphics card?  These have
> a habit of hitting a platform reset line if busmaster is ever disabled.
> 

No, we don't have suitable h/w for that AFAIK. If that is the case then we ought to quirk it.

> A lot of this will depend on what Qemu does behind the scenes, and
> whether disabling busmastering gets reflected in the real device.
> 

When I last looked at upstream QEMU modifications to the BME bit were echoed through.

> >
> > Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> > Ack-by: Paul Durrant <pdurrant@amazon.com>
> 
> Acked-by

This was a little premature as I have not yet looked at the re-based code.

  Paul

> 
> ~Andrew
Durrant, Paul April 7, 2020, 7:48 a.m. UTC | #3
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Harsha Shamsundara Havanur
> Sent: 06 April 2020 18:47
> To: xen-devel@lists.xenproject.org
> Cc: Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Harsha Shamsundara Havanur
> <havanur@amazon.com>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> 
> It was observed that PCI MMIO and/or IO BARs were programmed with
> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> register) enabled, during PCI setup phase. This resulted in
> spurious and premature bus transactions as soon as the lower bar of
> the 64 bit bar is programmed. It is highly recommended that BARs be
> programmed whilst decode bits are cleared to avoid spurious bus
> transactions.
> 

It's not so much spurious transactions that are the issue. I think "spurious and premature bus transactions" should be replaced with "incorrect mappings being created".

I believe the PCI spec says all three bits should be clear after reset anyway, and BAR programming whilst decodes are enabled causes problems for emulators such as QEMU which need to create and destroy mappings between the gaddr being programming into the virtual BAR and the maddr programmed into the physical BAR.
Specifically the case we see is that a 64-bit memory BAR is programmed by writing the lower half and then the upper half. After the first write the BAR is mapped to an address under 4G that happens to contain RAM, which is displaced by the mapping. After the second write the BAR is re-mapped to the intended location but the RAM displaced by the other mapping is not restored. The OS then continues to boot and function until at some point it happens to try to use that RAM at which point it suffers a page fault and crashes. It was only by noticing that the faulting address lay within the transient BAR mapping that we figured out what was happening.

> This patch address the issue by deferring enablement of memory and
> I/O decode in command register until all the resources, like interrupts
> I/O and/or MMIO BARs for all the PCI device functions are programmed.
> PCI bus memory and I/O space is enabled in command register after
> all the resources like interrupts, I/O and/or MMIO BARs are
> programmed for all valid device functions. PCI BUS MASTER is kept
> disabled in the bootloader as this needs to be enabled by the guest
> OS driver once it initializes and takes control of the device.
> 
> Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> Ack-by: Paul Durrant <pdurrant@amazon.com>

With the comment fixed as I suggest, you can replace this with:

Reviewed-by: Paul Durrant <paul@xen.org>

> ---
>  tools/firmware/hvmloader/pci.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 0b708bf578..0f31866453 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -84,6 +84,7 @@ void pci_setup(void)
>      uint32_t vga_devfn = 256;
>      uint16_t class, vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
> +    uint8_t pci_devfn_decode_type[256] = {};
> 
>      /* Resources assignable to PCI devices via BARs. */
>      struct resource {
> @@ -289,9 +290,14 @@ void pci_setup(void)
>                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
>          }
> 
> -        /* Enable bus mastering. */
> +        /*
> +         * Disable bus mastering, memory and I/O space, which is typical device
> +         * reset state. It is recommended that BAR programming be done whilst
> +         * decode bits are cleared to avoid spurious DMAs and bus transactions.
> +         * Bus master should be enabled by guest driver when it deems fit.
> +         */
>          cmd = pci_readw(devfn, PCI_COMMAND);
> -        cmd |= PCI_COMMAND_MASTER;
> +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
>          pci_writew(devfn, PCI_COMMAND, cmd);
>      }
> 
> @@ -503,10 +509,9 @@ void pci_setup(void)
>          if ( (bar_reg == PCI_ROM_ADDRESS) ||
>               ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>                PCI_BASE_ADDRESS_SPACE_MEMORY) )
> -            cmd |= PCI_COMMAND_MEMORY;
> +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
>          else
> -            cmd |= PCI_COMMAND_IO;
> -        pci_writew(devfn, PCI_COMMAND, cmd);
> +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
>      }
> 
>      if ( pci_hi_mem_start )
> @@ -530,6 +535,15 @@ void pci_setup(void)
>          cmd |= PCI_COMMAND_IO;
>          pci_writew(vga_devfn, PCI_COMMAND, cmd);
>      }
> +
> +    /* Enable memory and I/O space. */
> +    for ( devfn = 0; devfn < 256; devfn++ )
> +        if ( pci_devfn_decode_type[devfn] )
> +        {
> +            cmd = pci_readw(devfn, PCI_COMMAND);
> +            cmd |= pci_devfn_decode_type[devfn];
> +            pci_writew(devfn, PCI_COMMAND, cmd);
> +        }
>  }
> 
>  /*
> --
> 2.16.6
>
Jan Beulich April 7, 2020, 7:57 a.m. UTC | #4
On 06.04.2020 19:46, Harsha Shamsundara Havanur wrote:
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -84,6 +84,7 @@ void pci_setup(void)
>      uint32_t vga_devfn = 256;
>      uint16_t class, vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
> +    uint8_t pci_devfn_decode_type[256] = {};

256 bytes of new stack space consumption looks quite a lot to me.
Did you consider using two 256-bit bitmaps instead, totaling to
an extra 64 bytes of stack space needed?

> It was observed that PCI MMIO and/or IO BARs were programmed with
> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> register) enabled, during PCI setup phase. This resulted in
> spurious and premature bus transactions as soon as the lower bar of
> the 64 bit bar is programmed. It is highly recommended that BARs be
> programmed whilst decode bits are cleared to avoid spurious bus
> transactions.
> 
> This patch address the issue by deferring enablement of memory and
> I/O decode in command register until all the resources, like interrupts
> I/O and/or MMIO BARs for all the PCI device functions are programmed.
> PCI bus memory and I/O space is enabled in command register after
> all the resources like interrupts, I/O and/or MMIO BARs are
> programmed for all valid device functions. PCI BUS MASTER is kept
> disabled in the bootloader as this needs to be enabled by the guest
> OS driver once it initializes and takes control of the device.

Identifying the commit that introduced the issue, even if very old,
would be nice (bbfa22f8c9ca). From looking at current code I first
got the impression that it might have been a result of splitting a
loop, as the main issue is that the bits get set after a first
BAR got programmed, without considering that there might be more.
However, the original commit looks to have assumed that there's
at most one BAR or each type per device (which may have been true
back then for just the few emulated devices there were).

> @@ -289,9 +290,14 @@ void pci_setup(void)
>                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
>          }
>  
> -        /* Enable bus mastering. */
> +        /*
> +         * Disable bus mastering, memory and I/O space, which is typical device
> +         * reset state. It is recommended that BAR programming be done whilst
> +         * decode bits are cleared to avoid spurious DMAs and bus transactions.
> +         * Bus master should be enabled by guest driver when it deems fit.
> +         */
>          cmd = pci_readw(devfn, PCI_COMMAND);
> -        cmd |= PCI_COMMAND_MASTER;
> +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
>          pci_writew(devfn, PCI_COMMAND, cmd);
>      }

I agree with Andrew that there doesn't look to be a reason to
fiddle with bus mastering here. This is the more that you don't
re-enable it later.

I'm also curious whether there are actually devices getting
handed to the domain (and hence hvmloader) with the memory
and/or I/O decode bits set. This would look to be wrong to me,
and would perhaps want fixing wherever they get set prematurely.
(This isn't to say that, to be on the safe side, we couldn't
also re-clear them here. Maybe there should be a warning issued
if either bit is set?)

Jan
Roger Pau Monne April 7, 2020, 8:06 a.m. UTC | #5
On Tue, Apr 07, 2020 at 08:48:42AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Harsha Shamsundara Havanur
> > Sent: 06 April 2020 18:47
> > To: xen-devel@lists.xenproject.org
> > Cc: Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> > <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Harsha Shamsundara Havanur
> > <havanur@amazon.com>; Roger Pau Monné <roger.pau@citrix.com>
> > Subject: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> > 
> > It was observed that PCI MMIO and/or IO BARs were programmed with
> > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > register) enabled, during PCI setup phase. This resulted in
> > spurious and premature bus transactions as soon as the lower bar of
> > the 64 bit bar is programmed. It is highly recommended that BARs be
> > programmed whilst decode bits are cleared to avoid spurious bus
> > transactions.
> > 
> 
> It's not so much spurious transactions that are the issue. I think "spurious and premature bus transactions" should be replaced with "incorrect mappings being created".
> 
> I believe the PCI spec says all three bits should be clear after reset anyway, and BAR programming whilst decodes are enabled causes problems for emulators such as QEMU which need to create and destroy mappings between the gaddr being programming into the virtual BAR and the maddr programmed into the physical BAR.
> Specifically the case we see is that a 64-bit memory BAR is programmed by writing the lower half and then the upper half. After the first write the BAR is mapped to an address under 4G that happens to contain RAM, which is displaced by the mapping. After the second write the BAR is re-mapped to the intended location but the RAM displaced by the other mapping is not restored. The OS then continues to boot and function until at some point it happens to try to use that RAM at which point it suffers a page fault and crashes. It was only by noticing that the faulting address lay within the transient BAR mapping that we figured out what was happening.

In order to fix this isn't it enough to just disable memory and IO
decodes, leaving bus mastering as it is?

I assume there is (or was) some reason why bus master is enabled by
hvmloader in the first place?

Thanks, Roger.
Durrant, Paul April 7, 2020, 8:14 a.m. UTC | #6
> -----Original Message-----
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: 07 April 2020 09:07
> To: paul@xen.org
> Cc: 'Harsha Shamsundara Havanur' <havanur@amazon.com>; xen-devel@lists.xenproject.org; 'Wei Liu'
> <wl@xen.org>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>;
> 'Jan Beulich' <jbeulich@suse.com>
> Subject: Re: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> 
> On Tue, Apr 07, 2020 at 08:48:42AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Harsha Shamsundara Havanur
> > > Sent: 06 April 2020 18:47
> > > To: xen-devel@lists.xenproject.org
> > > Cc: Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> > > <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Harsha Shamsundara Havanur
> > > <havanur@amazon.com>; Roger Pau Monné <roger.pau@citrix.com>
> > > Subject: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> > >
> > > It was observed that PCI MMIO and/or IO BARs were programmed with
> > > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > > register) enabled, during PCI setup phase. This resulted in
> > > spurious and premature bus transactions as soon as the lower bar of
> > > the 64 bit bar is programmed. It is highly recommended that BARs be
> > > programmed whilst decode bits are cleared to avoid spurious bus
> > > transactions.
> > >
> >
> > It's not so much spurious transactions that are the issue. I think "spurious and premature bus
> transactions" should be replaced with "incorrect mappings being created".
> >
> > I believe the PCI spec says all three bits should be clear after reset anyway, and BAR programming
> whilst decodes are enabled causes problems for emulators such as QEMU which need to create and destroy
> mappings between the gaddr being programming into the virtual BAR and the maddr programmed into the
> physical BAR.
> > Specifically the case we see is that a 64-bit memory BAR is programmed by writing the lower half and
> then the upper half. After the first write the BAR is mapped to an address under 4G that happens to
> contain RAM, which is displaced by the mapping. After the second write the BAR is re-mapped to the
> intended location but the RAM displaced by the other mapping is not restored. The OS then continues to
> boot and function until at some point it happens to try to use that RAM at which point it suffers a
> page fault and crashes. It was only by noticing that the faulting address lay within the transient BAR
> mapping that we figured out what was happening.
> 
> In order to fix this isn't it enough to just disable memory and IO
> decodes, leaving bus mastering as it is?
> 
> I assume there is (or was) some reason why bus master is enabled by
> hvmloader in the first place?
> 

I admit it is a while since I went mining for the reason BME was being set but IIRC the commit that added the original code did not state why it was done.

In the past I have run with local hacks disabling it whilst playing with GPU pass-through and not noticed it causing any problems. There is an argument to say that hvmloader should perhaps leave it alone but there is no good reason I can think of why it ought to be enabling it.

  Paul

> Thanks, Roger.
Roger Pau Monne April 7, 2020, 8:25 a.m. UTC | #7
On Tue, Apr 07, 2020 at 09:14:53AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: 07 April 2020 09:07
> > To: paul@xen.org
> > Cc: 'Harsha Shamsundara Havanur' <havanur@amazon.com>; xen-devel@lists.xenproject.org; 'Wei Liu'
> > <wl@xen.org>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>;
> > 'Jan Beulich' <jbeulich@suse.com>
> > Subject: Re: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> > 
> > On Tue, Apr 07, 2020 at 08:48:42AM +0100, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Harsha Shamsundara Havanur
> > > > Sent: 06 April 2020 18:47
> > > > To: xen-devel@lists.xenproject.org
> > > > Cc: Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> > > > <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Harsha Shamsundara Havanur
> > > > <havanur@amazon.com>; Roger Pau Monné <roger.pau@citrix.com>
> > > > Subject: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> > > >
> > > > It was observed that PCI MMIO and/or IO BARs were programmed with
> > > > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > > > register) enabled, during PCI setup phase. This resulted in
> > > > spurious and premature bus transactions as soon as the lower bar of
> > > > the 64 bit bar is programmed. It is highly recommended that BARs be
> > > > programmed whilst decode bits are cleared to avoid spurious bus
> > > > transactions.
> > > >
> > >
> > > It's not so much spurious transactions that are the issue. I think "spurious and premature bus
> > transactions" should be replaced with "incorrect mappings being created".
> > >
> > > I believe the PCI spec says all three bits should be clear after reset anyway, and BAR programming
> > whilst decodes are enabled causes problems for emulators such as QEMU which need to create and destroy
> > mappings between the gaddr being programming into the virtual BAR and the maddr programmed into the
> > physical BAR.
> > > Specifically the case we see is that a 64-bit memory BAR is programmed by writing the lower half and
> > then the upper half. After the first write the BAR is mapped to an address under 4G that happens to
> > contain RAM, which is displaced by the mapping. After the second write the BAR is re-mapped to the
> > intended location but the RAM displaced by the other mapping is not restored. The OS then continues to
> > boot and function until at some point it happens to try to use that RAM at which point it suffers a
> > page fault and crashes. It was only by noticing that the faulting address lay within the transient BAR
> > mapping that we figured out what was happening.
> > 
> > In order to fix this isn't it enough to just disable memory and IO
> > decodes, leaving bus mastering as it is?
> > 
> > I assume there is (or was) some reason why bus master is enabled by
> > hvmloader in the first place?
> > 
> 
> I admit it is a while since I went mining for the reason BME was being set but IIRC the commit that added the original code did not state why it was done.
> 
> In the past I have run with local hacks disabling it whilst playing with GPU pass-through and not noticed it causing any problems. There is an argument to say that hvmloader should perhaps leave it alone but there is no good reason I can think of why it ought to be enabling it.

IMO forcefully disabling decodings (and enabling them afterwards if
already enabled) and leaving bus mastering as-is would be better.

Thanks, Roger.
Jan Beulich April 7, 2020, 8:27 a.m. UTC | #8
On 07.04.2020 10:14, Paul Durrant wrote:
>> -----Original Message-----
>> From: Roger Pau Monné <roger.pau@citrix.com>
>> Sent: 07 April 2020 09:07
>> To: paul@xen.org
>> Cc: 'Harsha Shamsundara Havanur' <havanur@amazon.com>; xen-devel@lists.xenproject.org; 'Wei Liu'
>> <wl@xen.org>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>;
>> 'Jan Beulich' <jbeulich@suse.com>
>> Subject: Re: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
>>
>> On Tue, Apr 07, 2020 at 08:48:42AM +0100, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Harsha Shamsundara Havanur
>>>> Sent: 06 April 2020 18:47
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
>>>> <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Harsha Shamsundara Havanur
>>>> <havanur@amazon.com>; Roger Pau Monné <roger.pau@citrix.com>
>>>> Subject: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation
>>>>
>>>> It was observed that PCI MMIO and/or IO BARs were programmed with
>>>> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
>>>> register) enabled, during PCI setup phase. This resulted in
>>>> spurious and premature bus transactions as soon as the lower bar of
>>>> the 64 bit bar is programmed. It is highly recommended that BARs be
>>>> programmed whilst decode bits are cleared to avoid spurious bus
>>>> transactions.
>>>>
>>>
>>> It's not so much spurious transactions that are the issue. I think "spurious and premature bus
>> transactions" should be replaced with "incorrect mappings being created".
>>>
>>> I believe the PCI spec says all three bits should be clear after reset anyway, and BAR programming
>> whilst decodes are enabled causes problems for emulators such as QEMU which need to create and destroy
>> mappings between the gaddr being programming into the virtual BAR and the maddr programmed into the
>> physical BAR.
>>> Specifically the case we see is that a 64-bit memory BAR is programmed by writing the lower half and
>> then the upper half. After the first write the BAR is mapped to an address under 4G that happens to
>> contain RAM, which is displaced by the mapping. After the second write the BAR is re-mapped to the
>> intended location but the RAM displaced by the other mapping is not restored. The OS then continues to
>> boot and function until at some point it happens to try to use that RAM at which point it suffers a
>> page fault and crashes. It was only by noticing that the faulting address lay within the transient BAR
>> mapping that we figured out what was happening.
>>
>> In order to fix this isn't it enough to just disable memory and IO
>> decodes, leaving bus mastering as it is?
>>
>> I assume there is (or was) some reason why bus master is enabled by
>> hvmloader in the first place?
>>
> 
> I admit it is a while since I went mining for the reason BME
> was being set but IIRC the commit that added the original code
> did not state why it was done.

I can second this observation, but this is not a reason to drop
the enabling again without suitable justification. If there are
babbling devices, perhaps they should be quirked rather than,
as you did suggest in reply to Andrew, ones which require it
enabled? (If such a requirement indeed exists, I assume they
would get handed to hvmloader with the bit already set, in
which case a middle ground may be to simply leave the bit
alone rather than force-enabling or force-disabling it. But
again the commit adding the enabling would stand against this,
because it likely was done for a reason even if that reason is
not stated in the commit.)

Jan
Jan Beulich April 7, 2020, 8:29 a.m. UTC | #9
On 07.04.2020 10:25, Roger Pau Monné wrote:
> IMO forcefully disabling decodings (and enabling them afterwards if
> already enabled) and leaving bus mastering as-is would be better.

Limiting this to the "already enabled" case may not be enough:
Devices needed for guest booting may need them to be enabled
even if prior to BAR assignment they were turned off. That's
what a BIOS would typically do, too. I've seen quite a few
BIOS setups actually where one could chose whether the BIOS
would do so just for what it recognized as boot devices, or
for all of them.

Jan
diff mbox series

Patch

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 0b708bf578..0f31866453 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -84,6 +84,7 @@  void pci_setup(void)
     uint32_t vga_devfn = 256;
     uint16_t class, vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
+    uint8_t pci_devfn_decode_type[256] = {};
 
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
@@ -289,9 +290,14 @@  void pci_setup(void)
                    devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
         }
 
-        /* Enable bus mastering. */
+        /*
+         * Disable bus mastering, memory and I/O space, which is typical device
+         * reset state. It is recommended that BAR programming be done whilst
+         * decode bits are cleared to avoid spurious DMAs and bus transactions.
+         * Bus master should be enabled by guest driver when it deems fit.
+         */
         cmd = pci_readw(devfn, PCI_COMMAND);
-        cmd |= PCI_COMMAND_MASTER;
+        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
         pci_writew(devfn, PCI_COMMAND, cmd);
     }
 
@@ -503,10 +509,9 @@  void pci_setup(void)
         if ( (bar_reg == PCI_ROM_ADDRESS) ||
              ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
               PCI_BASE_ADDRESS_SPACE_MEMORY) )
-            cmd |= PCI_COMMAND_MEMORY;
+            pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
         else
-            cmd |= PCI_COMMAND_IO;
-        pci_writew(devfn, PCI_COMMAND, cmd);
+            pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
     }
 
     if ( pci_hi_mem_start )
@@ -530,6 +535,15 @@  void pci_setup(void)
         cmd |= PCI_COMMAND_IO;
         pci_writew(vga_devfn, PCI_COMMAND, cmd);
     }
+
+    /* Enable memory and I/O space. */
+    for ( devfn = 0; devfn < 256; devfn++ )
+        if ( pci_devfn_decode_type[devfn] )
+        {
+            cmd = pci_readw(devfn, PCI_COMMAND);
+            cmd |= pci_devfn_decode_type[devfn];
+            pci_writew(devfn, PCI_COMMAND, cmd);
+        }
 }
 
 /*