diff mbox

[kvm-unit-tests,v7,06/13] pci: Rework pci_bar_addr()

Message ID 38dc0bd6c1350568f4978dba3373249753944a61.1471434672.git.agordeev@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alexander Gordeev Aug. 17, 2016, 12:07 p.m. UTC
This update makes pci_bar_addr() interface 64 bit BARs aware and
introduces a concept of PCI address translation.

An architecutre should implement pci_translate_addr() interface
in order to provide mapping between PCI bus address and CPU
physical address.

Cc: Thomas Huth <thuth@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 lib/pci.c         | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 lib/pci.h         | 17 ++++++++++++-
 lib/x86/asm/pci.h |  6 +++++
 3 files changed, 90 insertions(+), 6 deletions(-)

Comments

Peter Xu Sept. 23, 2016, 7:14 a.m. UTC | #1
Hi, Alex,

I see that we are adding ARM tests in the series as well, so I am
thinking whether this work is a prepare work to test ARM SMMU?

Also, some comments below...

On Wed, Aug 17, 2016 at 02:07:07PM +0200, Alexander Gordeev wrote:

[...]

> -unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num)
> +phys_addr_t pci_bar_addr(pcidevaddr_t dev, int bar_num)
>  {
>  	uint32_t bar = pci_bar_get(dev, bar_num);
> +	uint32_t mask = pci_bar_mask(bar);
> +	uint64_t addr = bar & mask;
>  
> -	if (bar & PCI_BASE_ADDRESS_SPACE_IO)
> -		return bar & PCI_BASE_ADDRESS_IO_MASK;
> -	else
> -		return bar & PCI_BASE_ADDRESS_MEM_MASK;
> +	if (pci_bar_is64(dev, bar_num))
> +		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
> +
> +	return pci_translate_addr(dev, addr);

Raw question: do we need to translate bar addresses as well?

[...]

> +static inline
> +phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
> +{
> +    return addr;
> +}
> +

We are not using dev here (and in following patches), I don't know
ARM, but at least x86 will need this to translate IOVA into PA (in
other words, each device can have its own IO address space). If this
codes are for common, shall we consider that as well?

Thanks.

-- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Jones Sept. 23, 2016, 8:51 a.m. UTC | #2
On Fri, Sep 23, 2016 at 03:14:04PM +0800, Peter Xu wrote:
> Hi, Alex,
> 
> I see that we are adding ARM tests in the series as well, so I am
> thinking whether this work is a prepare work to test ARM SMMU?

That would be excellent. Adding Eric, he may be able to help here.

drew

> 
> Also, some comments below...
> 
> On Wed, Aug 17, 2016 at 02:07:07PM +0200, Alexander Gordeev wrote:
> 
> [...]
> 
> > -unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num)
> > +phys_addr_t pci_bar_addr(pcidevaddr_t dev, int bar_num)
> >  {
> >  	uint32_t bar = pci_bar_get(dev, bar_num);
> > +	uint32_t mask = pci_bar_mask(bar);
> > +	uint64_t addr = bar & mask;
> >  
> > -	if (bar & PCI_BASE_ADDRESS_SPACE_IO)
> > -		return bar & PCI_BASE_ADDRESS_IO_MASK;
> > -	else
> > -		return bar & PCI_BASE_ADDRESS_MEM_MASK;
> > +	if (pci_bar_is64(dev, bar_num))
> > +		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
> > +
> > +	return pci_translate_addr(dev, addr);
> 
> Raw question: do we need to translate bar addresses as well?
> 
> [...]
> 
> > +static inline
> > +phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
> > +{
> > +    return addr;
> > +}
> > +
> 
> We are not using dev here (and in following patches), I don't know
> ARM, but at least x86 will need this to translate IOVA into PA (in
> other words, each device can have its own IO address space). If this
> codes are for common, shall we consider that as well?
> 
> Thanks.
> 
> -- peterx
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Xu Sept. 23, 2016, 8:58 a.m. UTC | #3
On Fri, Sep 23, 2016 at 10:51:06AM +0200, Andrew Jones wrote:
> On Fri, Sep 23, 2016 at 03:14:04PM +0800, Peter Xu wrote:
> > Hi, Alex,
> > 
> > I see that we are adding ARM tests in the series as well, so I am
> > thinking whether this work is a prepare work to test ARM SMMU?
> 
> That would be excellent. Adding Eric, he may be able to help here.

I should say "Alex" rather than "we". :)

And I posted this question since I see that Alex is adding addr
translation logic to PCI. That's what I am trying to do for unit tests
for Intel IOMMU devices. So I asked.

Though (as mentioned in the comment) I don't know whether PCI bar
needs translation. I thought only DMA requests are required to be
translated in device address space. Please correct if I am wrong.

Thanks,

-- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Gordeev Oct. 12, 2016, 2:37 p.m. UTC | #4
On Fri, Sep 23, 2016 at 03:14:04PM +0800, Peter Xu wrote:
> > -unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num)
> > +phys_addr_t pci_bar_addr(pcidevaddr_t dev, int bar_num)
> >  {
> >  	uint32_t bar = pci_bar_get(dev, bar_num);
> > +	uint32_t mask = pci_bar_mask(bar);
> > +	uint64_t addr = bar & mask;
> >  
> > -	if (bar & PCI_BASE_ADDRESS_SPACE_IO)
> > -		return bar & PCI_BASE_ADDRESS_IO_MASK;
> > -	else
> > -		return bar & PCI_BASE_ADDRESS_MEM_MASK;
> > +	if (pci_bar_is64(dev, bar_num))
> > +		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
> > +
> > +	return pci_translate_addr(dev, addr);
> 
> Raw question: do we need to translate bar addresses as well?

I believe, yes.

Unless we always have identity mapping between PCI address space and
CPU physical address space I can not realize how could it be done
otherwise. But even if we were, I would leave the translation routine
for clarity.

> [...]
> 
> > +static inline
> > +phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
> > +{
> > +    return addr;
> > +}
> > +
> 
> We are not using dev here (and in following patches), I don't know
> ARM, but at least x86 will need this to translate IOVA into PA (in
> other words, each device can have its own IO address space). If this
> codes are for common, shall we consider that as well?

Could this function be extended or the prototype blocks the proper
implementation? (I guess, I will get the answer once I look into your
works).

> Thanks.
> 
> -- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Xu Oct. 13, 2016, 6:40 a.m. UTC | #5
On Wed, Oct 12, 2016 at 04:37:54PM +0200, Alexander Gordeev wrote:
> On Fri, Sep 23, 2016 at 03:14:04PM +0800, Peter Xu wrote:
> > > -unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num)
> > > +phys_addr_t pci_bar_addr(pcidevaddr_t dev, int bar_num)
> > >  {
> > >  	uint32_t bar = pci_bar_get(dev, bar_num);
> > > +	uint32_t mask = pci_bar_mask(bar);
> > > +	uint64_t addr = bar & mask;
> > >  
> > > -	if (bar & PCI_BASE_ADDRESS_SPACE_IO)
> > > -		return bar & PCI_BASE_ADDRESS_IO_MASK;
> > > -	else
> > > -		return bar & PCI_BASE_ADDRESS_MEM_MASK;
> > > +	if (pci_bar_is64(dev, bar_num))
> > > +		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
> > > +
> > > +	return pci_translate_addr(dev, addr);
> > 
> > Raw question: do we need to translate bar addresses as well?
> 
> I believe, yes.
> 
> Unless we always have identity mapping between PCI address space and
> CPU physical address space I can not realize how could it be done
> otherwise. But even if we were, I would leave the translation routine
> for clarity.

Sorry I didn't quite catch your point. Are we talking about IOMMU
address remapping here? IMHO BAR addresses are from CPU's point of
view. It's only used by CPU, not device. In that case, BAR address
should not be translated at least by IOMMU (no matter for x86/arm or
whatever).

Take Linux as example: pci_ioremap_bar() is responsible to be called
for any PCI drivers to map device memory bars into kernel virtual
address space. Basically it does:

void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
{
	struct resource *res = &pdev->resource[bar];
	return ioremap_nocache(res->start, resource_size(res));
}

So as it is written: I believe we don't translate the bar address
(which should be res->start). We use it as physical address.

Or, do you mean other kinds of translation that I don't aware of?

Another silly question: I see pci_translate_addr() defined in both
lib/x86/asm/pci.h and lib/asm-generic/pci-host-bridge.h. How's that
working?

> 
> > [...]
> > 
> > > +static inline
> > > +phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
> > > +{
> > > +    return addr;
> > > +}
> > > +
> > 
> > We are not using dev here (and in following patches), I don't know
> > ARM, but at least x86 will need this to translate IOVA into PA (in
> > other words, each device can have its own IO address space). If this
> > codes are for common, shall we consider that as well?
> 
> Could this function be extended or the prototype blocks the proper
> implementation? (I guess, I will get the answer once I look into your
> works).

Yes, it can be extended.

Thanks,

-- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Gordeev Oct. 13, 2016, 2:16 p.m. UTC | #6
On Thu, Oct 13, 2016 at 02:40:35PM +0800, Peter Xu wrote:
> > > > +	return pci_translate_addr(dev, addr);
> > > 
> > > Raw question: do we need to translate bar addresses as well?
> > 
> > I believe, yes.
> > 
> > Unless we always have identity mapping between PCI address space and
> > CPU physical address space I can not realize how could it be done
> > otherwise. But even if we were, I would leave the translation routine
> > for clarity.
> 
> Sorry I didn't quite catch your point. Are we talking about IOMMU
> address remapping here? IMHO BAR addresses are from CPU's point of
> view. It's only used by CPU, not device. In that case, BAR address
> should not be translated at least by IOMMU (no matter for x86/arm or
> whatever).
> 
> Take Linux as example: pci_ioremap_bar() is responsible to be called
> for any PCI drivers to map device memory bars into kernel virtual
> address space. Basically it does:
> 
> void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> {
> 	struct resource *res = &pdev->resource[bar];
> 	return ioremap_nocache(res->start, resource_size(res));
> }
> 
> So as it is written: I believe we don't translate the bar address
> (which should be res->start). We use it as physical address.
> 
> Or, do you mean other kinds of translation that I don't aware of?

Yes, I mean translation from PCI bus address space to CPU physical
address space. These two busses are different and hence need a
translation. I assume Linux pci_dev::resource[] have translated
address, but it is not what PCI devices see. Unless I do not terribly
missing somethig, BAR addresses is what a device sees on its AD[0..31]
pins.

> Another silly question: I see pci_translate_addr() defined in both
> lib/x86/asm/pci.h and lib/asm-generic/pci-host-bridge.h. How's that
> working?

So an architecutre should implement pci_translate_addr()  in order
to provide mapping between PCI bus address and CPU physical address.
These two are the implementations.

Thanks!

> Thanks,
> 
> -- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Xu Oct. 14, 2016, 6:23 a.m. UTC | #7
On Thu, Oct 13, 2016 at 04:16:03PM +0200, Alexander Gordeev wrote:
> On Thu, Oct 13, 2016 at 02:40:35PM +0800, Peter Xu wrote:
> > > > > +	return pci_translate_addr(dev, addr);
> > > > 
> > > > Raw question: do we need to translate bar addresses as well?
> > > 
> > > I believe, yes.
> > > 
> > > Unless we always have identity mapping between PCI address space and
> > > CPU physical address space I can not realize how could it be done
> > > otherwise. But even if we were, I would leave the translation routine
> > > for clarity.
> > 
> > Sorry I didn't quite catch your point. Are we talking about IOMMU
> > address remapping here? IMHO BAR addresses are from CPU's point of
> > view. It's only used by CPU, not device. In that case, BAR address
> > should not be translated at least by IOMMU (no matter for x86/arm or
> > whatever).
> > 
> > Take Linux as example: pci_ioremap_bar() is responsible to be called
> > for any PCI drivers to map device memory bars into kernel virtual
> > address space. Basically it does:
> > 
> > void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> > {
> > 	struct resource *res = &pdev->resource[bar];
> > 	return ioremap_nocache(res->start, resource_size(res));
> > }
> > 
> > So as it is written: I believe we don't translate the bar address
> > (which should be res->start). We use it as physical address.
> > 
> > Or, do you mean other kinds of translation that I don't aware of?
> 
> Yes, I mean translation from PCI bus address space to CPU physical
> address space. These two busses are different and hence need a
> translation. I assume Linux pci_dev::resource[] have translated
> address, but it is not what PCI devices see. Unless I do not terribly
> missing somethig, BAR addresses is what a device sees on its AD[0..31]
> pins.

I believe pci_dev::resource[] should be assigned by BIOS or something
before Linux. At that time, IOMMU is possibly even not inited. So no
chance for a translation at all.

If you see PCI Local Bus Spec Rev 3.0, chap 6.2.5.1:

    Power-up software needs to build a consistent address map before
    booting the machine to an operating system. This means it has to
    determine how much memory is in the system, and how much address
    space the I/O controllers in the system require. After determining
    this information, power-up software can map the I/O controllers
    into reasonable locations and proceed with system boot. In order
    to do this mapping in a device independent manner, the base
    registers for this mapping are placed in the predefined header
    portion of Configuration Space.

BARs for each PCI devices should be pre-allocated during power-up
software, and a consistent map is built with the knowledge of existing
RAM in the system.

If you boot a VM with/without IOMMU, you'll see that BAR addresses
won't change before/after enabling IOMMU.

Thanks,

-- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Jones Oct. 14, 2016, 6:55 a.m. UTC | #8
Hi Peter,

On Fri, Oct 14, 2016 at 02:23:55PM +0800, Peter Xu wrote:
> On Thu, Oct 13, 2016 at 04:16:03PM +0200, Alexander Gordeev wrote:
> > On Thu, Oct 13, 2016 at 02:40:35PM +0800, Peter Xu wrote:
> > > > > > +	return pci_translate_addr(dev, addr);
> > > > > 
> > > > > Raw question: do we need to translate bar addresses as well?
> > > > 
> > > > I believe, yes.
> > > > 
> > > > Unless we always have identity mapping between PCI address space and
> > > > CPU physical address space I can not realize how could it be done
> > > > otherwise. But even if we were, I would leave the translation routine
> > > > for clarity.
> > > 
> > > Sorry I didn't quite catch your point. Are we talking about IOMMU
> > > address remapping here? IMHO BAR addresses are from CPU's point of
> > > view. It's only used by CPU, not device. In that case, BAR address
> > > should not be translated at least by IOMMU (no matter for x86/arm or
> > > whatever).
> > > 
> > > Take Linux as example: pci_ioremap_bar() is responsible to be called
> > > for any PCI drivers to map device memory bars into kernel virtual
> > > address space. Basically it does:
> > > 
> > > void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> > > {
> > > 	struct resource *res = &pdev->resource[bar];
> > > 	return ioremap_nocache(res->start, resource_size(res));
> > > }
> > > 
> > > So as it is written: I believe we don't translate the bar address
> > > (which should be res->start). We use it as physical address.
> > > 
> > > Or, do you mean other kinds of translation that I don't aware of?
> > 
> > Yes, I mean translation from PCI bus address space to CPU physical
> > address space. These two busses are different and hence need a
> > translation. I assume Linux pci_dev::resource[] have translated
> > address, but it is not what PCI devices see. Unless I do not terribly
> > missing somethig, BAR addresses is what a device sees on its AD[0..31]
> > pins.
> 
> I believe pci_dev::resource[] should be assigned by BIOS or something
> before Linux. At that time, IOMMU is possibly even not inited. So no
> chance for a translation at all.

kvm-unit-tests != Linux

kvm-unit-tests/arm doesn't have a bootloader at all (not counting QEMU
initializing registers, and a handful of QEMU generated instructions
that gives us a kick)

Anyway, I'm glad you're reviewing this series (my PCI skills are
minimal), but you'll have to review it in the right context. In
this case, more of a seabios context.

Thanks,
drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Xu Oct. 14, 2016, 9:09 a.m. UTC | #9
On Fri, Oct 14, 2016 at 08:55:20AM +0200, Andrew Jones wrote:
> Hi Peter,
> 
> On Fri, Oct 14, 2016 at 02:23:55PM +0800, Peter Xu wrote:
> > On Thu, Oct 13, 2016 at 04:16:03PM +0200, Alexander Gordeev wrote:
> > > On Thu, Oct 13, 2016 at 02:40:35PM +0800, Peter Xu wrote:
> > > > > > > +	return pci_translate_addr(dev, addr);
> > > > > > 
> > > > > > Raw question: do we need to translate bar addresses as well?
> > > > > 
> > > > > I believe, yes.
> > > > > 
> > > > > Unless we always have identity mapping between PCI address space and
> > > > > CPU physical address space I can not realize how could it be done
> > > > > otherwise. But even if we were, I would leave the translation routine
> > > > > for clarity.
> > > > 
> > > > Sorry I didn't quite catch your point. Are we talking about IOMMU
> > > > address remapping here? IMHO BAR addresses are from CPU's point of
> > > > view. It's only used by CPU, not device. In that case, BAR address
> > > > should not be translated at least by IOMMU (no matter for x86/arm or
> > > > whatever).
> > > > 
> > > > Take Linux as example: pci_ioremap_bar() is responsible to be called
> > > > for any PCI drivers to map device memory bars into kernel virtual
> > > > address space. Basically it does:
> > > > 
> > > > void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> > > > {
> > > > 	struct resource *res = &pdev->resource[bar];
> > > > 	return ioremap_nocache(res->start, resource_size(res));
> > > > }
> > > > 
> > > > So as it is written: I believe we don't translate the bar address
> > > > (which should be res->start). We use it as physical address.
> > > > 
> > > > Or, do you mean other kinds of translation that I don't aware of?
> > > 
> > > Yes, I mean translation from PCI bus address space to CPU physical
> > > address space. These two busses are different and hence need a
> > > translation. I assume Linux pci_dev::resource[] have translated
> > > address, but it is not what PCI devices see. Unless I do not terribly
> > > missing somethig, BAR addresses is what a device sees on its AD[0..31]
> > > pins.
> > 
> > I believe pci_dev::resource[] should be assigned by BIOS or something
> > before Linux. At that time, IOMMU is possibly even not inited. So no
> > chance for a translation at all.
> 
> kvm-unit-tests != Linux
> 
> kvm-unit-tests/arm doesn't have a bootloader at all (not counting QEMU
> initializing registers, and a handful of QEMU generated instructions
> that gives us a kick)
> 
> Anyway, I'm glad you're reviewing this series (my PCI skills are
> minimal), but you'll have to review it in the right context. In
> this case, more of a seabios context.

Thank you for pointing out. :-)

I know little about PCI as well, just want to know the fact on how we
should treat BAR addresses. So I'd say my comments are more like
questions rather than I disagree with the changes. E.g., even if we
don't have BIOS, do we really need to translate BAR addresses on ARM?

Sorry if I brought too much noise to this thread. Looking forward to
Alex's further works.

Thanks!

-- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Gordeev Oct. 14, 2016, 12:37 p.m. UTC | #10
On Fri, Oct 14, 2016 at 02:23:55PM +0800, Peter Xu wrote:
> On Thu, Oct 13, 2016 at 04:16:03PM +0200, Alexander Gordeev wrote:
> > On Thu, Oct 13, 2016 at 02:40:35PM +0800, Peter Xu wrote:
> > > > > > +	return pci_translate_addr(dev, addr);
> > > > > 
> > > > > Raw question: do we need to translate bar addresses as well?
> > > > 
> > > > I believe, yes.
> > > > 
> > > > Unless we always have identity mapping between PCI address space and
> > > > CPU physical address space I can not realize how could it be done
> > > > otherwise. But even if we were, I would leave the translation routine
> > > > for clarity.
> > > 
> > > Sorry I didn't quite catch your point. Are we talking about IOMMU
> > > address remapping here? IMHO BAR addresses are from CPU's point of
> > > view. It's only used by CPU, not device. In that case, BAR address
> > > should not be translated at least by IOMMU (no matter for x86/arm or
> > > whatever).
> > > 
> > > Take Linux as example: pci_ioremap_bar() is responsible to be called
> > > for any PCI drivers to map device memory bars into kernel virtual
> > > address space. Basically it does:
> > > 
> > > void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> > > {
> > > 	struct resource *res = &pdev->resource[bar];
> > > 	return ioremap_nocache(res->start, resource_size(res));
> > > }
> > > 
> > > So as it is written: I believe we don't translate the bar address
> > > (which should be res->start). We use it as physical address.
> > > 
> > > Or, do you mean other kinds of translation that I don't aware of?
> > 
> > Yes, I mean translation from PCI bus address space to CPU physical
> > address space. These two busses are different and hence need a
> > translation. I assume Linux pci_dev::resource[] have translated
> > address, but it is not what PCI devices see. Unless I do not terribly
> > missing somethig, BAR addresses is what a device sees on its AD[0..31]
> > pins.
> 
> I believe pci_dev::resource[] should be assigned by BIOS or something
> before Linux. At that time, IOMMU is possibly even not inited. So no
> chance for a translation at all.

So pci-host-generic.c is "BIOS or something" in this context. There is
no contradiction with the your excerpt. Please note "Figure 1-2: PCI
System Block Diagram" in the spec. The CPU and PCI busses are two
physically different busses interconnected by a bridge.

A CPU data access to PCI devices initiates data access on a PCI bus,
but adresses seen by the CPU and PCI devices do not necessarily match.

So in case of x86 they do (lib/x86/asm/pci.h):

static inline
phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
{
	return addr;
}

But in case of ARM they might not (lib/asm-generic/pci-host-bridge.h):

static inline
phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
{
	/*
	 * Assume we only have single PCI host bridge in a system.
	 */
	return pci_host_bridge_get_paddr(addr);
}

> If you see PCI Local Bus Spec Rev 3.0, chap 6.2.5.1:
> 
>     Power-up software needs to build a consistent address map before
>     booting the machine to an operating system. This means it has to
>     determine how much memory is in the system, and how much address
>     space the I/O controllers in the system require. After determining
>     this information, power-up software can map the I/O controllers
>     into reasonable locations and proceed with system boot. In order
>     to do this mapping in a device independent manner, the base
>     registers for this mapping are placed in the predefined header
>     portion of Configuration Space.
> 
> BARs for each PCI devices should be pre-allocated during power-up
> software, and a consistent map is built with the knowledge of existing
> RAM in the system.

Yep, see how pci_probe() / pci_alloc_resource() allocate and assign
BARs to each device. It is the case for PPC/ARM but in case of x86
it is BIOS who does it.

> If you boot a VM with/without IOMMU, you'll see that BAR addresses
> won't change before/after enabling IOMMU.

BAR addresses are in PCI bus address space. It sounds quite logical.
(I am not experienced with IOMMU address translation, though).

> Thanks,
> 
> -- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Xu Oct. 19, 2016, 3:46 a.m. UTC | #11
On Fri, Oct 14, 2016 at 02:37:25PM +0200, Alexander Gordeev wrote:

[...]

> So pci-host-generic.c is "BIOS or something" in this context. There is
> no contradiction with the your excerpt. Please note "Figure 1-2: PCI
> System Block Diagram" in the spec. The CPU and PCI busses are two
> physically different busses interconnected by a bridge.
> 
> A CPU data access to PCI devices initiates data access on a PCI bus,
> but adresses seen by the CPU and PCI devices do not necessarily match.
> 
> So in case of x86 they do (lib/x86/asm/pci.h):
> 
> static inline
> phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
> {
> 	return addr;
> }
> 
> But in case of ARM they might not (lib/asm-generic/pci-host-bridge.h):
> 
> static inline
> phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
> {
> 	/*
> 	 * Assume we only have single PCI host bridge in a system.
> 	 */
> 	return pci_host_bridge_get_paddr(addr);
> }
> 
> > If you see PCI Local Bus Spec Rev 3.0, chap 6.2.5.1:
> > 
> >     Power-up software needs to build a consistent address map before
> >     booting the machine to an operating system. This means it has to
> >     determine how much memory is in the system, and how much address
> >     space the I/O controllers in the system require. After determining
> >     this information, power-up software can map the I/O controllers
> >     into reasonable locations and proceed with system boot. In order
> >     to do this mapping in a device independent manner, the base
> >     registers for this mapping are placed in the predefined header
> >     portion of Configuration Space.
> > 
> > BARs for each PCI devices should be pre-allocated during power-up
> > software, and a consistent map is built with the knowledge of existing
> > RAM in the system.
> 
> Yep, see how pci_probe() / pci_alloc_resource() allocate and assign
> BARs to each device. It is the case for PPC/ARM but in case of x86
> it is BIOS who does it.
> 
> > If you boot a VM with/without IOMMU, you'll see that BAR addresses
> > won't change before/after enabling IOMMU.
> 
> BAR addresses are in PCI bus address space. It sounds quite logical.
> (I am not experienced with IOMMU address translation, though).

Thank you for the thorough explanation. I am still not sure I fully
understand the whole thing, but looks like the translation here is not
the same as x86 IOMMU address translations, and x86 is special in this
case (BAR addresses should be the same as physical addresses in x86,
while not for ARM/PPC).

I am sorry for the noise. Looking forward to your next version. Thanks!

-- peterx
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/lib/pci.c b/lib/pci.c
index ce481bbfadb6..e69e828523d4 100644
--- a/lib/pci.c
+++ b/lib/pci.c
@@ -21,19 +21,71 @@  pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
 	return PCIDEVADDR_INVALID;
 }
 
+static uint32_t pci_bar_mask(uint32_t bar)
+{
+	return (bar & PCI_BASE_ADDRESS_SPACE_IO) ?
+		PCI_BASE_ADDRESS_IO_MASK : PCI_BASE_ADDRESS_MEM_MASK;
+}
+
 static uint32_t pci_bar_get(pcidevaddr_t dev, int bar_num)
 {
 	return pci_config_readl(dev, PCI_BASE_ADDRESS_0 + bar_num * 4);
 }
 
-unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num)
+phys_addr_t pci_bar_addr(pcidevaddr_t dev, int bar_num)
 {
 	uint32_t bar = pci_bar_get(dev, bar_num);
+	uint32_t mask = pci_bar_mask(bar);
+	uint64_t addr = bar & mask;
 
-	if (bar & PCI_BASE_ADDRESS_SPACE_IO)
-		return bar & PCI_BASE_ADDRESS_IO_MASK;
-	else
-		return bar & PCI_BASE_ADDRESS_MEM_MASK;
+	if (pci_bar_is64(dev, bar_num))
+		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
+
+	return pci_translate_addr(dev, addr);
+}
+
+/*
+ * To determine the amount of address space needed by a PCI device,
+ * one must save the original value of the BAR, write a value of
+ * all 1's to the register, and then read it back. The amount of
+ * memory can be then determined by masking the information bits,
+ * performing a bitwise NOT, and incrementing the value by 1.
+ *
+ * The following pci_bar_size_helper() and pci_bar_size() functions
+ * implement the algorithm.
+ */
+static uint32_t pci_bar_size_helper(pcidevaddr_t dev, int bar_num)
+{
+	int off = PCI_BASE_ADDRESS_0 + bar_num * 4;
+	uint32_t bar, size;
+
+	bar = pci_config_readl(dev, off);
+	pci_config_writel(dev, off, ~0u);
+	size = pci_config_readl(dev, off);
+	pci_config_writel(dev, off, bar);
+
+	return size;
+}
+
+phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num)
+{
+	uint32_t bar, size;
+
+	size = pci_bar_size_helper(dev, bar_num);
+	if (!size)
+		return 0;
+
+	bar = pci_bar_get(dev, bar_num);
+	size &= pci_bar_mask(bar);
+
+	if (pci_bar_is64(dev, bar_num)) {
+		phys_addr_t size64 = pci_bar_size_helper(dev, bar_num + 1);
+		size64 = (size64 << 32) | size;
+
+		return ~size64 + 1;
+	} else {
+		return ~size + 1;
+	}
 }
 
 bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num)
@@ -47,3 +99,14 @@  bool pci_bar_is_valid(pcidevaddr_t dev, int bar_num)
 {
 	return pci_bar_get(dev, bar_num);
 }
+
+bool pci_bar_is64(pcidevaddr_t dev, int bar_num)
+{
+	uint32_t bar = pci_bar_get(dev, bar_num);
+
+	if (bar & PCI_BASE_ADDRESS_SPACE_IO)
+		return false;
+
+	return (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+		      PCI_BASE_ADDRESS_MEM_TYPE_64;
+}
diff --git a/lib/pci.h b/lib/pci.h
index 066fac77b237..8eec236cb123 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -16,7 +16,22 @@  enum {
 };
 
 extern pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id);
-extern unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num);
+
+/*
+ * @bar_num in all BAR access functions below is the index of the 32-bit
+ * register starting from the PCI_BASE_ADDRESS_0 offset.
+ *
+ * In cases where the BAR size is 64-bit, a caller should still provide
+ * @bar_num in terms of 32-bit words. For example, if a device has a 64-bit
+ * BAR#0 and a 32-bit BAR#1, then caller should provide 2 to address BAR#1,
+ * not 1.
+ *
+ * It is expected the caller is aware of the device BAR layout and never
+ * tries to address the middle of a 64-bit register.
+ */
+extern phys_addr_t pci_bar_addr(pcidevaddr_t dev, int bar_num);
+extern phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num);
+extern bool pci_bar_is64(pcidevaddr_t dev, int bar_num);
 extern bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num);
 extern bool pci_bar_is_valid(pcidevaddr_t dev, int bar_num);
 
diff --git a/lib/x86/asm/pci.h b/lib/x86/asm/pci.h
index 821a2c1e180a..7384b91adba1 100644
--- a/lib/x86/asm/pci.h
+++ b/lib/x86/asm/pci.h
@@ -35,4 +35,10 @@  static inline void pci_config_writel(pcidevaddr_t dev, uint8_t reg, uint32_t val
     outl(val, 0xCFC);
 }
 
+static inline
+phys_addr_t pci_translate_addr(pcidevaddr_t dev __unused, uint64_t addr)
+{
+    return addr;
+}
+
 #endif