Message ID | 1404240214-9804-4-git-send-email-Liviu.Dudau@arm.com (mailing list archive)
---|---
State | New, archived |
On Tuesday 01 July 2014 19:43:28 Liviu Dudau wrote: > +/* > + * Record the PCI IO range (expressed as CPU physical address + size). > + * Return a negative value if an error has occured, zero otherwise > + */ > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > +{ > +#ifdef PCI_IOBASE > + struct io_range *res; > + resource_size_t allocated_size = 0; > + > + /* check if the range hasn't been previously recorded */ > + list_for_each_entry(res, &io_range_list, list) { > + if (addr >= res->start && addr + size <= res->start + size) > + return 0; > + allocated_size += res->size; > + } > + > + /* range not registed yet, check for available space */ > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > + return -E2BIG; > + > + /* add the range to the list */ > + res = kzalloc(sizeof(*res), GFP_KERNEL); > + if (!res) > + return -ENOMEM; > + > + res->start = addr; > + res->size = size; > + > + list_add_tail(&res->list, &io_range_list); > + > + return 0; > +#else > + return -EINVAL; > +#endif > +} > + > unsigned long __weak pci_address_to_pio(phys_addr_t address) > { > +#ifdef PCI_IOBASE > + struct io_range *res; > + resource_size_t offset = 0; > + > + list_for_each_entry(res, &io_range_list, list) { > + if (address >= res->start && > + address < res->start + res->size) { > + return res->start - address + offset; > + } > + offset += res->size; > + } > + > + return (unsigned long)-1; > +#else > if (address > IO_SPACE_LIMIT) > return (unsigned long)-1; > > return (unsigned long) address; > +#endif > } This still conflicts with the other allocator you have in patch 9 for pci_remap_iospace: nothing guarantees that the mapping is the same for both. Also, this is a completely pointless exercise at this moment, because nobody cares about the result of pci_address_to_pio on architectures that don't already provide this function. If we ever get a proper Open Firmware implementation that wants to put hardcoded PCI devices into DT, we can add an implementation, but for now this seems overkill. The allocator in pci_register_io_range seems reasonable, why not merge this function with pci_remap_iospace() as I have asked you multiple times before? Just make it return the io_offset so the caller can put that into the PCI host resources. Arnd
On Tue, Jul 01, 2014 at 09:36:10PM +0200, Arnd Bergmann wrote: > On Tuesday 01 July 2014 19:43:28 Liviu Dudau wrote: > > +/* > > + * Record the PCI IO range (expressed as CPU physical address + size). > > + * Return a negative value if an error has occured, zero otherwise > > + */ > > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > > +{ > > +#ifdef PCI_IOBASE > > + struct io_range *res; > > + resource_size_t allocated_size = 0; > > + > > + /* check if the range hasn't been previously recorded */ > > + list_for_each_entry(res, &io_range_list, list) { > > + if (addr >= res->start && addr + size <= res->start + size) > > + return 0; > > + allocated_size += res->size; > > + } > > + > > + /* range not registed yet, check for available space */ > > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > > + return -E2BIG; > > + > > + /* add the range to the list */ > > + res = kzalloc(sizeof(*res), GFP_KERNEL); > > + if (!res) > > + return -ENOMEM; > > + > > + res->start = addr; > > + res->size = size; > > + > > + list_add_tail(&res->list, &io_range_list); > > + > > + return 0; > > +#else > > + return -EINVAL; > > +#endif > > +} > > + > > unsigned long __weak pci_address_to_pio(phys_addr_t address) > > { > > +#ifdef PCI_IOBASE > > + struct io_range *res; > > + resource_size_t offset = 0; > > + > > + list_for_each_entry(res, &io_range_list, list) { > > + if (address >= res->start && > > + address < res->start + res->size) { > > + return res->start - address + offset; > > + } > > + offset += res->size; > > + } > > + > > + return (unsigned long)-1; > > +#else > > if (address > IO_SPACE_LIMIT) > > return (unsigned long)-1; > > > > return (unsigned long) address; > > +#endif > > } > > This still conflicts with the other allocator you have in patch 9 > for pci_remap_iospace: nothing guarantees that the mapping is the > same for both. > > Also, this is a completely pointless exercise at this moment, because > nobody cares about the result of pci_address_to_pio on architectures > that don't already provide this function. If we ever get a proper > Open Firmware implementation that wants to put hardcoded PCI devices > into DT, we can add an implementation, but for now this seems overkill. > > The allocator in pci_register_io_range seems reasonable, why not merge > this function with pci_remap_iospace() as I have asked you multiple > times before? Just make it return the io_offset so the caller can > put that into the PCI host resources. Hi Arnd, While I agree with you that at some moment the allocators were inconsistent wrt each other, for this version I would respectfully disagree on this. The allocator in pci_register_io_range() only makes sure that the ranges are not overlapping, it doesn't do any mapping whatsoever, while pci_remap_iospace() does only an ioremap_page_range(). The idea is that you get the offset out of pci_address_to_pio() and apply it to pci_remap_iospace(). Why do you think there are conflicts? Best regards, Liviu > > Arnd > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
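To make the split Liviu describes concrete, here is a minimal sketch of how a host bridge driver might chain the two helpers from this patch with the mapping step from patch 9. The function name, the pgprot and the direct ioremap_page_range() call are illustrative assumptions; the series itself wraps that last step in pci_remap_iospace().

```c
#include <linux/io.h>
#include <linux/of_address.h>

/* Hypothetical probe-path fragment; error values and pgprot are
 * placeholders, not the actual patch 9 implementation. */
static int example_host_map_io_window(phys_addr_t cpu_addr,
				      resource_size_t size)
{
	unsigned long pio;
	int err;

	/* 1. record the CPU physical window of this bridge's I/O space */
	err = pci_register_io_range(cpu_addr, size);
	if (err)
		return err;

	/* 2. turn the CPU address into a Linux I/O port number (the
	 *    offset into the global PCI_IOBASE window) */
	pio = pci_address_to_pio(cpu_addr);
	if (pio == (unsigned long)-1)
		return -EINVAL;

	/* 3. map the window so that inb(pio)/outb(pio) just work;
	 *    this is essentially what pci_remap_iospace() boils down to */
	return ioremap_page_range((unsigned long)PCI_IOBASE + pio,
				  (unsigned long)PCI_IOBASE + pio + size,
				  cpu_addr, pgprot_noncached(PAGE_KERNEL));
}
```

Keeping steps 1 and 2 (pure bookkeeping) separate from step 3 (the MMU mapping) is exactly the division of labour Liviu argues for above.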
On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > Some architectures do not have a simple view of the PCI I/O space > and instead use a range of CPU addresses that map to bus addresses. For > some architectures these ranges will be expressed by OF bindings > in a device tree file. > > Introduce a pci_register_io_range() helper function with a generic > implementation that can be used by such architectures to keep track > of the I/O ranges described by the PCI bindings. If the PCI_IOBASE > macro is not defined that signals lack of support for PCI and we > return an error. [...] > +/* > + * Record the PCI IO range (expressed as CPU physical address + size). > + * Return a negative value if an error has occured, zero otherwise > + */ > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > +{ > +#ifdef PCI_IOBASE > + struct io_range *res; > + resource_size_t allocated_size = 0; > + > + /* check if the range hasn't been previously recorded */ > + list_for_each_entry(res, &io_range_list, list) { > + if (addr >= res->start && addr + size <= res->start + size) > + return 0; > + allocated_size += res->size; > + } > + > + /* range not registed yet, check for available space */ > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > + return -E2BIG; > + > + /* add the range to the list */ > + res = kzalloc(sizeof(*res), GFP_KERNEL); > + if (!res) > + return -ENOMEM; > + > + res->start = addr; > + res->size = size; > + > + list_add_tail(&res->list, &io_range_list); > + > + return 0; Hopefully a stupid question, but how is this serialised? I'm just surprised that adding to and searching a list are sufficient, unless there's a big lock somewhere. Will
On Tuesday 01 July 2014 21:45:09 Liviu Dudau wrote: > On Tue, Jul 01, 2014 at 09:36:10PM +0200, Arnd Bergmann wrote: > > On Tuesday 01 July 2014 19:43:28 Liviu Dudau wrote: > > > > This still conflicts with the other allocator you have in patch 9 > > for pci_remap_iospace: nothing guarantees that the mapping is the > > same for both. > > > > Also, this is a completely pointless exercise at this moment, because > > nobody cares about the result of pci_address_to_pio on architectures > > that don't already provide this function. If we ever get a proper > > Open Firmware implementation that wants to put hardcoded PCI devices > > into DT, we can add an implementation, but for now this seems overkill. > > > > The allocator in pci_register_io_range seems reasonable, why not merge > > this function with pci_remap_iospace() as I have asked you multiple > > times before? Just make it return the io_offset so the caller can > > put that into the PCI host resources. > > Hi Arnd, > > While I agree with you that at some moment the allocators were inconsistent > wrt each other, for this version I would respectfully disagree on this. > The allocator in pci_register_io_range() only makes sure that the ranges > are not overlapping, it doesn't do any mapping whatsoever, while > pci_remap_iospace() does only an ioremap_page_range(). The idea is that > you get the offset out of pci_address_to_pio() and apply it to > pci_remap_iospace(). Ok, got it now, I'm sorry I didn't read this properly at first. Your solution looks correct to me, just using different tradeoffs to what I was expecting: You get a working pci_address_to_pio() function, which is probably never needed, but in turn you need to keep the state of each host bridge in a global list. Arnd
Some more detailed comments now On Tuesday 01 July 2014 19:43:28 Liviu Dudau wrote: > +/* > + * Record the PCI IO range (expressed as CPU physical address + size). > + * Return a negative value if an error has occured, zero otherwise > + */ > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > +{ > +#ifdef PCI_IOBASE > + struct io_range *res; I was confused by the variable naming here: A variable named 'res' is normally a 'struct resource'. Maybe better call this 'range'. > + resource_size_t allocated_size = 0; > + > + /* check if the range hasn't been previously recorded */ > + list_for_each_entry(res, &io_range_list, list) { > + if (addr >= res->start && addr + size <= res->start + size) > + return 0; > + allocated_size += res->size; > + } A spin_lock around the list lookup should be sufficient to get around the race that Will mentioned. > + /* range not registed yet, check for available space */ > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > + return -E2BIG; It might be better to limit the size to 64K if it doesn't fit at first. Arnd
On Wed, Jul 02, 2014 at 01:38:04PM +0100, Arnd Bergmann wrote: > Some more detailed comments now > > On Tuesday 01 July 2014 19:43:28 Liviu Dudau wrote: > > +/* > > + * Record the PCI IO range (expressed as CPU physical address + size). > > + * Return a negative value if an error has occured, zero otherwise > > + */ > > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > > +{ > > +#ifdef PCI_IOBASE > > + struct io_range *res; > > I was confused by the variable naming here: A variable named 'res' is > normally a 'struct resource'. Maybe better call this 'range'. > > > + resource_size_t allocated_size = 0; > > + > > + /* check if the range hasn't been previously recorded */ > > + list_for_each_entry(res, &io_range_list, list) { > > + if (addr >= res->start && addr + size <= res->start + size) > > + return 0; > > + allocated_size += res->size; > > + } > > A spin_lock around the list lookup should be sufficient to get around > the race that Will mentioned. > > > + /* range not registed yet, check for available space */ > > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > > + return -E2BIG; > > It might be better to limit the size to 64K if it doesn't fit at first. Thanks Arnd for review. Will update and post a new patch soon if I don't get any other comments. Best regards, Liviu > > > Arnd > >
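For illustration, a minimal sketch of what the locking Arnd asks for might look like in the next revision, assuming a plain spinlock (the lock name is invented here). The allocation is pulled in front of the lock so GFP_KERNEL remains legal, and the loop variable is renamed to 'range' per the review comment.

```c
static LIST_HEAD(io_range_list);
static DEFINE_SPINLOCK(io_range_lock);	/* hypothetical name */

int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size)
{
#ifdef PCI_IOBASE
	struct io_range *range, *new;
	resource_size_t allocated_size = 0;
	int err = 0;

	/* allocate up front so the lock never covers a sleeping allocation */
	new = kzalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return -ENOMEM;
	new->start = addr;
	new->size = size;

	spin_lock(&io_range_lock);

	/* check if the range has been recorded before */
	list_for_each_entry(range, &io_range_list, list) {
		if (addr >= range->start &&
		    addr + size <= range->start + range->size)
			goto out;		/* already there, drop 'new' */
		allocated_size += range->size;
	}

	/* range not registered yet, check for available space */
	if (allocated_size + size - 1 > IO_SPACE_LIMIT) {
		err = -E2BIG;
		goto out;
	}

	list_add_tail(&new->list, &io_range_list);
	spin_unlock(&io_range_lock);
	return 0;

out:
	spin_unlock(&io_range_lock);
	kfree(new);
	return err;
#else
	return -EINVAL;
#endif
}
```

pci_address_to_pio() would need to take the same lock around its own list walk.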
On Wed, Jul 02, 2014 at 01:30:31PM +0100, Arnd Bergmann wrote: > On Tuesday 01 July 2014 21:45:09 Liviu Dudau wrote: > > On Tue, Jul 01, 2014 at 09:36:10PM +0200, Arnd Bergmann wrote: > > > On Tuesday 01 July 2014 19:43:28 Liviu Dudau wrote: > > > > > > This still conflicts with the other allocator you have in patch 9 > > > for pci_remap_iospace: nothing guarantees that the mapping is the > > > same for both. > > > > > > Also, this is a completely pointless exercise at this moment, because > > > nobody cares about the result of pci_address_to_pio on architectures > > > that don't already provide this function. If we ever get a proper > > > Open Firmware implementation that wants to put hardcoded PCI devices > > > into DT, we can add an implementation, but for now this seems overkill. > > > > > > The allocator in pci_register_io_range seems reasonable, why not merge > > > this function with pci_remap_iospace() as I have asked you multiple > > > times before? Just make it return the io_offset so the caller can > > > put that into the PCI host resources. > > > > Hi Arnd, > > > > While I agree with you that at some moment the allocators were inconsistent > > wrt each other, for this version I would respectfully disagree on this. > > The allocator in pci_register_io_range() only makes sure that the ranges > > are not overlapping, it doesn't do any mapping whatsoever, while > > pci_remap_iospace() does only an ioremap_page_range(). The idea is that > > you get the offset out of pci_address_to_pio() and apply it to > > pci_remap_iospace(). > > Ok, got it now, I'm sorry I didn't read this properly at first. > > Your solution looks correct to me, just using different > tradeoffs to what I was expecting: You get a working pci_address_to_pio() > function, which is probably never needed, but in turn you need to > keep the state of each host bridge in a global list. Just a reminder that with my patchset I *do* start using pci_address_to_pio() in order to correctly parse the IO ranges from DT. Best regards, Liviu > > Arnd > >
On Wednesday 02 July 2014 15:23:03 Liviu Dudau wrote: > > > > Your solution looks correct to me, just using different > > tradeoffs to what I was expecting: You get a working pci_address_to_pio() > > function, which is probably never needed, but in turn you need to > > keep the state of each host bridge in a global list. > > Just a reminder that with my patchset I *do* start using pci_address_to_pio() > in order to correctly parse the IO ranges from DT. Yes, what I meant is that it would be easier not to do that. All existing drivers expect of_pci_range_to_resource() to return the CPU address for an I/O space register, not the Linux I/O port number that we want to pass to the PCI core. This is suboptimal because it's not obvious how it works, but it lets us get away without an extra registration step. Once all probe functions in PCI host drivers have been changed to the of_create_pci_host_bridge, that should not matter any more, because there is only one place left that calls it and we only have to get it right once. Also, when you change that of_pci_range_to_resource, you also have to audit all callers of that function and ensure they can deal with the new behavior. Arnd
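For context, the shape of the existing helper versus the changed behaviour Arnd describes: the first function below is roughly what of_pci_range_to_resource() looked like at the time, the second is only a sketch of the direction the series takes (returning a Linux port number for I/O ranges), not the actual patch.

```c
#include <linux/ioport.h>
#include <linux/of_address.h>

/* Roughly the existing behaviour: the resource always carries the
 * CPU address, even for an I/O range. */
static void old_style_range_to_resource(struct of_pci_range *range,
					struct device_node *np,
					struct resource *res)
{
	res->flags = range->flags;
	res->start = range->cpu_addr;
	res->end   = range->cpu_addr + range->size - 1;
	res->name  = np->full_name;
}

/* Sketch of the changed behaviour: I/O ranges come back as Linux port
 * numbers so they can be fed straight to the PCI core, memory ranges
 * stay as CPU addresses. Callers must now handle a failure. */
static int new_style_range_to_resource(struct of_pci_range *range,
				       struct device_node *np,
				       struct resource *res)
{
	res->flags = range->flags;
	res->name  = np->full_name;

	if (res->flags & IORESOURCE_IO) {
		unsigned long port = pci_address_to_pio(range->cpu_addr);

		if (port == (unsigned long)-1)
			return -EINVAL;
		res->start = port;
	} else {
		res->start = range->cpu_addr;
	}
	res->end = res->start + range->size - 1;
	return 0;
}
```

The extra failure path is one concrete reason why every existing caller of of_pci_range_to_resource() would need auditing before such a change.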
On Wed, Jul 02, 2014 at 12:22:22PM +0100, Will Deacon wrote: > On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > > Some architectures do not have a simple view of the PCI I/O space > > and instead use a range of CPU addresses that map to bus addresses. For > > some architectures these ranges will be expressed by OF bindings > > in a device tree file. > > > > Introduce a pci_register_io_range() helper function with a generic > > implementation that can be used by such architectures to keep track > > of the I/O ranges described by the PCI bindings. If the PCI_IOBASE > > macro is not defined that signals lack of support for PCI and we > > return an error. > > [...] > > > +/* > > + * Record the PCI IO range (expressed as CPU physical address + size). > > + * Return a negative value if an error has occured, zero otherwise > > + */ > > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > > +{ > > +#ifdef PCI_IOBASE > > + struct io_range *res; > > + resource_size_t allocated_size = 0; > > + > > + /* check if the range hasn't been previously recorded */ > > + list_for_each_entry(res, &io_range_list, list) { > > + if (addr >= res->start && addr + size <= res->start + size) > > + return 0; > > + allocated_size += res->size; > > + } > > + > > + /* range not registed yet, check for available space */ > > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > > + return -E2BIG; > > + > > + /* add the range to the list */ > > + res = kzalloc(sizeof(*res), GFP_KERNEL); > > + if (!res) > > + return -ENOMEM; > > + > > + res->start = addr; > > + res->size = size; > > + > > + list_add_tail(&res->list, &io_range_list); > > + > > + return 0; > > Hopefully a stupid question, but how is this serialised? I'm just surprised > that adding to and searching a list are sufficient, unless there's a big > lock somewhere. Sorry, tripped into my own filters! You are right, there is no serialisation here, will add one. Best regards, Liviu > > Will > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > Some architectures do not have a simple view of the PCI I/O space > and instead use a range of CPU addresses that map to bus addresses. For > some architectures these ranges will be expressed by OF bindings > in a device tree file. > > Introduce a pci_register_io_range() helper function with a generic > implementation that can be used by such architectures to keep track > of the I/O ranges described by the PCI bindings. If the PCI_IOBASE > macro is not defined that signals lack of support for PCI and we > return an error. > > Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> > --- > drivers/of/address.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/of_address.h | 1 + > 2 files changed, 62 insertions(+) > > diff --git a/drivers/of/address.c b/drivers/of/address.c > index 5edfcb0..1345733 100644 > --- a/drivers/of/address.c > +++ b/drivers/of/address.c > @@ -5,6 +5,7 @@ > #include <linux/module.h> > #include <linux/of_address.h> > #include <linux/pci_regs.h> > +#include <linux/slab.h> > #include <linux/string.h> > > /* Max address size we deal with */ > @@ -601,12 +602,72 @@ const __be32 *of_get_address(struct device_node *dev, int index, u64 *size, > } > EXPORT_SYMBOL(of_get_address); > > +struct io_range { > + struct list_head list; > + phys_addr_t start; > + resource_size_t size; > +}; > + > +static LIST_HEAD(io_range_list); > + > +/* > + * Record the PCI IO range (expressed as CPU physical address + size). > + * Return a negative value if an error has occured, zero otherwise > + */ > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) I don't understand the interface here. What's the mapping from CPU physical address to bus I/O port? For example, I have the following machine in mind: HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] HWP0002:00: host bridge window [io 0x0000-0x0fff] HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, e.g., "inb(0)" to access it. Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use "inb(0x1000000)" to access it. pci_register_io_range() seems sort of like it's intended to track the memory-mapped IO port spaces, e.g., [mem 0xf8010000000-0xf8010000fff]. But I would think you'd want to keep track of at least the base port number on the PCI bus, too. Or is that why it's weak? 
Here's what these look like in /proc/iomem and /proc/ioports (note that there are two resource structs for each memory-mapped IO port space: one IORESOURCE_MEM for the memory-mapped area (used only by the host bridge driver), and one IORESOURCE_IO for the I/O port space (this becomes the parent of a region used by a regular device driver): /proc/iomem: PCI Bus 0000:00 I/O Ports 00000000-00000fff PCI Bus 0001:00 I/O Ports 01000000-01000fff /proc/ioports: 00000000-00000fff : PCI Bus 0000:00 01000000-01000fff : PCI Bus 0001:00 > +{ > +#ifdef PCI_IOBASE > + struct io_range *res; > + resource_size_t allocated_size = 0; > + > + /* check if the range hasn't been previously recorded */ > + list_for_each_entry(res, &io_range_list, list) { > + if (addr >= res->start && addr + size <= res->start + size) > + return 0; > + allocated_size += res->size; > + } > + > + /* range not registed yet, check for available space */ > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > + return -E2BIG; > + > + /* add the range to the list */ > + res = kzalloc(sizeof(*res), GFP_KERNEL); > + if (!res) > + return -ENOMEM; > + > + res->start = addr; > + res->size = size; > + > + list_add_tail(&res->list, &io_range_list); > + > + return 0; > +#else > + return -EINVAL; > +#endif > +} > + > unsigned long __weak pci_address_to_pio(phys_addr_t address) > { > +#ifdef PCI_IOBASE > + struct io_range *res; > + resource_size_t offset = 0; > + > + list_for_each_entry(res, &io_range_list, list) { > + if (address >= res->start && > + address < res->start + res->size) { > + return res->start - address + offset; > + } > + offset += res->size; > + } > + > + return (unsigned long)-1; > +#else > if (address > IO_SPACE_LIMIT) > return (unsigned long)-1; > > return (unsigned long) address; > +#endif > } > > static int __of_address_to_resource(struct device_node *dev, > diff --git a/include/linux/of_address.h b/include/linux/of_address.h > index c13b878..ac4aac4 100644 > --- a/include/linux/of_address.h > +++ b/include/linux/of_address.h > @@ -55,6 +55,7 @@ extern void __iomem *of_iomap(struct device_node *device, int index); > extern const __be32 *of_get_address(struct device_node *dev, int index, > u64 *size, unsigned int *flags); > > +extern int pci_register_io_range(phys_addr_t addr, resource_size_t size); > extern unsigned long pci_address_to_pio(phys_addr_t addr); > > extern int of_pci_range_parser_init(struct of_pci_range_parser *parser, > -- > 2.0.0 >
On Tuesday 08 July 2014, Bjorn Helgaas wrote: > On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > > +static LIST_HEAD(io_range_list); > > + > > +/* > > + * Record the PCI IO range (expressed as CPU physical address + size). > > + * Return a negative value if an error has occured, zero otherwise > > + */ > > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > > I don't understand the interface here. What's the mapping from CPU > physical address to bus I/O port? For example, I have the following > machine in mind: > > HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) > HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] > HWP0002:00: host bridge window [io 0x0000-0x0fff] > > HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) > HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] > HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) > > The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by > the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, > e.g., "inb(0)" to access it. > > Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second > bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use > "inb(0x1000000)" to access it. I guess you are thinking of the IA64 model here where you keep the virtual I/O port numbers in a per-bus lookup table that gets accessed for each inb() call. I've thought about this some more, and I believe there are good reasons for sticking with the model used on arm32 and powerpc for the generic OF implementation. The idea is that there is a single virtual memory range for all I/O port mappings and we use the MMU to do the translation rather than computing it manually in the inb() implemnetation. The main advantage is that all functions used in device drivers to (potentially) access I/O ports become trivial this way, which helps for code size and in some cases (e.g. SoC-internal registers with a low latency) it may even be performance relevant. What this scheme gives you is a set of functions that literally do: /* architecture specific virtual address */ #define PCI_IOBASE (void __iomem *)0xabcd00000000000 static inline u32 inl(unsigned long port) { return readl(port + PCI_IOBASE); } static inline void __iomem *ioport_map(unsigned long port, unsigned int nr) { return port + PCI_IOBASE; } static inline unsigned int ioread32(void __iomem *p) { return readl(p); } Since we want this to work on 32-bit machines, the virtual I/O space has to be rather tightly packed, so Liviu's algorithm just picks the next available address for each new I/O space. > pci_register_io_range() seems sort of like it's intended to track the > memory-mapped IO port spaces, e.g., [mem 0xf8010000000-0xf8010000fff]. > But I would think you'd want to keep track of at least the base port > number on the PCI bus, too. Or is that why it's weak? The PCI bus start address only gets factored in when the window is registered with the PCI core in patch 8/9, where we go over all ranges doing + pci_add_resource_offset(resources, res, + res->start - range.pci_addr); With Liviu's patch, this can be done in exactly the same way for both MMIO and PIO spaces. 
> Here's what these look like in /proc/iomem and /proc/ioports (note that > there are two resource structs for each memory-mapped IO port space: one > IORESOURCE_MEM for the memory-mapped area (used only by the host bridge > driver), and one IORESOURCE_IO for the I/O port space (this becomes the > parent of a region used by a regular device driver): > > /proc/iomem: > PCI Bus 0000:00 I/O Ports 00000000-00000fff > PCI Bus 0001:00 I/O Ports 01000000-01000fff > > /proc/ioports: > 00000000-00000fff : PCI Bus 0000:00 > 01000000-01000fff : PCI Bus 0001:00 The only difference I'd expect here is that the second range would be packed more tightly, so it would instead read /proc/ioports: 00000000-00000fff : PCI Bus 0000:00 00001000-00001fff : PCI Bus 0001:00 In practice we'd probably have 64KB per host controller, and each of them would be a separate domain. I think we normally don't register the IORESOURCE_MEM resource, but I agree it's a good idea and we should always do that. Arnd
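To make the packing concrete, a hypothetical walk-through using the two bridges from Bjorn's example above (the CPU addresses are his; the resulting port numbers assume the tight packing described here):

```c
#include <linux/kernel.h>
#include <linux/of_address.h>
#include <linux/sizes.h>

static void example_pack_two_bridges(void)
{
	unsigned long port0, port1;

	/* bridge 0000:00: CPU window 0xf8010000000, 4K of I/O space */
	pci_register_io_range(0xf8010000000ULL, SZ_4K);
	/* bridge 0001:00: CPU window 0xf8110000000, 4K of I/O space */
	pci_register_io_range(0xf8110000000ULL, SZ_4K);

	port0 = pci_address_to_pio(0xf8010000000ULL);	/* 0x0000 */
	port1 = pci_address_to_pio(0xf8110000000ULL);	/* 0x1000, packed
							 * right behind the
							 * first window */

	pr_info("domain 0000 I/O at %#lx, domain 0001 I/O at %#lx\n",
		port0, port1);
}
```

Once both windows are mapped at PCI_IOBASE + port, a driver's inb(0x1000) on domain 0001 is nothing more than a readb() at a fixed offset into that single virtual window.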
On Tue, Jul 08, 2014 at 01:14:18AM +0100, Bjorn Helgaas wrote: > On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > > Some architectures do not have a simple view of the PCI I/O space > > and instead use a range of CPU addresses that map to bus addresses. For > > some architectures these ranges will be expressed by OF bindings > > in a device tree file. > > > > Introduce a pci_register_io_range() helper function with a generic > > implementation that can be used by such architectures to keep track > > of the I/O ranges described by the PCI bindings. If the PCI_IOBASE > > macro is not defined that signals lack of support for PCI and we > > return an error. > > > > Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> > > --- > > drivers/of/address.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++ > > include/linux/of_address.h | 1 + > > 2 files changed, 62 insertions(+) > > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > > index 5edfcb0..1345733 100644 > > --- a/drivers/of/address.c > > +++ b/drivers/of/address.c > > @@ -5,6 +5,7 @@ > > #include <linux/module.h> > > #include <linux/of_address.h> > > #include <linux/pci_regs.h> > > +#include <linux/slab.h> > > #include <linux/string.h> > > > > /* Max address size we deal with */ > > @@ -601,12 +602,72 @@ const __be32 *of_get_address(struct device_node *dev, int index, u64 *size, > > } > > EXPORT_SYMBOL(of_get_address); > > > > +struct io_range { > > + struct list_head list; > > + phys_addr_t start; > > + resource_size_t size; > > +}; > > + > > +static LIST_HEAD(io_range_list); > > + > > +/* > > + * Record the PCI IO range (expressed as CPU physical address + size). > > + * Return a negative value if an error has occured, zero otherwise > > + */ > > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > > I don't understand the interface here. What's the mapping from CPU > physical address to bus I/O port? For example, I have the following > machine in mind: > > HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) > HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] > HWP0002:00: host bridge window [io 0x0000-0x0fff] > > HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) > HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] > HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) > > The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by > the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, > e.g., "inb(0)" to access it. > > Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second > bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use > "inb(0x1000000)" to access it. > > pci_register_io_range() seems sort of like it's intended to track the > memory-mapped IO port spaces, e.g., [mem 0xf8010000000-0xf8010000fff]. > But I would think you'd want to keep track of at least the base port > number on the PCI bus, too. Or is that why it's weak? It's weak in case the default implementation doesn't fit someones requirements. And yes, it is trying to track the memory-mapped IO port spaces. When calling pci_address_to_pio() - which takes the CPU address - it will return the port number (0x0000 - 0x0fff and 0x1000000 - 0x1000fff respectively). 
pci_address_to_pio() uses the list built by calling pci_register_io_range() to calculate the correct offsets (although in this case it would move your second host bridge io ports to [io 0x1000 - 0x1fff] as it tries not to leave gaps in the reservations). > > Here's what these look like in /proc/iomem and /proc/ioports (note that > there are two resource structs for each memory-mapped IO port space: one > IORESOURCE_MEM for the memory-mapped area (used only by the host bridge > driver), and one IORESOURCE_IO for the I/O port space (this becomes the > parent of a region used by a regular device driver): > > /proc/iomem: > PCI Bus 0000:00 I/O Ports 00000000-00000fff > PCI Bus 0001:00 I/O Ports 01000000-01000fff > > /proc/ioports: > 00000000-00000fff : PCI Bus 0000:00 > 01000000-01000fff : PCI Bus 0001:00 OK, I have a question that might be ovbious to you but I have missed the answer so far: how does the IORESOURCE_MEM area gets created? Is it the host bridge driver's job to do it? Is it something that the framework should do when it notices that the IORESOURCE_IO is memory mapped? Many thanks, Liviu > > > +{ > > +#ifdef PCI_IOBASE > > + struct io_range *res; > > + resource_size_t allocated_size = 0; > > + > > + /* check if the range hasn't been previously recorded */ > > + list_for_each_entry(res, &io_range_list, list) { > > + if (addr >= res->start && addr + size <= res->start + size) > > + return 0; > > + allocated_size += res->size; > > + } > > + > > + /* range not registed yet, check for available space */ > > + if (allocated_size + size - 1 > IO_SPACE_LIMIT) > > + return -E2BIG; > > + > > + /* add the range to the list */ > > + res = kzalloc(sizeof(*res), GFP_KERNEL); > > + if (!res) > > + return -ENOMEM; > > + > > + res->start = addr; > > + res->size = size; > > + > > + list_add_tail(&res->list, &io_range_list); > > + > > + return 0; > > +#else > > + return -EINVAL; > > +#endif > > +} > > + > > unsigned long __weak pci_address_to_pio(phys_addr_t address) > > { > > +#ifdef PCI_IOBASE > > + struct io_range *res; > > + resource_size_t offset = 0; > > + > > + list_for_each_entry(res, &io_range_list, list) { > > + if (address >= res->start && > > + address < res->start + res->size) { > > + return res->start - address + offset; > > + } > > + offset += res->size; > > + } > > + > > + return (unsigned long)-1; > > +#else > > if (address > IO_SPACE_LIMIT) > > return (unsigned long)-1; > > > > return (unsigned long) address; > > +#endif > > } > > > > static int __of_address_to_resource(struct device_node *dev, > > diff --git a/include/linux/of_address.h b/include/linux/of_address.h > > index c13b878..ac4aac4 100644 > > --- a/include/linux/of_address.h > > +++ b/include/linux/of_address.h > > @@ -55,6 +55,7 @@ extern void __iomem *of_iomap(struct device_node *device, int index); > > extern const __be32 *of_get_address(struct device_node *dev, int index, > > u64 *size, unsigned int *flags); > > > > +extern int pci_register_io_range(phys_addr_t addr, resource_size_t size); > > extern unsigned long pci_address_to_pio(phys_addr_t addr); > > > > extern int of_pci_range_parser_init(struct of_pci_range_parser *parser, > > -- > > 2.0.0 > > >
On Tuesday 08 July 2014, Liviu Dudau wrote: > > Here's what these look like in /proc/iomem and /proc/ioports (note that > > there are two resource structs for each memory-mapped IO port space: one > > IORESOURCE_MEM for the memory-mapped area (used only by the host bridge > > driver), and one IORESOURCE_IO for the I/O port space (this becomes the > > parent of a region used by a regular device driver): > > > > /proc/iomem: > > PCI Bus 0000:00 I/O Ports 00000000-00000fff > > PCI Bus 0001:00 I/O Ports 01000000-01000fff > > > > /proc/ioports: > > 00000000-00000fff : PCI Bus 0000:00 > > 01000000-01000fff : PCI Bus 0001:00 > > OK, I have a question that might be ovbious to you but I have missed the answer > so far: how does the IORESOURCE_MEM area gets created? Is it the host bridge > driver's job to do it? Is it something that the framework should do when it > notices that the IORESOURCE_IO is memory mapped? The host bridge driver should either register the IORESOURCE_MEM resource itself from its probe or setup function, or it should get registered behind the covers in drivers using of_create_pci_host_bridge(). Your new pci_host_bridge_of_get_ranges already loops over all the resources, so it would be a good place to put that. Arnd
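A hypothetical fragment of what such a registration could look like in a host bridge setup path, using the first bridge from Bjorn's example; the resource names, numbers and the lack of error handling are all illustrative:

```c
#include <linux/ioport.h>

/* The physical window, visible in /proc/iomem */
static struct resource example_io_window_phys = {
	.name  = "PCI Bus 0000:00 I/O Ports 00000000-00000fff",
	.start = 0xf8010000000ULL,
	.end   = 0xf8010000fffULL,
	.flags = IORESOURCE_MEM,
};

/* The translated port range, visible in /proc/ioports and used as the
 * parent of regions claimed by regular device drivers */
static struct resource example_io_window_ports = {
	.name  = "PCI Bus 0000:00",
	.start = 0x0000,
	.end   = 0x0fff,
	.flags = IORESOURCE_IO,
};

static void example_register_io_window(void)
{
	insert_resource(&iomem_resource, &example_io_window_phys);
	insert_resource(&ioport_resource, &example_io_window_ports);
}
```

Whether this lives in each driver's probe function or behind of_create_pci_host_bridge() is exactly the choice Arnd outlines above.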
On Tue, Jul 8, 2014 at 1:00 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Tuesday 08 July 2014, Bjorn Helgaas wrote: >> On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: >> > +static LIST_HEAD(io_range_list); >> > + >> > +/* >> > + * Record the PCI IO range (expressed as CPU physical address + size). >> > + * Return a negative value if an error has occured, zero otherwise >> > + */ >> > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) >> >> I don't understand the interface here. What's the mapping from CPU >> physical address to bus I/O port? For example, I have the following >> machine in mind: >> >> HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) >> HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] >> HWP0002:00: host bridge window [io 0x0000-0x0fff] >> >> HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) >> HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] >> HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) >> >> The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by >> the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, >> e.g., "inb(0)" to access it. >> >> Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second >> bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use >> "inb(0x1000000)" to access it. > > I guess you are thinking of the IA64 model here where you keep the virtual > I/O port numbers in a per-bus lookup table that gets accessed for each > inb() call. I've thought about this some more, and I believe there are good > reasons for sticking with the model used on arm32 and powerpc for the > generic OF implementation. > > The idea is that there is a single virtual memory range for all I/O port > mappings and we use the MMU to do the translation rather than computing > it manually in the inb() implemnetation. The main advantage is that all > functions used in device drivers to (potentially) access I/O ports > become trivial this way, which helps for code size and in some cases > (e.g. SoC-internal registers with a low latency) it may even be performance > relevant. My example is from ia64, but I'm not advocating for the lookup table. The point is that the hardware works similarly (at least for dense ia64 I/O port spaces) in terms of mapping CPU physical addresses to PCI I/O space. I think my confusion is because your pci_register_io_range() and pci_addess_to_pci() implementations assume that every io_range starts at I/O port 0 on PCI (correct me if I'm wrong). I suspect that's why you don't save the I/O port number in struct io_range. Maybe that assumption is guaranteed by OF, but it doesn't hold for ACPI; ACPI can describe several I/O port apertures for a single bridge, each associated with a different CPU physical memory region. If my speculation here is correct, a comment to the effect that each io_range corresponds to a PCI I/O space range that starts at 0 might be enough. If you did add a PCI I/O port number argument to pci_register_io_range(), we might be able to make an ACPI-based implementation of it. But I guess that could be done if/when anybody ever wants to do that. 
>> Here's what these look like in /proc/iomem and /proc/ioports (note that >> there are two resource structs for each memory-mapped IO port space: one >> IORESOURCE_MEM for the memory-mapped area (used only by the host bridge >> driver), and one IORESOURCE_IO for the I/O port space (this becomes the >> parent of a region used by a regular device driver): >> >> /proc/iomem: >> PCI Bus 0000:00 I/O Ports 00000000-00000fff >> PCI Bus 0001:00 I/O Ports 01000000-01000fff Oops, I forgot the actual physical memory addresses here, but you got the idea anyway. It should have been something like this: /proc/iomem: f8010000000-f8010000fff PCI Bus 0000:00 I/O Ports 00000000-00000fff f8110000000-f8110000fff PCI Bus 0001:00 I/O Ports 01000000-01000fff Bjorn
On Tue, Jul 08, 2014 at 10:29:51PM +0100, Bjorn Helgaas wrote: > On Tue, Jul 8, 2014 at 1:00 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Tuesday 08 July 2014, Bjorn Helgaas wrote: > >> On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > >> > +static LIST_HEAD(io_range_list); > >> > + > >> > +/* > >> > + * Record the PCI IO range (expressed as CPU physical address + size). > >> > + * Return a negative value if an error has occured, zero otherwise > >> > + */ > >> > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > >> > >> I don't understand the interface here. What's the mapping from CPU > >> physical address to bus I/O port? For example, I have the following > >> machine in mind: > >> > >> HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) > >> HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] > >> HWP0002:00: host bridge window [io 0x0000-0x0fff] > >> > >> HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) > >> HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] > >> HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) > >> > >> The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by > >> the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, > >> e.g., "inb(0)" to access it. > >> > >> Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second > >> bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use > >> "inb(0x1000000)" to access it. > > > > I guess you are thinking of the IA64 model here where you keep the virtual > > I/O port numbers in a per-bus lookup table that gets accessed for each > > inb() call. I've thought about this some more, and I believe there are good > > reasons for sticking with the model used on arm32 and powerpc for the > > generic OF implementation. > > > > The idea is that there is a single virtual memory range for all I/O port > > mappings and we use the MMU to do the translation rather than computing > > it manually in the inb() implemnetation. The main advantage is that all > > functions used in device drivers to (potentially) access I/O ports > > become trivial this way, which helps for code size and in some cases > > (e.g. SoC-internal registers with a low latency) it may even be performance > > relevant. > > My example is from ia64, but I'm not advocating for the lookup table. > The point is that the hardware works similarly (at least for dense ia64 > I/O port spaces) in terms of mapping CPU physical addresses to PCI I/O > space. > > I think my confusion is because your pci_register_io_range() and > pci_addess_to_pci() implementations assume that every io_range starts at > I/O port 0 on PCI (correct me if I'm wrong). I suspect that's why you > don't save the I/O port number in struct io_range. > > Maybe that assumption is guaranteed by OF, but it doesn't hold for ACPI; > ACPI can describe several I/O port apertures for a single bridge, each > associated with a different CPU physical memory region. That is actually a good catch, I've completely missed the fact that io_range->pci_addr could be non-zero. > > If my speculation here is correct, a comment to the effect that each > io_range corresponds to a PCI I/O space range that starts at 0 might be > enough. > > If you did add a PCI I/O port number argument to pci_register_io_range(), > we might be able to make an ACPI-based implementation of it. But I guess > that could be done if/when anybody ever wants to do that. 
No, I think you are right, the PCI I/O port number needs to be recorded. I need to add that to pci_register_io_range(). > > >> Here's what these look like in /proc/iomem and /proc/ioports (note that > >> there are two resource structs for each memory-mapped IO port space: one > >> IORESOURCE_MEM for the memory-mapped area (used only by the host bridge > >> driver), and one IORESOURCE_IO for the I/O port space (this becomes the > >> parent of a region used by a regular device driver): > >> > >> /proc/iomem: > >> PCI Bus 0000:00 I/O Ports 00000000-00000fff > >> PCI Bus 0001:00 I/O Ports 01000000-01000fff > > Oops, I forgot the actual physical memory addresses here, but you got > the idea anyway. It should have been something like this: > > /proc/iomem: > f8010000000-f8010000fff PCI Bus 0000:00 I/O Ports 00000000-00000fff > f8110000000-f8110000fff PCI Bus 0001:00 I/O Ports 01000000-01000fff > > Bjorn > Thanks for being thorough with your review. Best regards, Liviu
On Tuesday 08 July 2014, Bjorn Helgaas wrote: > On Tue, Jul 8, 2014 at 1:00 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Tuesday 08 July 2014, Bjorn Helgaas wrote: > >> On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > >> > +static LIST_HEAD(io_range_list); > >> > + > >> > +/* > >> > + * Record the PCI IO range (expressed as CPU physical address + size). > >> > + * Return a negative value if an error has occured, zero otherwise > >> > + */ > >> > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > >> > >> I don't understand the interface here. What's the mapping from CPU > >> physical address to bus I/O port? For example, I have the following > >> machine in mind: > >> > >> HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) > >> HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] > >> HWP0002:00: host bridge window [io 0x0000-0x0fff] > >> > >> HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) > >> HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] > >> HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) > >> > >> The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by > >> the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, > >> e.g., "inb(0)" to access it. > >> > >> Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second > >> bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use > >> "inb(0x1000000)" to access it. > > > > I guess you are thinking of the IA64 model here where you keep the virtual > > I/O port numbers in a per-bus lookup table that gets accessed for each > > inb() call. I've thought about this some more, and I believe there are good > > reasons for sticking with the model used on arm32 and powerpc for the > > generic OF implementation. > > > > The idea is that there is a single virtual memory range for all I/O port > > mappings and we use the MMU to do the translation rather than computing > > it manually in the inb() implemnetation. The main advantage is that all > > functions used in device drivers to (potentially) access I/O ports > > become trivial this way, which helps for code size and in some cases > > (e.g. SoC-internal registers with a low latency) it may even be performance > > relevant. > > My example is from ia64, but I'm not advocating for the lookup table. > The point is that the hardware works similarly (at least for dense ia64 > I/O port spaces) in terms of mapping CPU physical addresses to PCI I/O > space. > > I think my confusion is because your pci_register_io_range() and > pci_addess_to_pci() implementations assume that every io_range starts at > I/O port 0 on PCI (correct me if I'm wrong). I suspect that's why you > don't save the I/O port number in struct io_range. I think you are just misreading the code, but I agree it's hard to understand and I made the same mistake in my initial reply to the first version. pci_register_io_range and pci_address_to_pci only worry about the mapping between CPU physical and Linux I/O address, they do not care which PCI port numbers are behind that. The mapping between PCI port numbers and Linux port numbers is done correctly in patch 8/9 in the pci_host_bridge_of_get_ranges() function. > Maybe that assumption is guaranteed by OF, but it doesn't hold for ACPI; > ACPI can describe several I/O port apertures for a single bridge, each > associated with a different CPU physical memory region. 
DT can have the same, although the common case is that each PCI host bridge has 64KB of I/O ports starting at address 0. Most driver writers get it wrong for the case where it starts at a different address, so I really want to have a generic implementation that gets it right. > If my speculation here is correct, a comment to the effect that each > io_range corresponds to a PCI I/O space range that starts at 0 might be > enough. > > If you did add a PCI I/O port number argument to pci_register_io_range(), > we might be able to make an ACPI-based implementation of it. But I guess > that could be done if/when anybody ever wants to do that. I think we shouldn't worry about it before we actually need it. As far as I understand, the only user of that code (unless someone wants to convert ia64) would be ARM64 with ACPI, but that uses the SBSA hardware model that recommends having no I/O space at all. Arnd
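As a concrete illustration of that port-number mapping, here is how the io_offset falls out for the hypothetical second bridge when its I/O space starts at PCI address 0, following the pci_add_resource_offset() call quoted from patch 8/9 earlier in the thread (the function and variable names here are made up):

```c
#include <linux/of_address.h>
#include <linux/pci.h>

/* res covers [io 0x1000-0x1fff] (Linux ports), range->pci_addr is 0x0
 * (the start of the bridge's I/O space on bus 0001:00). */
static void example_add_io_window(struct list_head *resources,
				  struct resource *res,
				  struct of_pci_range *range)
{
	/* io_offset = Linux port base - PCI bus I/O base = 0x1000 - 0x0 */
	pci_add_resource_offset(resources, res, res->start - range->pci_addr);

	/*
	 * The PCI core then converts both ways with that offset:
	 *   bus address = Linux port - offset   (0x1000 -> 0x0)
	 *   Linux port  = bus address + offset  (0x0    -> 0x1000)
	 * so a BAR decoding I/O address 0x0 on bus 0001:00 shows up as
	 * resource [io 0x1000-0x1fff] in the kernel.
	 */
}
```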
On Wednesday 09 July 2014, Liviu Dudau wrote: > > Maybe that assumption is guaranteed by OF, but it doesn't hold for ACPI; > > ACPI can describe several I/O port apertures for a single bridge, each > > associated with a different CPU physical memory region. > > That is actually a good catch, I've completely missed the fact that > io_range->pci_addr could be non-zero. Hmm, that's what I thought in my initial review, but you convinced me that it's actually correct later on, and I still believe it is. Maybe now you got confused by your own code? Please have another look, I think your code in pci_host_bridge_of_get_ranges sufficiently handles the registration to the PCI code with the correct io_offset. The only thing that we might want to add is to record the PCI address along with the bridge->io_base: For the host driver to set up the mapping window correctly, you either need both of them, or you assume they are already set up. Arnd
On Tue, Jul 08, 2014 at 03:14:17PM +0100, Arnd Bergmann wrote: > On Tuesday 08 July 2014, Liviu Dudau wrote: > > > Here's what these look like in /proc/iomem and /proc/ioports (note that > > > there are two resource structs for each memory-mapped IO port space: one > > > IORESOURCE_MEM for the memory-mapped area (used only by the host bridge > > > driver), and one IORESOURCE_IO for the I/O port space (this becomes the > > > parent of a region used by a regular device driver): > > > > > > /proc/iomem: > > > PCI Bus 0000:00 I/O Ports 00000000-00000fff > > > PCI Bus 0001:00 I/O Ports 01000000-01000fff > > > > > > /proc/ioports: > > > 00000000-00000fff : PCI Bus 0000:00 > > > 01000000-01000fff : PCI Bus 0001:00 > > > > OK, I have a question that might be ovbious to you but I have missed the answer > > so far: how does the IORESOURCE_MEM area gets created? Is it the host bridge > > driver's job to do it? Is it something that the framework should do when it > > notices that the IORESOURCE_IO is memory mapped? > > The host bridge driver should either register the IORESOURCE_MEM resource > itself from its probe or setup function, or it should get registered behind > the covers in drivers using of_create_pci_host_bridge(). > > Your new pci_host_bridge_of_get_ranges already loops over all the > resources, so it would be a good place to put that. OK, so it is not something that I've missed, just something that x86-64 does and my version doesn't yet. Thanks for confirming that. Liviu > > Arnd >
On Wed, Jul 09, 2014 at 07:32:37AM +0100, Arnd Bergmann wrote: > On Wednesday 09 July 2014, Liviu Dudau wrote: > > > Maybe that assumption is guaranteed by OF, but it doesn't hold for ACPI; > > > ACPI can describe several I/O port apertures for a single bridge, each > > > associated with a different CPU physical memory region. > > > > That is actually a good catch, I've completely missed the fact that > > io_range->pci_addr could be non-zero. > > Hmm, that's what I thought in my initial review, but you convinced me > that it's actually correct later on, and I still believe it is. Maybe > now you got confused by your own code? Man, it has been too long. Yes, I am now confused by my own code, which is not a good sign. > > Please have another look, I think your code in pci_host_bridge_of_get_ranges > sufficiently handles the registration to the PCI code with the correct > io_offset. The only thing that we might want to add is to record the > PCI address along with the bridge->io_base: For the host driver to > set up the mapping window correctly, you either need both of them, or > you assume they are already set up. Hmm, having another look at pci_host_bridge_of_get_range() I'm not convinced that we need another storage for pci_addr. The resource gets added to the list of resources used by the bridge offsetted by range.pci_addr, so when re-creating the PCI bus address the value should come in play. I will double check but I think the code is correct as it is. Sorry for the early confusion. Best regards, Liviu > > Arnd >
On Wed, Jul 09, 2014 at 07:20:49AM +0100, Arnd Bergmann wrote: > On Tuesday 08 July 2014, Bjorn Helgaas wrote: > > On Tue, Jul 8, 2014 at 1:00 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > On Tuesday 08 July 2014, Bjorn Helgaas wrote: > > >> On Tue, Jul 01, 2014 at 07:43:28PM +0100, Liviu Dudau wrote: > > >> > +static LIST_HEAD(io_range_list); > > >> > + > > >> > +/* > > >> > + * Record the PCI IO range (expressed as CPU physical address + size). > > >> > + * Return a negative value if an error has occured, zero otherwise > > >> > + */ > > >> > +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) > > >> > > >> I don't understand the interface here. What's the mapping from CPU > > >> physical address to bus I/O port? For example, I have the following > > >> machine in mind: > > >> > > >> HWP0002:00: PCI Root Bridge (domain 0000 [bus 00-1b]) > > >> HWP0002:00: memory-mapped IO port space [mem 0xf8010000000-0xf8010000fff] > > >> HWP0002:00: host bridge window [io 0x0000-0x0fff] > > >> > > >> HWP0002:09: PCI Root Bridge (domain 0001 [bus 00-1b]) > > >> HWP0002:09: memory-mapped IO port space [mem 0xf8110000000-0xf8110000fff] > > >> HWP0002:09: host bridge window [io 0x1000000-0x1000fff] (PCI address [0x0-0xfff]) > > >> > > >> The CPU physical memory [mem 0xf8010000000-0xf8010000fff] is translated by > > >> the bridge to I/O ports 0x0000-0x0fff on PCI bus 0000:00. Drivers use, > > >> e.g., "inb(0)" to access it. > > >> > > >> Similarly, [mem 0xf8110000000-0xf8110000fff] is translated by the second > > >> bridge to I/O ports 0x0000-0x0fff on PCI bus 0001:00. Drivers use > > >> "inb(0x1000000)" to access it. > > > > > > I guess you are thinking of the IA64 model here where you keep the virtual > > > I/O port numbers in a per-bus lookup table that gets accessed for each > > > inb() call. I've thought about this some more, and I believe there are good > > > reasons for sticking with the model used on arm32 and powerpc for the > > > generic OF implementation. > > > > > > The idea is that there is a single virtual memory range for all I/O port > > > mappings and we use the MMU to do the translation rather than computing > > > it manually in the inb() implemnetation. The main advantage is that all > > > functions used in device drivers to (potentially) access I/O ports > > > become trivial this way, which helps for code size and in some cases > > > (e.g. SoC-internal registers with a low latency) it may even be performance > > > relevant. > > > > My example is from ia64, but I'm not advocating for the lookup table. > > The point is that the hardware works similarly (at least for dense ia64 > > I/O port spaces) in terms of mapping CPU physical addresses to PCI I/O > > space. > > > > I think my confusion is because your pci_register_io_range() and > > pci_addess_to_pci() implementations assume that every io_range starts at > > I/O port 0 on PCI (correct me if I'm wrong). I suspect that's why you > > don't save the I/O port number in struct io_range. > > I think you are just misreading the code, but I agree it's hard to > understand and I made the same mistake in my initial reply to the > first version. I am willing to make the code more easy to understand and validate. Proof that things are not that easy to check is that I've also got confused last night without having all the code in front of me. Any suggestions? 
Best regards, Liviu > > pci_register_io_range and pci_address_to_pci only worry about the mapping > between CPU physical and Linux I/O address, they do not care which PCI > port numbers are behind that. The mapping between PCI port numbers and > Linux port numbers is done correctly in patch 8/9 in the > pci_host_bridge_of_get_ranges() function. > > > Maybe that assumption is guaranteed by OF, but it doesn't hold for ACPI; > > ACPI can describe several I/O port apertures for a single bridge, each > > associated with a different CPU physical memory region. > > DT can have the same, although the common case is that each PCI host > bridge has 64KB of I/O ports starting at address 0. Most driver writers > get it wrong for the case where it starts at a different address, so > I really want to have a generic implementation that gets it right. > > > If my speculation here is correct, a comment to the effect that each > > io_range corresponds to a PCI I/O space range that starts at 0 might be > > enough. > > > > If you did add a PCI I/O port number argument to pci_register_io_range(), > > we might be able to make an ACPI-based implementation of it. But I guess > > that could be done if/when anybody ever wants to do that. > > I think we shoulnd't worry about it before we actually need it. As far as > I understand, the only user of that code (unless someone wants to convert > ia64) would be ARM64 with ACPI, but that uses the SBSA hardware model that > recommends having no I/O space at all. > > Arnd >
On Wed, Jul 9, 2014 at 12:20 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Tuesday 08 July 2014, Bjorn Helgaas wrote: >> I think my confusion is because your pci_register_io_range() and >> pci_addess_to_pci() implementations assume that every io_range starts at >> I/O port 0 on PCI (correct me if I'm wrong). I suspect that's why you >> don't save the I/O port number in struct io_range. > > I think you are just misreading the code, but I agree it's hard to > understand and I made the same mistake in my initial reply to the > first version. > > pci_register_io_range and pci_address_to_pci only worry about the mapping > between CPU physical and Linux I/O address, they do not care which PCI > port numbers are behind that. The mapping between PCI port numbers and > Linux port numbers is done correctly in patch 8/9 in the > pci_host_bridge_of_get_ranges() function. Ah, I see now. Thanks for explaining this again (I see you explained it earlier; I just didn't understand it). Now that I see it, it *is* very slick to handle both MMIO and PIO spaces the same way. Bjorn
diff --git a/drivers/of/address.c b/drivers/of/address.c index 5edfcb0..1345733 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -5,6 +5,7 @@ #include <linux/module.h> #include <linux/of_address.h> #include <linux/pci_regs.h> +#include <linux/slab.h> #include <linux/string.h> /* Max address size we deal with */ @@ -601,12 +602,72 @@ const __be32 *of_get_address(struct device_node *dev, int index, u64 *size, } EXPORT_SYMBOL(of_get_address); +struct io_range { + struct list_head list; + phys_addr_t start; + resource_size_t size; +}; + +static LIST_HEAD(io_range_list); + +/* + * Record the PCI IO range (expressed as CPU physical address + size). + * Return a negative value if an error has occured, zero otherwise + */ +int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size) +{ +#ifdef PCI_IOBASE + struct io_range *res; + resource_size_t allocated_size = 0; + + /* check if the range hasn't been previously recorded */ + list_for_each_entry(res, &io_range_list, list) { + if (addr >= res->start && addr + size <= res->start + size) + return 0; + allocated_size += res->size; + } + + /* range not registed yet, check for available space */ + if (allocated_size + size - 1 > IO_SPACE_LIMIT) + return -E2BIG; + + /* add the range to the list */ + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + res->start = addr; + res->size = size; + + list_add_tail(&res->list, &io_range_list); + + return 0; +#else + return -EINVAL; +#endif +} + unsigned long __weak pci_address_to_pio(phys_addr_t address) { +#ifdef PCI_IOBASE + struct io_range *res; + resource_size_t offset = 0; + + list_for_each_entry(res, &io_range_list, list) { + if (address >= res->start && + address < res->start + res->size) { + return res->start - address + offset; + } + offset += res->size; + } + + return (unsigned long)-1; +#else if (address > IO_SPACE_LIMIT) return (unsigned long)-1; return (unsigned long) address; +#endif } static int __of_address_to_resource(struct device_node *dev, diff --git a/include/linux/of_address.h b/include/linux/of_address.h index c13b878..ac4aac4 100644 --- a/include/linux/of_address.h +++ b/include/linux/of_address.h @@ -55,6 +55,7 @@ extern void __iomem *of_iomap(struct device_node *device, int index); extern const __be32 *of_get_address(struct device_node *dev, int index, u64 *size, unsigned int *flags); +extern int pci_register_io_range(phys_addr_t addr, resource_size_t size); extern unsigned long pci_address_to_pio(phys_addr_t addr); extern int of_pci_range_parser_init(struct of_pci_range_parser *parser,
Some architectures do not have a simple view of the PCI I/O space and instead use a range of CPU addresses that map to bus addresses. For some architectures these ranges will be expressed by OF bindings in a device tree file. Introduce a pci_register_io_range() helper function with a generic implementation that can be used by such architectures to keep track of the I/O ranges described by the PCI bindings. If the PCI_IOBASE macro is not defined that signals lack of support for PCI and we return an error. Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> --- drivers/of/address.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++ include/linux/of_address.h | 1 + 2 files changed, 62 insertions(+)