Message ID | 20240115144655.32046-3-pstanner@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Make PCI's devres API more consistent | expand |
On Mon, Jan 15, 2024 at 03:46:12PM +0100, Philipp Stanner wrote: > PCI's devres API is not extensible to ranged mappings and has > bug-provoking features. Improve that by providing better alternatives. I guess "ranged mappings" means a mapping that doesn't cover an entire BAR? Maybe there's a way to clarify? > When the original devres API for PCI was implemented, priority was given > to the creation of a set of "pural functions" such as > pcim_request_regions(). These functions have bit masks as parameters to > specify which BARs shall get mapped. Most users, however, only use those > to mapp 1-3 BARs. > A complete set of "singular functions" does not exist. s/mapp/map/ Rewrap to fill 75 columns or add blank lines between paragraphs. Also below. > As functions mapping / requesting multiple BARs at once have (almost) no > mechanism in C to return the resources to the caller of the plural > function, the devres API utilizes the iomap-table administrated by the > function pcim_iomap_table(). > > The entire PCI devres implementation was strongly tied to that table > which only allows for mapping whole, complete BARs, as the BAR's index > is used as table index. Consequently, it's not possible to, e.g., have a > pcim_iomap_range() function with that mechanism. > > An additional problem is that pci-devres has been ipmlemented in a sort > of "hybrid-mode": Some unmanaged functions have managed counterparts > (e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature > obvious to the programmer. However, the region-request functions in > pci.c, prefixed with pci_, behave either managed or unmanaged, depending > on whether pci_enable_device() or pcim_enable_device() has been called > in advance. s/ipmlemented/implemented/ > This hybrid API is confusing and should be more cleanly separated by > providing always-managed functions prefixed with pcim_. > > Thus, the existing devres API is not desirable because: > a) The vast majority of the users of the plural functions only > ever sets a single bit in the bit mask, consequently making > them singular functions anyways. > b) There is no mechanism to request / iomap only part of a BAR. > c) The iomap-table mechanism is over-engineered, complicated and > can by definition not perform bounds checks, thus, provoking > memory faults: pcim_iomap_table(pdev)[42] Not sure what "pcim_iomap_table(pdev)[42]" means. > d) region-request functions being sometimes managed and > sometimes not is bug-provoking. Indent with spaces (not tabs) so it still looks good when "git log" adds spaces to indent. > + * Legacy struct storing addresses to whole mapped bars. s/bar/BAR/ (several places). > + /* A region spaning an entire bar. */ > + PCIM_ADDR_DEVRES_TYPE_REGION, > + > + /* A region spaning an entire bar, and a mapping for that whole bar. */ > + PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING, > + > + /* > + * A mapping within a bar, either spaning the whole bar or just a range. > + * Without a requested region. s/spaning/spanning/ (several places). > + if (start == 0 || len == 0) /* that's an unused BAR. */ s/that/That/ > + /* > + * Ranged mappings don't get added to the legacy-table, since the table > + * only ever keeps track of whole BARs. > + */ > + Spurious blank line. > + devres_add(&pdev->dev, res); > + return mapping; > +} > +EXPORT_SYMBOL(pcim_iomap_range); Bjorn
Mon, Jan 15, 2024 at 03:46:12PM +0100, Philipp Stanner kirjoitti: > PCI's devres API is not extensible to ranged mappings and has > bug-provoking features. Improve that by providing better alternatives. > > When the original devres API for PCI was implemented, priority was given > to the creation of a set of "pural functions" such as > pcim_request_regions(). These functions have bit masks as parameters to > specify which BARs shall get mapped. Most users, however, only use those > to mapp 1-3 BARs. > A complete set of "singular functions" does not exist. > > As functions mapping / requesting multiple BARs at once have (almost) no > mechanism in C to return the resources to the caller of the plural > function, the devres API utilizes the iomap-table administrated by the > function pcim_iomap_table(). > > The entire PCI devres implementation was strongly tied to that table > which only allows for mapping whole, complete BARs, as the BAR's index > is used as table index. Consequently, it's not possible to, e.g., have a > pcim_iomap_range() function with that mechanism. > > An additional problem is that pci-devres has been ipmlemented in a sort > of "hybrid-mode": Some unmanaged functions have managed counterparts > (e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature > obvious to the programmer. However, the region-request functions in > pci.c, prefixed with pci_, behave either managed or unmanaged, depending > on whether pci_enable_device() or pcim_enable_device() has been called > in advance. > > This hybrid API is confusing and should be more cleanly separated by > providing always-managed functions prefixed with pcim_. > > Thus, the existing devres API is not desirable because: > a) The vast majority of the users of the plural functions only > ever sets a single bit in the bit mask, consequently making > them singular functions anyways. > b) There is no mechanism to request / iomap only part of a BAR. > c) The iomap-table mechanism is over-engineered, complicated and > can by definition not perform bounds checks, thus, provoking > memory faults: pcim_iomap_table(pdev)[42] > d) region-request functions being sometimes managed and > sometimes not is bug-provoking. > > Implement a set of singular pcim_ functions that use devres directly > and bypass the legacy iomap table mechanism. > Add devres.c to driver-api documentation. ... > +struct pcim_addr_devres { > + enum pcim_addr_devres_type type; > + void __iomem *baseaddr; > + unsigned long offset; > + unsigned long len; > + short bar; > +}; > + > +static inline void pcim_addr_devres_clear(struct pcim_addr_devres *res) > +{ > + res->type = PCIM_ADDR_DEVRES_TYPE_INVALID; > + res->bar = -1; > + res->baseaddr = NULL; > + res->offset = 0; > + res->len = 0; More robust (in case the data type gets extended) is memset() + individual (non-0) sets. > +} ... > +static int __pcim_request_region_range(struct pci_dev *pdev, int bar, > + unsigned long offset, unsigned long maxlen, > + const char *name, int exclusive) > +{ > + resource_size_t start = pci_resource_start(pdev, bar); > + resource_size_t len = pci_resource_len(pdev, bar); > + unsigned long flags = pci_resource_flags(pdev, bar); > + > + if (start == 0 || len == 0) /* that's an unused BAR. */ > + return 0; > + if (len <= offset) > + return -EINVAL; > + > + start += offset; > + len -= offset; > + if (len > maxlen && maxlen != 0) > + len = maxlen; if (maxlen && ...) ? > + if (flags & IORESOURCE_IO) { > + if (!request_region(start, len, name)) > + return -EBUSY; > + } else if (flags & IORESOURCE_MEM) { > + if (!__request_mem_region(start, len, name, exclusive)) > + return -EBUSY; > + } else { > + /* That's not a device we can request anything on. */ > + return -ENODEV; > + } Hmm... Not sure, but the switch-case against type might be considered: switch (resource_type(...)) { ... } > + return 0; > +} > +static void __pcim_release_region_range(struct pci_dev *pdev, int bar, > + unsigned long offset, unsigned long maxlen) > +{ > + resource_size_t start = pci_resource_start(pdev, bar); > + resource_size_t len = pci_resource_len(pdev, bar); > + unsigned long flags = pci_resource_flags(pdev, bar); > + > + if (len <= offset || start == 0) > + return; > + > + if (len == 0 || maxlen == 0) /* This an unused BAR. Do nothing. */ > + return; > + > + start += offset; > + len -= offset; > + > + if (len > maxlen) > + len = maxlen; This part is quite a duplication of the above function, no? > + if (flags & IORESOURCE_IO) > + release_region(start, len); > + else if (flags & IORESOURCE_MEM) > + release_mem_region(start, len); > +} ... > +static int __pcim_request_region(struct pci_dev *pdev, int bar, > + const char *name, int exclusive) > +{ > + const unsigned long offset = 0; > + const unsigned long len = pci_resource_len(pdev, bar); How const anyhow useful here? Ditto for other places like this. > + return __pcim_request_region_range(pdev, bar, offset, len, name, exclusive); > +} ... > +static int pcim_addr_resources_match(struct device *dev, void *a_raw, void *b_raw) > +{ > + struct pcim_addr_devres *a, *b; > + > + a = a_raw; > + b = b_raw; > + (void)dev; /* unused. */ Why do we need this? > + if (a->type != b->type) > + return 0; > + > + switch (a->type) { > + case PCIM_ADDR_DEVRES_TYPE_REGION: > + case PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING: > + return a->bar == b->bar; > + case PCIM_ADDR_DEVRES_TYPE_MAPPING: > + return a->baseaddr == b->baseaddr; > + case PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING: > + return a->bar == b->bar && > + a->offset == b->offset && a->len == b->len; Indentation or made it a single line. > + default: > + break; > + } > + > + return 0; return directly from default case. > +} ... > +/** > + * pcim_iomap_region - Request and iomap a PCI BAR > + * @pdev: PCI device to map IO resources for > + * @bar: Index of a BAR to map > + * @name: Name associated with the request > + * > + * Returns __iomem pointer on success, an IOMEM_ERR_PTR on failure. Please, make sure the kernel-doc won't complain scripts/kernel-doc -v -none -Wall ... > + * Mapping and region will get automatically released on driver detach. If > + * desired, release manually only with pcim_iounmap_region(). > + */ > +void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar, const char *name) > +{ > + int ret = 0; Redundant assignment. > + struct pcim_addr_devres *res; Perhaps reversed xmas tree order? > + res = pcim_addr_devres_alloc(pdev); > + if (!res) > + return IOMEM_ERR_PTR(-ENOMEM); > + > + res->type = PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING; > + res->bar = bar; > + > + ret = __pcim_request_region(pdev, bar, name, 0); > + if (ret != 0) > + goto err_region; > + > + res->baseaddr = pci_iomap(pdev, bar, 0); > + if (!res->baseaddr) { > + ret = -EINVAL; > + goto err_iomap; > + } > + > + devres_add(&pdev->dev, res); > + return res->baseaddr; > + > +err_iomap: > + __pcim_release_region(pdev, bar); > +err_region: > + pcim_addr_devres_free(res); > + > + return IOMEM_ERR_PTR(ret); > +} ... > +static int _pcim_request_region(struct pci_dev *pdev, int bar, const char *name, > + int request_flags) Indentation? > +{ > + int ret = 0; Unneded assignment. Also fix this in other places. > + struct pcim_addr_devres *res; > + > + res = pcim_addr_devres_alloc(pdev); > + if (!res) > + return -ENOMEM; > + res->type = PCIM_ADDR_DEVRES_TYPE_REGION; > + res->bar = bar; > + > + ret = __pcim_request_region(pdev, bar, name, request_flags); > + if (ret != 0) { if (ret) Also fix this in other places. > + pcim_addr_devres_free(res); > + return ret; > + } > + > + devres_add(&pdev->dev, res); > + return 0; > +}
On Tue, 2024-01-16 at 12:44 -0600, Bjorn Helgaas wrote: > On Mon, Jan 15, 2024 at 03:46:12PM +0100, Philipp Stanner wrote: > > PCI's devres API is not extensible to ranged mappings and has > > bug-provoking features. Improve that by providing better > > alternatives. > > I guess "ranged mappings" means a mapping that doesn't cover an > entire > BAR? Maybe there's a way to clarify? That's what it's supposed to mean, yes. We could give it the longer title "mappings smaller than the whole BAR" or something, I guess. > > > When the original devres API for PCI was implemented, priority was > > given > > to the creation of a set of "pural functions" such as > > pcim_request_regions(). These functions have bit masks as > > parameters to > > specify which BARs shall get mapped. Most users, however, only use > > those > > to mapp 1-3 BARs. > > A complete set of "singular functions" does not exist. > > s/mapp/map/ > > Rewrap to fill 75 columns or add blank lines between paragraphs. > Also > below. > > > As functions mapping / requesting multiple BARs at once have > > (almost) no > > mechanism in C to return the resources to the caller of the plural > > function, the devres API utilizes the iomap-table administrated by > > the > > function pcim_iomap_table(). > > > > The entire PCI devres implementation was strongly tied to that > > table > > which only allows for mapping whole, complete BARs, as the BAR's > > index > > is used as table index. Consequently, it's not possible to, e.g., > > have a > > pcim_iomap_range() function with that mechanism. > > > > An additional problem is that pci-devres has been ipmlemented in a > > sort > > of "hybrid-mode": Some unmanaged functions have managed > > counterparts > > (e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature > > obvious to the programmer. However, the region-request functions in > > pci.c, prefixed with pci_, behave either managed or unmanaged, > > depending > > on whether pci_enable_device() or pcim_enable_device() has been > > called > > in advance. > > s/ipmlemented/implemented/ > > > This hybrid API is confusing and should be more cleanly separated > > by > > providing always-managed functions prefixed with pcim_. > > > > Thus, the existing devres API is not desirable because: > > a) The vast majority of the users of the plural functions > > only > > ever sets a single bit in the bit mask, consequently > > making > > them singular functions anyways. > > b) There is no mechanism to request / iomap only part of a > > BAR. > > c) The iomap-table mechanism is over-engineered, > > complicated and > > can by definition not perform bounds checks, thus, > > provoking > > memory faults: pcim_iomap_table(pdev)[42] > > Not sure what "pcim_iomap_table(pdev)[42]" means. That function currently is implemented with this prototype: void __iomem * const *pcim_iomap_table(struct pci_dev *pdev); And apparently, it's intended to index directly over the function. And that's how at least part of the users use it indeed. Here in drivers/crypto/inside-secure/safexcel.c, L.1919 for example: priv->base = pcim_iomap_table(pdev)[0]; I've never seen something that wonderful in C ever before, so it's not surprising that you weren't sure what I mean.... pcim_iomap_table() can not and does not perform any bounds check. If you do void __iomem *mappy_map_mapface = pcim_iomap_table(pdev)[42]; then it will just return random garbage, or it faults. No -EINVAL or anything. You won't even get NULL. That's why this function must die. > > > d) region-request functions being sometimes managed and > > sometimes not is bug-provoking. > > Indent with spaces (not tabs) so it still looks good when "git log" > adds spaces to indent. > > > + * Legacy struct storing addresses to whole mapped bars. > > s/bar/BAR/ (several places). > > > + /* A region spaning an entire bar. */ > > + PCIM_ADDR_DEVRES_TYPE_REGION, > > + > > + /* A region spaning an entire bar, and a mapping for that > > whole bar. */ > > + PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING, > > + > > + /* > > + * A mapping within a bar, either spaning the whole bar or > > just a range. > > + * Without a requested region. > > s/spaning/spanning/ (several places). > > > + if (start == 0 || len == 0) /* that's an unused BAR. */ > > s/that/That/ > > > + /* > > + * Ranged mappings don't get added to the legacy-table, > > since the table > > + * only ever keeps track of whole BARs. > > + */ > > + > > Spurious blank line. I'll take care of the grammar and spelling stuff in v2. Thanks, P. > > > + devres_add(&pdev->dev, res); > > + return mapping; > > +} > > +EXPORT_SYMBOL(pcim_iomap_range); > > Bjorn >
On Tue, 2024-01-16 at 23:15 +0200, andy.shevchenko@gmail.com wrote: > Mon, Jan 15, 2024 at 03:46:12PM +0100, Philipp Stanner kirjoitti: > > PCI's devres API is not extensible to ranged mappings and has > > bug-provoking features. Improve that by providing better > > alternatives. > > > > When the original devres API for PCI was implemented, priority was > > given > > to the creation of a set of "pural functions" such as > > pcim_request_regions(). These functions have bit masks as > > parameters to > > specify which BARs shall get mapped. Most users, however, only use > > those > > to mapp 1-3 BARs. > > A complete set of "singular functions" does not exist. > > > > As functions mapping / requesting multiple BARs at once have > > (almost) no > > mechanism in C to return the resources to the caller of the plural > > function, the devres API utilizes the iomap-table administrated by > > the > > function pcim_iomap_table(). > > > > The entire PCI devres implementation was strongly tied to that > > table > > which only allows for mapping whole, complete BARs, as the BAR's > > index > > is used as table index. Consequently, it's not possible to, e.g., > > have a > > pcim_iomap_range() function with that mechanism. > > > > An additional problem is that pci-devres has been ipmlemented in a > > sort > > of "hybrid-mode": Some unmanaged functions have managed > > counterparts > > (e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature > > obvious to the programmer. However, the region-request functions in > > pci.c, prefixed with pci_, behave either managed or unmanaged, > > depending > > on whether pci_enable_device() or pcim_enable_device() has been > > called > > in advance. > > > > This hybrid API is confusing and should be more cleanly separated > > by > > providing always-managed functions prefixed with pcim_. > > > > Thus, the existing devres API is not desirable because: > > a) The vast majority of the users of the plural functions > > only > > ever sets a single bit in the bit mask, consequently > > making > > them singular functions anyways. > > b) There is no mechanism to request / iomap only part of a > > BAR. > > c) The iomap-table mechanism is over-engineered, > > complicated and > > can by definition not perform bounds checks, thus, > > provoking > > memory faults: pcim_iomap_table(pdev)[42] > > d) region-request functions being sometimes managed and > > sometimes not is bug-provoking. > > > > Implement a set of singular pcim_ functions that use devres > > directly > > and bypass the legacy iomap table mechanism. > > Add devres.c to driver-api documentation. > > ... > > > +struct pcim_addr_devres { > > + enum pcim_addr_devres_type type; > > + void __iomem *baseaddr; > > + unsigned long offset; > > + unsigned long len; > > + short bar; > > +}; > > + > > +static inline void pcim_addr_devres_clear(struct pcim_addr_devres > > *res) > > +{ > > + res->type = PCIM_ADDR_DEVRES_TYPE_INVALID; > > + res->bar = -1; > > + res->baseaddr = NULL; > > + res->offset = 0; > > + res->len = 0; > > More robust (in case the data type gets extended) is memset() + > individual > (non-0) sets. ACK > > > +} > > ... > > > +static int __pcim_request_region_range(struct pci_dev *pdev, int > > bar, > > + unsigned long offset, unsigned long maxlen, > > + const char *name, int exclusive) > > +{ > > + resource_size_t start = pci_resource_start(pdev, bar); > > + resource_size_t len = pci_resource_len(pdev, bar); > > + unsigned long flags = pci_resource_flags(pdev, bar); > > + > > + if (start == 0 || len == 0) /* that's an unused BAR. */ > > + return 0; > > + if (len <= offset) > > + return -EINVAL; > > + > > + start += offset; > > + len -= offset; > > > + if (len > maxlen && maxlen != 0) > > + len = maxlen; > > if (maxlen && ...) > > ? I very much dislike this style, although I'm aware it's used in many (but not all) regions of the kernel. It makes your style inconsistent, because sometimes you do indeed check for something larger or smaller than 0. Plus, by checking for a number, everyone immediately sees that this is an integer, not a pointer, which improves readability at 0 cost. > > > + if (flags & IORESOURCE_IO) { > > + if (!request_region(start, len, name)) > > + return -EBUSY; > > + } else if (flags & IORESOURCE_MEM) { > > + if (!__request_mem_region(start, len, name, > > exclusive)) > > + return -EBUSY; > > + } else { > > + /* That's not a device we can request anything on. > > */ > > + return -ENODEV; > > + } > > Hmm... Not sure, but the switch-case against type might be > considered: > > switch (resource_type(...)) { > ... > } You mean resource_type() from ioport.h? How would that be useful here? Would you want to write a similar function? I'd say that switch (res->type) 's meaning is very obvious > > > + return 0; > > +} > > > +static void __pcim_release_region_range(struct pci_dev *pdev, int > > bar, > > + unsigned long offset, unsigned long maxlen) > > +{ > > + resource_size_t start = pci_resource_start(pdev, bar); > > + resource_size_t len = pci_resource_len(pdev, bar); > > + unsigned long flags = pci_resource_flags(pdev, bar); > > + > > + if (len <= offset || start == 0) > > + return; > > + > > + if (len == 0 || maxlen == 0) /* This an unused BAR. Do > > nothing. */ > > + return; > > + > > + start += offset; > > + len -= offset; > > + > > + if (len > maxlen) > > + len = maxlen; > > This part is quite a duplication of the above function, no? Yes. I once had a wrapper for that in mind, but such a wrapper also gets quite complicated quickly. Reason being that you don't just check, you also modify the parameters. You'd have sth like int __pcim_check_adjust_region_range_params(unsigned long *start, unsigned long *len); and then you'd have to return either -EINVAL or 0 and *check* for those again in the calling function. That's why I removed the wrapper again and just copied the code, because I thought that's cheaper, ultimately. > > > + if (flags & IORESOURCE_IO) > > + release_region(start, len); > > + else if (flags & IORESOURCE_MEM) > > + release_mem_region(start, len); > > +} > > ... > > > +static int __pcim_request_region(struct pci_dev *pdev, int bar, > > + const char *name, int exclusive) > > +{ > > + const unsigned long offset = 0; > > + const unsigned long len = pci_resource_len(pdev, bar); > > How const anyhow useful here? > Ditto for other places like this. Yeah, we can omit those > > > + return __pcim_request_region_range(pdev, bar, offset, len, > > name, exclusive); > > +} > > ... > > > +static int pcim_addr_resources_match(struct device *dev, void > > *a_raw, void *b_raw) > > +{ > > + struct pcim_addr_devres *a, *b; > > + > > + a = a_raw; > > + b = b_raw; > > > + (void)dev; /* unused. */ > > Why do we need this? Old instinct from another project where the compiler punched you for unused variables and function parameters. Can remove it. > > > + if (a->type != b->type) > > + return 0; > > + > > + switch (a->type) { > > + case PCIM_ADDR_DEVRES_TYPE_REGION: > > + case PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING: > > + return a->bar == b->bar; > > + case PCIM_ADDR_DEVRES_TYPE_MAPPING: > > + return a->baseaddr == b->baseaddr; > > + case PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING: > > + return a->bar == b->bar && > > + a->offset == b->offset && a->len == b->len; > > Indentation or made it a single line. How do you want such an indentation to be performed. Tabs mixed with spaces? > > > + default: > > + break; > > + } > > + > > + return 0; > > return directly from default case. > > > +} > > ... > > > +/** > > + * pcim_iomap_region - Request and iomap a PCI BAR > > + * @pdev: PCI device to map IO resources for > > + * @bar: Index of a BAR to map > > + * @name: Name associated with the request > > + * > > + * Returns __iomem pointer on success, an IOMEM_ERR_PTR on > > failure. > > Please, make sure the kernel-doc won't complain > > scripts/kernel-doc -v -none -Wall ... I'll have a look > > > + * Mapping and region will get automatically released on driver > > detach. If > > + * desired, release manually only with pcim_iounmap_region(). > > + */ > > +void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar, > > const char *name) > > +{ > > + int ret = 0; > > Redundant assignment. I guess we can remove it, but do you think it's not just useless, but actually bad? After all, people like the Rust folks frequently complain about the 'problem' in C of variables not being initialized. I'm neutral about this, we can keep or remove it. > > > + struct pcim_addr_devres *res; > > Perhaps reversed xmas tree order? What do you mean? The struct's name? The function's structure? > > > + res = pcim_addr_devres_alloc(pdev); > > + if (!res) > > + return IOMEM_ERR_PTR(-ENOMEM); > > + > > + res->type = PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING; > > + res->bar = bar; > > + > > + ret = __pcim_request_region(pdev, bar, name, 0); > > + if (ret != 0) > > + goto err_region; > > + > > + res->baseaddr = pci_iomap(pdev, bar, 0); > > + if (!res->baseaddr) { > > + ret = -EINVAL; > > + goto err_iomap; > > + } > > + > > + devres_add(&pdev->dev, res); > > + return res->baseaddr; > > + > > +err_iomap: > > + __pcim_release_region(pdev, bar); > > +err_region: > > + pcim_addr_devres_free(res); > > + > > + return IOMEM_ERR_PTR(ret); > > +} > > ... > > > +static int _pcim_request_region(struct pci_dev *pdev, int bar, > > const char *name, > > + int request_flags) > > Indentation? > > > +{ > > + int ret = 0; > > Unneded assignment. Also fix this in other places. > > > + struct pcim_addr_devres *res; > > + > > + res = pcim_addr_devres_alloc(pdev); > > + if (!res) > > + return -ENOMEM; > > + res->type = PCIM_ADDR_DEVRES_TYPE_REGION; > > + res->bar = bar; > > + > > + ret = __pcim_request_region(pdev, bar, name, > > request_flags); > > + if (ret != 0) { > > if (ret) > > Also fix this in other places. See above. Thx for the review, P. > > > + pcim_addr_devres_free(res); > > + return ret; > > + } > > + > > + devres_add(&pdev->dev, res); > > + return 0; > > +} >
On Wed, Jan 17, 2024 at 09:54:47AM +0100, Philipp Stanner wrote: > On Tue, 2024-01-16 at 12:44 -0600, Bjorn Helgaas wrote: > > On Mon, Jan 15, 2024 at 03:46:12PM +0100, Philipp Stanner wrote: > > > PCI's devres API is not extensible to ranged mappings and has > > > bug-provoking features. Improve that by providing better > > > alternatives. > > > > I guess "ranged mappings" means a mapping that doesn't cover an > > entire BAR? Maybe there's a way to clarify? > > That's what it's supposed to mean, yes. We could give it the longer > title "mappings smaller than the whole BAR" or something, I guess. "partial BAR mappings"? > > > to the creation of a set of "pural functions" such as s/pural/plural/ (I missed this before). > > > c) The iomap-table mechanism is over-engineered, > > > complicated and > > > can by definition not perform bounds checks, thus, > > > provoking > > > memory faults: pcim_iomap_table(pdev)[42] > > > > Not sure what "pcim_iomap_table(pdev)[42]" means. > > That function currently is implemented with this prototype: > void __iomem * const *pcim_iomap_table(struct pci_dev *pdev); > > And apparently, it's intended to index directly over the function. And > that's how at least part of the users use it indeed. > > Here in drivers/crypto/inside-secure/safexcel.c, L.1919 for example: > > priv->base = pcim_iomap_table(pdev)[0]; > > I've never seen something that wonderful in C ever before, so it's not > surprising that you weren't sure what I mean.... > > pcim_iomap_table() can not and does not perform any bounds check. If > you do > > void __iomem *mappy_map_mapface = pcim_iomap_table(pdev)[42]; > > then it will just return random garbage, or it faults. No -EINVAL or > anything. You won't even get NULL. > > That's why this function must die. No argument except that this example only makes sense after one looks up the prototype and connects the dots. Bjorn
diff --git a/Documentation/driver-api/pci/pci.rst b/Documentation/driver-api/pci/pci.rst index 4843cfad4f60..92b11775344e 100644 --- a/Documentation/driver-api/pci/pci.rst +++ b/Documentation/driver-api/pci/pci.rst @@ -4,6 +4,9 @@ PCI Support Library .. kernel-doc:: drivers/pci/pci.c :export: +.. kernel-doc:: drivers/pci/devres.c + :export: + .. kernel-doc:: drivers/pci/pci-driver.c :export: diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c index 4bd1e125bca1..cc8c1501eb13 100644 --- a/drivers/pci/devres.c +++ b/drivers/pci/devres.c @@ -8,10 +8,223 @@ */ #define PCIM_IOMAP_MAX PCI_STD_NUM_BARS +/* + * Legacy struct storing addresses to whole mapped bars. + */ struct pcim_iomap_devres { void __iomem *table[PCIM_IOMAP_MAX]; }; +enum pcim_addr_devres_type { + /* Default initializer. */ + PCIM_ADDR_DEVRES_TYPE_INVALID, + + /* A region spaning an entire bar. */ + PCIM_ADDR_DEVRES_TYPE_REGION, + + /* A region spaning an entire bar, and a mapping for that whole bar. */ + PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING, + + /* + * A mapping within a bar, either spaning the whole bar or just a range. + * Without a requested region. + */ + PCIM_ADDR_DEVRES_TYPE_MAPPING, + + /* A ranged region within a bar, with a mapping spaning that range. */ + PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING +}; + +/* + * This struct envelopes IO or MEM addresses, that means mappings and region + * requests, because those are very frequently requested and released together. + */ +struct pcim_addr_devres { + enum pcim_addr_devres_type type; + void __iomem *baseaddr; + unsigned long offset; + unsigned long len; + short bar; +}; + +static inline void pcim_addr_devres_clear(struct pcim_addr_devres *res) +{ + res->type = PCIM_ADDR_DEVRES_TYPE_INVALID; + res->bar = -1; + res->baseaddr = NULL; + res->offset = 0; + res->len = 0; +} + +/* + * The following functions, __pcim_*_region*, exist as counterparts to the + * versions from pci.c - which, unfortunately, can be in "hybrid mode", i.e., + * sometimes managed, sometimes not. + * + * To separate the APIs cleanly, we define our own, simplified versions here. + */ + +/** + * __pcim_request_region_range - Request a ranged region + * @pdev: PCI device the region belongs to + * @bar: The BAR the region is within + * @offset: offset from the BAR's start address + * @maxlen: length in bytes, beginning at @offset + * @name: name associated with the request + * @exclusive: whether the mapping shall be exclusively for kernelspace + * + * Returns: 0 on success, a negative error code on failure. + * + * Request a ranged region within a device's PCI BAR. This function performs + * sanity checks on the input. + */ +static int __pcim_request_region_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long maxlen, + const char *name, int exclusive) +{ + resource_size_t start = pci_resource_start(pdev, bar); + resource_size_t len = pci_resource_len(pdev, bar); + unsigned long flags = pci_resource_flags(pdev, bar); + + if (start == 0 || len == 0) /* that's an unused BAR. */ + return 0; + if (len <= offset) + return -EINVAL; + + start += offset; + len -= offset; + + if (len > maxlen && maxlen != 0) + len = maxlen; + + if (flags & IORESOURCE_IO) { + if (!request_region(start, len, name)) + return -EBUSY; + } else if (flags & IORESOURCE_MEM) { + if (!__request_mem_region(start, len, name, exclusive)) + return -EBUSY; + } else { + /* That's not a device we can request anything on. */ + return -ENODEV; + } + + return 0; +} + +static void __pcim_release_region_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long maxlen) +{ + resource_size_t start = pci_resource_start(pdev, bar); + resource_size_t len = pci_resource_len(pdev, bar); + unsigned long flags = pci_resource_flags(pdev, bar); + + if (len <= offset || start == 0) + return; + + if (len == 0 || maxlen == 0) /* This an unused BAR. Do nothing. */ + return; + + start += offset; + len -= offset; + + if (len > maxlen) + len = maxlen; + + if (flags & IORESOURCE_IO) + release_region(start, len); + else if (flags & IORESOURCE_MEM) + release_mem_region(start, len); +} + +static int __pcim_request_region(struct pci_dev *pdev, int bar, + const char *name, int exclusive) +{ + const unsigned long offset = 0; + const unsigned long len = pci_resource_len(pdev, bar); + + return __pcim_request_region_range(pdev, bar, offset, len, name, exclusive); +} + +static void __pcim_release_region(struct pci_dev *pdev, int bar) +{ + const unsigned long offset = 0; + const unsigned long len = pci_resource_len(pdev, bar); + + __pcim_release_region_range(pdev, bar, offset, len); +} + +static void pcim_addr_resource_release(struct device *dev, void *resource_raw) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct pcim_addr_devres *res = resource_raw; + + switch (res->type) { + case PCIM_ADDR_DEVRES_TYPE_REGION: + __pcim_release_region(pdev, res->bar); + break; + case PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING: + pci_iounmap(pdev, res->baseaddr); + __pcim_release_region(pdev, res->bar); + break; + case PCIM_ADDR_DEVRES_TYPE_MAPPING: + pci_iounmap(pdev, res->baseaddr); + break; + case PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING: + pci_iounmap(pdev, res->baseaddr); + __pcim_release_region_range(pdev, res->bar, res->offset, res->len); + break; + default: + break; + } +} + +static struct pcim_addr_devres *pcim_addr_devres_alloc(struct pci_dev *pdev) +{ + struct pcim_addr_devres *res; + + res = devres_alloc_node(pcim_addr_resource_release, sizeof(*res), + GFP_KERNEL, dev_to_node(&pdev->dev)); + if (res) + pcim_addr_devres_clear(res); + return res; +} + +/* Just for consistency and readability. */ +static inline void pcim_addr_devres_free(struct pcim_addr_devres *res) +{ + devres_free(res); +} + +/* + * Used to identify a resource in devres_*() functions. + */ +static int pcim_addr_resources_match(struct device *dev, void *a_raw, void *b_raw) +{ + struct pcim_addr_devres *a, *b; + + a = a_raw; + b = b_raw; + + (void)dev; /* unused. */ + + if (a->type != b->type) + return 0; + + switch (a->type) { + case PCIM_ADDR_DEVRES_TYPE_REGION: + case PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING: + return a->bar == b->bar; + case PCIM_ADDR_DEVRES_TYPE_MAPPING: + return a->baseaddr == b->baseaddr; + case PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING: + return a->bar == b->bar && + a->offset == b->offset && a->len == b->len; + default: + break; + } + + return 0; +} static void devm_pci_unmap_iospace(struct device *dev, void *ptr) { @@ -92,8 +305,8 @@ EXPORT_SYMBOL(devm_pci_remap_cfgspace); * * All operations are managed and will be undone on driver detach. * - * Returns a pointer to the remapped memory or an ERR_PTR() encoded error code - * on failure. Usage example:: + * Returns a pointer to the remapped memory or an IOMEM_ERR_PTR() encoded error + * code on failure. Usage example:: * * res = platform_get_resource(pdev, IORESOURCE_MEM, 0); * base = devm_pci_remap_cfg_resource(&pdev->dev, res); @@ -341,15 +554,80 @@ void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr) tbl[i] = NULL; return; } - WARN_ON(1); } EXPORT_SYMBOL(pcim_iounmap); +/** + * pcim_iomap_region - Request and iomap a PCI BAR + * @pdev: PCI device to map IO resources for + * @bar: Index of a BAR to map + * @name: Name associated with the request + * + * Returns __iomem pointer on success, an IOMEM_ERR_PTR on failure. + * + * Mapping and region will get automatically released on driver detach. If + * desired, release manually only with pcim_iounmap_region(). + */ +void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar, const char *name) +{ + int ret = 0; + struct pcim_addr_devres *res; + + res = pcim_addr_devres_alloc(pdev); + if (!res) + return IOMEM_ERR_PTR(-ENOMEM); + + res->type = PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING; + res->bar = bar; + + ret = __pcim_request_region(pdev, bar, name, 0); + if (ret != 0) + goto err_region; + + res->baseaddr = pci_iomap(pdev, bar, 0); + if (!res->baseaddr) { + ret = -EINVAL; + goto err_iomap; + } + + devres_add(&pdev->dev, res); + return res->baseaddr; + +err_iomap: + __pcim_release_region(pdev, bar); +err_region: + pcim_addr_devres_free(res); + + return IOMEM_ERR_PTR(ret); +} +EXPORT_SYMBOL(pcim_iomap_region); + +/** + * pcim_iounmap_region - Unmap and release a PCI BAR + * @pdev: PCI device to operate on + * @bar: Index of BAR to unmap and release + * + * Unmap a BAR and release its region manually. Only pass BARs that were + * previously mapped by pcim_iomap_region(). + */ +void pcim_iounmap_region(struct pci_dev *pdev, int bar) +{ + struct pcim_addr_devres res_searched; + + pcim_addr_devres_clear(&res_searched); + res_searched.type = PCIM_ADDR_DEVRES_TYPE_REGION_MAPPING; + res_searched.bar = bar; + + devres_release(&pdev->dev, pcim_addr_resource_release, + pcim_addr_resources_match, &res_searched); +} +EXPORT_SYMBOL(pcim_iounmap_region); + /** * pcim_iomap_regions - Request and iomap PCI BARs * @pdev: PCI device to map IO resources for * @mask: Mask of BARs to request and iomap - * @name: Name used when requesting regions + * @name: Name associated with the requests * * Request and iomap regions specified by @mask. */ @@ -402,7 +680,7 @@ EXPORT_SYMBOL(pcim_iomap_regions); * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones * @pdev: PCI device to map IO resources for * @mask: Mask of BARs to iomap - * @name: Name used when requesting regions + * @name: Name associated with the requests * * Request all PCI BARs and iomap regions specified by @mask. */ @@ -448,3 +726,196 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask) } } EXPORT_SYMBOL(pcim_iounmap_regions); + +static int _pcim_request_region(struct pci_dev *pdev, int bar, const char *name, + int request_flags) +{ + int ret = 0; + struct pcim_addr_devres *res; + + res = pcim_addr_devres_alloc(pdev); + if (!res) + return -ENOMEM; + res->type = PCIM_ADDR_DEVRES_TYPE_REGION; + res->bar = bar; + + ret = __pcim_request_region(pdev, bar, name, request_flags); + if (ret != 0) { + pcim_addr_devres_free(res); + return ret; + } + + devres_add(&pdev->dev, res); + return 0; +} + +/** + * pcim_request_region - Request a PCI BAR + * @pdev: PCI device to requestion region for + * @bar: Index of BAR to request + * @name: Name associated with the request + * + * Returns: 0 on success, a negative error code on failure. + * + * Request region specified by @bar. + * + * The region will automatically be released on driver detach. If desired, + * release manually only with pcim_release_region(). + */ +int pcim_request_region(struct pci_dev *pdev, int bar, const char *name) +{ + return _pcim_request_region(pdev, bar, name, 0); +} +EXPORT_SYMBOL(pcim_request_region); + +/** + * pcim_release_region - Release a PCI BAR + * @pdev: PCI device to operate on + * @bar: Index of BAR to release + * + * Release a region manually that was previously requested by + * pcim_request_region(). + */ +void pcim_release_region(struct pci_dev *pdev, int bar) +{ + struct pcim_addr_devres res_searched; + + pcim_addr_devres_clear(&res_searched); + res_searched.type = PCIM_ADDR_DEVRES_TYPE_REGION; + res_searched.bar = bar; + + devres_release(&pdev->dev, pcim_addr_resource_release, + pcim_addr_resources_match, &res_searched); +} +EXPORT_SYMBOL(pcim_release_region); + +/** + * pcim_iomap_range - Create a ranged __iomap mapping within a PCI BAR + * @pdev: PCI device to map IO resources for + * @bar: Index of the BAR + * @offset: Offset from the begin of the BAR + * @len: Length in bytes for the mapping + * + * Returns: __iomem pointer on success, an IOMEM_ERR_PTR on failure. + * + * Creates a new IO-Mapping within the specified @bar, ranging from @offset to + * @offset + @len. + * + * The mapping will automatically get unmapped on driver detach. If desired, + * release manually only with pcim_iounmap(). + */ +void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long len) +{ + void __iomem *mapping; + struct pcim_addr_devres *res; + + res = pcim_addr_devres_alloc(pdev); + if (!res) + return IOMEM_ERR_PTR(-ENOMEM); + + mapping = pci_iomap_range(pdev, bar, offset, len); + if (!mapping) { + pcim_addr_devres_free(res); + return IOMEM_ERR_PTR(-EINVAL); + } + + res->type = PCIM_ADDR_DEVRES_TYPE_MAPPING; + res->baseaddr = mapping; + + /* + * Ranged mappings don't get added to the legacy-table, since the table + * only ever keeps track of whole BARs. + */ + + devres_add(&pdev->dev, res); + return mapping; +} +EXPORT_SYMBOL(pcim_iomap_range); + +/** + * pcim_iomap_region_range - Request and map a range within a PCI BAR + * @pdev: PCI device to map IO resources for + * @bar: Index of BAR to request within + * @offset: Offset from the begin of the BAR + * @len: Length in bytes for the mapping + * @name: Name associated with the request + * + * Returns: __iomem pointer on success, an IOMEM_ERR_PTR on failure. + * + * Request region with a range specified by @offset and @len within @bar and + * iomap it. + * + * The region will automatically be released and the mapping be unmapped on + * driver detach. If desired, release manually only with + * pcim_iounmap_region_range(). + * + * You probably should only use this function if you explicitly do not want to + * request the entire BAR. For most use-cases, combining pcim_request_region() + * and pcim_iomap_range() should be sufficient. + */ +void __iomem *pcim_iomap_region_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long len, const char *name) +{ + int ret = 0; + struct pcim_addr_devres *res; + + res = pcim_addr_devres_alloc(pdev); + if (!res) + return IOMEM_ERR_PTR(-ENOMEM); + + res->type = PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING; + res->bar = bar; + res->offset = offset; + res->len = len; + + ret = __pcim_request_region_range(pdev, bar, offset, len, name, 0); + if (ret != 0) + goto err_region; + + res->baseaddr = pci_iomap_range(pdev, bar, offset, len); + if (!res->baseaddr) { + ret = -EINVAL; + goto err_iomap; + } + + devres_add(&pdev->dev, res); + return res->baseaddr; + +err_iomap: + __pcim_release_region_range(pdev, bar, offset, len); +err_region: + pcim_addr_devres_free(res); + + return IOMEM_ERR_PTR(ret); +} +EXPORT_SYMBOL(pcim_iomap_region_range); + +/** + * pcim_iounmap_region_range - Unmap and release a range within a PCI BAR + * @pdev: PCI device to operate on + * @bar: Index of BAR containing the range + * @offset: Offset from the begin of the BAR + * @len: Length in bytes for the mapping + * + * Unmaps and releases a memory area within the specified PCI BAR. + * + * This function may not be used to free only part of a range. Only use this + * function with the exact parameters you previously used successfully in + * pcim_iomap_region_range(). + */ +void pcim_iounmap_region_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long len) +{ + struct pcim_addr_devres res_searched; + pcim_addr_devres_clear(&res_searched); + + res_searched.type = PCIM_ADDR_DEVRES_TYPE_REGION_RANGE_MAPPING; + res_searched.bar = bar; + res_searched.offset = offset; + res_searched.len = len; + + devres_release(&pdev->dev, pcim_addr_resource_release, + pcim_addr_resources_match, &res_searched); +} +EXPORT_SYMBOL(pcim_iounmap_region_range); diff --git a/include/linux/pci.h b/include/linux/pci.h index 58a4c976c39b..1b45a4888703 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2314,10 +2314,21 @@ static inline void pci_fixup_device(enum pci_fixup_pass pass, void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen); void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr); void __iomem * const *pcim_iomap_table(struct pci_dev *pdev); +void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar, const char *name); +void pcim_iounmap_region(struct pci_dev *pdev, int bar); int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name); int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask, const char *name); void pcim_iounmap_regions(struct pci_dev *pdev, int mask); +int pcim_request_region(struct pci_dev *pdev, int bar, const char *res_name); +void pcim_release_region(struct pci_dev *pdev, int bar); +void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long len); +void __iomem *pcim_iomap_region_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long len, + const char *res_name); +void pcim_iounmap_region_range(struct pci_dev *pdev, int bar, + unsigned long offset, unsigned long len); extern int pci_pci_problems; #define PCIPCI_FAIL 1 /* No PCI PCI DMA */
PCI's devres API is not extensible to ranged mappings and has bug-provoking features. Improve that by providing better alternatives. When the original devres API for PCI was implemented, priority was given to the creation of a set of "pural functions" such as pcim_request_regions(). These functions have bit masks as parameters to specify which BARs shall get mapped. Most users, however, only use those to mapp 1-3 BARs. A complete set of "singular functions" does not exist. As functions mapping / requesting multiple BARs at once have (almost) no mechanism in C to return the resources to the caller of the plural function, the devres API utilizes the iomap-table administrated by the function pcim_iomap_table(). The entire PCI devres implementation was strongly tied to that table which only allows for mapping whole, complete BARs, as the BAR's index is used as table index. Consequently, it's not possible to, e.g., have a pcim_iomap_range() function with that mechanism. An additional problem is that pci-devres has been ipmlemented in a sort of "hybrid-mode": Some unmanaged functions have managed counterparts (e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature obvious to the programmer. However, the region-request functions in pci.c, prefixed with pci_, behave either managed or unmanaged, depending on whether pci_enable_device() or pcim_enable_device() has been called in advance. This hybrid API is confusing and should be more cleanly separated by providing always-managed functions prefixed with pcim_. Thus, the existing devres API is not desirable because: a) The vast majority of the users of the plural functions only ever sets a single bit in the bit mask, consequently making them singular functions anyways. b) There is no mechanism to request / iomap only part of a BAR. c) The iomap-table mechanism is over-engineered, complicated and can by definition not perform bounds checks, thus, provoking memory faults: pcim_iomap_table(pdev)[42] d) region-request functions being sometimes managed and sometimes not is bug-provoking. Implement a set of singular pcim_ functions that use devres directly and bypass the legacy iomap table mechanism. Add devres.c to driver-api documentation. Signed-off-by: Philipp Stanner <pstanner@redhat.com> --- Documentation/driver-api/pci/pci.rst | 3 + drivers/pci/devres.c | 481 ++++++++++++++++++++++++++- include/linux/pci.h | 11 + 3 files changed, 490 insertions(+), 5 deletions(-)