[RFC,v2,1/3] resource: Request IO port regions from children of ioport_resource

Message ID 1553105650-28012-2-git-send-email-john.garry@huawei.com (mailing list archive)
State Superseded, archived
Series Fix system crash for accessing unmapped IO port regions

Commit Message

John Garry March 20, 2019, 6:14 p.m. UTC
Currently when we request an IO port region, the request is made directly
to the top resource, ioport_resource.

There is an issue here, in that drivers may successfully request an IO
port region even if that region has not been mapped in (by
pci_remap_iospace()).

This may lead to crashes when the system has no PCI host, or has a host
that failed enumeration, while drivers still attempt to access PCI IO
ports, as below:

root@(none)$root@(none)$ insmod f71882fg.ko
[  152.215377] Unable to handle kernel paging request at virtual address ffff7dfffee0002e
[  152.231299] Mem abort info:
[  152.236898]   ESR = 0x96000046
[  152.243019]   Exception class = DABT (current EL), IL = 32 bits
[  152.254905]   SET = 0, FnV = 0
[  152.261024]   EA = 0, S1PTW = 0
[  152.267320] Data abort info:
[  152.273091]   ISV = 0, ISS = 0x00000046
[  152.280784]   CM = 0, WnR = 1
[  152.286730] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[  152.300537] [ffff7dfffee0002e] pgd=000000000141c003, pud=000000000141d003, pmd=0000000000000000
[  152.318016] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[  152.329199] Modules linked in: f71882fg(+)
[  152.337415] CPU: 8 PID: 2732 Comm: insmod Not tainted 5.1.0-rc1-00002-gab1a0e9200b8-dirty #102
[  152.354712] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[  152.373058] pstate: 80000005 (Nzcv daif -PAN -UAO)
[  152.382675] pc : logic_outb+0x54/0xb8
[  152.390017] lr : f71882fg_find+0x64/0x390 [f71882fg]
[  152.399977] sp : ffff000013393aa0
[  152.406618] x29: ffff000013393aa0 x28: ffff000008b98b10
[  152.417278] x27: ffff000013393df0 x26: 0000000000000100
[  152.427938] x25: ffff801f8c872d30 x24: ffff000011420000
[  152.438598] x23: ffff801fb49d2940 x22: ffff000011291000
[  152.449257] x21: 000000000000002e x20: 0000000000000087
[  152.459917] x19: ffff000013393b44 x18: ffffffffffffffff
[  152.470577] x17: 0000000000000000 x16: 0000000000000000
[  152.481236] x15: ffff00001127d6c8 x14: ffff801f8cfd691c
[  152.491896] x13: 0000000000000000 x12: 0000000000000000
[  152.502555] x11: 0000000000000003 x10: 0000801feace2000
[  152.513215] x9 : 0000000000000000 x8 : ffff841fa654f280
[  152.523874] x7 : 0000000000000000 x6 : 0000000000ffc0e3
[  152.534534] x5 : ffff000011291360 x4 : ffff801fb4949f00
[  152.545194] x3 : 0000000000ffbffe x2 : 76e767a63713d500
[  152.555853] x1 : ffff7dfffee0002e x0 : ffff7dfffee00000
[  152.566514] Process insmod (pid: 2732, stack limit = 0x(____ptrval____))
[  152.579968] Call trace:
[  152.584863]  logic_outb+0x54/0xb8
[  152.591506]  f71882fg_find+0x64/0x390 [f71882fg]
[  152.600768]  f71882fg_init+0x38/0xc70 [f71882fg]
[  152.610031]  do_one_initcall+0x5c/0x198
[  152.617723]  do_init_module+0x54/0x1b0
[  152.625237]  load_module+0x1dc4/0x2158
[  152.632752]  __se_sys_init_module+0x14c/0x1e8
[  152.641490]  __arm64_sys_init_module+0x18/0x20
[  152.650404]  el0_svc_common+0x5c/0x100
[  152.657919]  el0_svc_handler+0x2c/0x80
[  152.665433]  el0_svc+0x8/0xc
[  152.671202] Code: d2bfdc00 f2cfbfe0 f2ffffe0 8b000021 (39000034)
[  152.683434] ---[ end trace fd4f35b610829a48 ]---
Segmentation fault
root@(none)$
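
For readers decoding the trace, here is a minimal sketch that picks apart the ESR value, assuming the standard AArch64 ESR_ELx field layout for data aborts. The struct and function names are illustrative only and are not kernel APIs. It also assumes that x0 holds the base of the IO window and x21 the port number, as the register dump suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Selected ESR_ELx fields for a data abort (illustrative names). */
struct esr_dabt {
	unsigned int ec;	/* exception class, bits [31:26] */
	unsigned int il;	/* instruction length, bit 25 */
	unsigned int wnr;	/* write-not-read, ISS bit 6 */
	unsigned int dfsc;	/* data fault status code, ISS bits [5:0] */
};

/* Split a raw ESR value into the fields above. */
struct esr_dabt decode_esr(uint32_t esr)
{
	struct esr_dabt d;

	d.ec   = (esr >> 26) & 0x3f;
	d.il   = (esr >> 25) & 0x1;
	d.wnr  = (esr >> 6) & 0x1;
	d.dfsc = esr & 0x3f;
	return d;
}

/* Address targeted by the access: window base plus port offset. */
uint64_t fault_address(uint64_t io_base, uint64_t port)
{
	assert(io_base <= UINT64_MAX - port);	/* no wrap-around */
	return io_base + port;
}
```

Fed the value from the trace, 0x96000046, this gives EC 0x25 (data abort taken without a change in exception level), WnR 1 (a write) and DFSC 0x06 (level 2 translation fault), which is consistent with the empty pmd entry in the dump. Adding port 0x2e to the base in x0 gives the faulting address ffff7dfffee0002e.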

Note that the f71882fg driver correctly calls request_muxed_region().

This issue was originally reported in [1].

This patch changes request_{muxed_}region() to request a region from a
direct child of the top-level ioport_resource.

With this change, if the IO port region has not been mapped, the PCI IO
resource will not have been inserted either, so no suitable child region
exists and request_{muxed_}region() calls will fail.

A side note: there are many drivers in the kernel which fail to even call
request_{muxed_}region() prior to IO port accesses, and they also need to
be fixed (to call request_{muxed_}region(), as appropriate) separately.

[1] https://www.spinics.net/lists/linux-pci/msg49821.html

Signed-off-by: John Garry <john.garry@huawei.com>
---
 include/linux/ioport.h | 12 +++++++++---
 kernel/resource.c      | 28 ++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 3 deletions(-)

Comments

Bjorn Helgaas March 25, 2019, 11:32 p.m. UTC | #1
Hi John,

On Thu, Mar 21, 2019 at 02:14:08AM +0800, John Garry wrote:
> Currently when we request an IO port region, the request is made directly
> to the top resource, ioport_resource.

Let's be explicit here, e.g.,

  Currently request_region() requests an IO port region directly from the
  top resource, ioport_resource.

> There is an issue here, in that drivers may successfully request an IO
> port region even if the IO port region has not even been mapped in
> (in pci_remap_iospace()).
> 
> This may lead to crashes when the system has no PCI host, or, has a host
> but it has failed enumeration, while drivers still attempt to access PCI
> IO ports, as below:

I don't understand the strategy here.  f71882fg is not a driver for a
PCI device, so it should work even if there is no PCI host in the
system.

On x86, I think inb/inw/inl from a port where nothing responds
probably just returns ~0, and outb/outw/outl just get dropped.
Shouldn't arm64 do the same, without crashing?

> root@(none)$root@(none)$ insmod f71882fg.ko
> [  152.215377] Unable to handle kernel paging request at virtual address ffff7dfffee0002e
> [  152.231299] Mem abort info:
> [  152.236898]   ESR = 0x96000046
> [  152.243019]   Exception class = DABT (current EL), IL = 32 bits
> [  152.254905]   SET = 0, FnV = 0
> [  152.261024]   EA = 0, S1PTW = 0
> [  152.267320] Data abort info:
> [  152.273091]   ISV = 0, ISS = 0x00000046
> [  152.280784]   CM = 0, WnR = 1
> [  152.286730] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> [  152.300537] [ffff7dfffee0002e] pgd=000000000141c003, pud=000000000141d003, pmd=0000000000000000
> [  152.318016] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> [  152.329199] Modules linked in: f71882fg(+)
> [  152.337415] CPU: 8 PID: 2732 Comm: insmod Not tainted 5.1.0-rc1-00002-gab1a0e9200b8-dirty #102
> [  152.354712] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
> [  152.373058] pstate: 80000005 (Nzcv daif -PAN -UAO)
> [  152.382675] pc : logic_outb+0x54/0xb8
> [  152.390017] lr : f71882fg_find+0x64/0x390 [f71882fg]
> [  152.399977] sp : ffff000013393aa0
> [  152.406618] x29: ffff000013393aa0 x28: ffff000008b98b10
> [  152.417278] x27: ffff000013393df0 x26: 0000000000000100
> [  152.427938] x25: ffff801f8c872d30 x24: ffff000011420000
> [  152.438598] x23: ffff801fb49d2940 x22: ffff000011291000
> [  152.449257] x21: 000000000000002e x20: 0000000000000087
> [  152.459917] x19: ffff000013393b44 x18: ffffffffffffffff
> [  152.470577] x17: 0000000000000000 x16: 0000000000000000
> [  152.481236] x15: ffff00001127d6c8 x14: ffff801f8cfd691c
> [  152.491896] x13: 0000000000000000 x12: 0000000000000000
> [  152.502555] x11: 0000000000000003 x10: 0000801feace2000
> [  152.513215] x9 : 0000000000000000 x8 : ffff841fa654f280
> [  152.523874] x7 : 0000000000000000 x6 : 0000000000ffc0e3
> [  152.534534] x5 : ffff000011291360 x4 : ffff801fb4949f00
> [  152.545194] x3 : 0000000000ffbffe x2 : 76e767a63713d500
> [  152.555853] x1 : ffff7dfffee0002e x0 : ffff7dfffee00000
> [  152.566514] Process insmod (pid: 2732, stack limit = 0x(____ptrval____))
> [  152.579968] Call trace:
> [  152.584863]  logic_outb+0x54/0xb8
> [  152.591506]  f71882fg_find+0x64/0x390 [f71882fg]
> [  152.600768]  f71882fg_init+0x38/0xc70 [f71882fg]
> [  152.610031]  do_one_initcall+0x5c/0x198
> [  152.617723]  do_init_module+0x54/0x1b0
> [  152.625237]  load_module+0x1dc4/0x2158
> [  152.632752]  __se_sys_init_module+0x14c/0x1e8
> [  152.641490]  __arm64_sys_init_module+0x18/0x20
> [  152.650404]  el0_svc_common+0x5c/0x100
> [  152.657919]  el0_svc_handler+0x2c/0x80
> [  152.665433]  el0_svc+0x8/0xc
> [  152.671202] Code: d2bfdc00 f2cfbfe0 f2ffffe0 8b000021 (39000034)
> [  152.683434] ---[ end trace fd4f35b610829a48 ]---
> Segmentation fault
> root@(none)$

Please remove the timestamps (because they don't contribute useful
information) and indent the example a couple spaces (which is
conventional for quoted material).

> Note that the f71882fg driver correctly calls request_muxed_region().
> 
> This issue was originally reported in [1].
> 
> This patch changes the functionality of request{muxed_}_region() to
> request a region from a direct child descendent of the top
> ioport_resource.
> 
> In this, if the IO port region has not been mapped for a particular IO
> region, the PCI IO resource would also not have been inserted, and so a
> suitable child region will not exist. As such,
> request_{muxed_}region() calls will fail.
> 
> A side note: there are many drivers in the kernel which fail to even call
> request_{muxed_}region() prior to IO port accesses, and they also need to
> be fixed (to call request_{muxed_}region(), as appropriate) separately.
> 
> [1] https://www.spinics.net/lists/linux-pci/msg49821.html

Please use a https://lore.kernel.org/ URL instead of spinics.net.

> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  include/linux/ioport.h | 12 +++++++++---
>  kernel/resource.c      | 28 ++++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..d7b7e1e08291 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -217,19 +217,25 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
>  
>  
>  /* Convenience shorthand with allocation */
> -#define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
> -#define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
> +#define request_region(start,n,name)		__request_region_from_children(&ioport_resource, (start), (n), (name), 0)
> +#define request_muxed_region(start,n,name)	__request_region_from_children(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
>  #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl)
>  #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0)
>  #define request_mem_region_exclusive(start,n,name) \
>  	__request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE)
>  #define rename_region(region, newname) do { (region)->name = (newname); } while (0)
>  
> -extern struct resource * __request_region(struct resource *,
> +extern struct resource *__request_region(struct resource *,
>  					resource_size_t start,
>  					resource_size_t n,
>  					const char *name, int flags);
>  
> +extern struct resource *__request_region_from_children(struct resource *,
> +					resource_size_t start,
> +					resource_size_t n,
> +					const char *name, int flags);
> +
> +
>  /* Compatibility cruft */
>  #define release_region(start,n)	__release_region(&ioport_resource, (start), (n))
>  #define release_mem_region(start,n)	__release_region(&iomem_resource, (start), (n))
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 92190f62ebc5..87ed200eda8b 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1097,6 +1097,34 @@ resource_size_t resource_alignment(struct resource *res)
>  
>  static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
>  
> +/**
> + * __request_region_from_children - create a new busy region from a child
> + * @parent: parent resource descriptor
> + * @start: resource start address
> + * @n: resource region size
> + * @name: reserving caller's ID string
> + * @flags: IO resource flags
> + */
> +struct resource *__request_region_from_children(struct resource *parent,
> +						resource_size_t start,
> +						resource_size_t n,
> +						const char *name, int flags)
> +{
> +	struct resource *res = __request_region(parent, start, n, name, flags);
> +
> +	if (res && res->parent == parent) {
> +		/*
> +		 * This is a direct descendent of the parent, which is
> +		 * what we didn't want.
> +		 */
> +		__release_region(parent, start, n);
> +		res = NULL;
> +	}
> +
> +	return res;
> +}
> +EXPORT_SYMBOL(__request_region_from_children);
> +
>  /**
>   * __request_region - create a new busy resource region
>   * @parent: parent resource descriptor
> -- 
> 2.17.1
>
John Garry March 26, 2019, 4:33 p.m. UTC | #2
On 25/03/2019 23:32, Bjorn Helgaas wrote:
> Hi John,
>

Hi Bjorn,

Thanks for reviewing this.

> On Thu, Mar 21, 2019 at 02:14:08AM +0800, John Garry wrote:
>> Currently when we request an IO port region, the request is made directly
>> to the top resource, ioport_resource.
>
> Let's be explicit here, e.g.,
>
>   Currently request_region() requests an IO port region directly from the
>   top resource, ioport_resource.

ok

>
>> There is an issue here, in that drivers may successfully request an IO
>> port region even if the IO port region has not even been mapped in
>> (in pci_remap_iospace()).
>>
>> This may lead to crashes when the system has no PCI host, or, has a host
>> but it has failed enumeration, while drivers still attempt to access PCI
>> IO ports, as below:
>
> I don't understand the strategy here.  f71882fg is not a driver for a
> PCI device, so it should work even if there is no PCI host in the
> system.

From my checking, the f71882fg hwmon is accessed via the super-io
interface on the PCH on x86. The super-io interface is at fixed
addresses, those being 0x2e and 0x4e.

Please see the following:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/f71805f.c?h=v5.1-rc2#n1621

and

https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf 
(Table 9.2).

On x86 systems, these PCH IO ports will be mapped on a PCI bus, like:

$ more /proc/ioports
0000-0cf7 : PCI Bus 0000:00
   0000-001f : dma1
   0020-0021 : pic1
   0040-0043 : timer0
   0050-0053 : timer1
   0060-0060 : keyboard
   0064-0064 : keyboard
   0070-0077 : rtc0
   0080-008f : dma page reg
   00a0-00a1 : pic2
   00c0-00df : dma2
   00f0-00ff : fpu

So, the idea in the patch is that if PCI Bus 0000:00 does not exist 
because of no PCI host, then we should fail a request to an IO port region.
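
To illustrate the idea outside the kernel, here is a minimal user-space sketch. It assumes a much simplified resource tree; struct res and both helpers are hypothetical stand-ins, not the kernel's struct resource API, and the lookup differs from the patch, which calls __request_region() and then checks res->parent. A request is modelled as succeeding only when some direct child of the root covers the requested range.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct resource (hypothetical). */
struct res {
	unsigned long start, end;	/* inclusive port range */
	const char *name;
	struct res *child;		/* first direct child */
	struct res *sibling;		/* next child of the same parent */
};

/* Link @node in as a direct child of @root (no overlap checks here). */
void res_add_child(struct res *root, struct res *node)
{
	node->sibling = root->child;
	root->child = node;
}

/*
 * Return the direct child of @root that fully contains
 * [start, start + n - 1], or NULL if there is none. A NULL result
 * models request_region() failing because nothing was mapped there.
 */
struct res *res_find_covering_child(struct res *root,
				    unsigned long start, unsigned long n)
{
	unsigned long end;
	struct res *c;

	assert(n > 0);
	end = start + n - 1;
	for (c = root->child; c; c = c->sibling)
		if (c->start <= start && end <= c->end)
			return c;
	return NULL;
}
```

With an empty root, a lookup for port 0x2e returns NULL. Once a child spanning 0000-0cf7 (like "PCI Bus 0000:00" in the listing) is added, the same lookup returns that child.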

>
> On x86, I think inb/inw/inl from a port where nothing responds
> probably just returns ~0, and outb/outw/outl just get dropped.
> Shouldn't arm64 do the same, without crashing?

That would be ideal and we're doing something similar in patch 2/3.

So on ARM64 we have to IO remap the PCI IO resource. If this mapping is 
not done (due to no PCI host), then any inb/inw/inl calls will crash the 
system.

So in patch 2/3, I am also making the change to the logical PIO 
inb/inw/inl accessors to discard accesses when no PCI MMIO regions are 
registered in logical PIO space.

This is really a second line of defense (this patch being the first).
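
As a rough user-space sketch of that second line of defense (not the code in patch 2/3): the accessors look the port up in a table of mapped ranges, and when nothing is registered a read returns all ones and a write is dropped. struct pio_range, sketch_inb() and sketch_outb() are hypothetical names, and the byte array stands in for a real mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one port range that has a backing mapping. */
struct pio_range {
	unsigned long start, end;	/* inclusive port range */
	uint8_t *base;			/* simulated mapping of the range */
};

/* Find the registered range containing @port, or NULL if unmapped. */
static const struct pio_range *find_range(const struct pio_range *tbl,
					   size_t cnt, unsigned long port)
{
	size_t i;

	assert(tbl || cnt == 0);
	for (i = 0; i < cnt; i++)
		if (tbl[i].start <= port && port <= tbl[i].end)
			return &tbl[i];
	return NULL;
}

/* Read one byte; an unmapped port reads as all ones instead of faulting. */
uint8_t sketch_inb(const struct pio_range *tbl, size_t cnt, unsigned long port)
{
	const struct pio_range *r = find_range(tbl, cnt, port);

	if (!r)
		return 0xff;
	return r->base[port - r->start];
}

/* Write one byte; a write to an unmapped port is silently dropped. */
void sketch_outb(const struct pio_range *tbl, size_t cnt,
		 uint8_t value, unsigned long port)
{
	const struct pio_range *r = find_range(tbl, cnt, port);

	if (r)
		r->base[port - r->start] = value;
}
```

With an empty table, the outb of 0x87 to port 0x2e seen in the trace would be discarded rather than dereferencing an unmapped address.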

>
>> root@(none)$root@(none)$ insmod f71882fg.ko
>> [  152.215377] Unable to handle kernel paging request at virtual address ffff7dfffee0002e
>> [  152.231299] Mem abort info:
>> [  152.236898]   ESR = 0x96000046
>> [  152.243019]   Exception class = DABT (current EL), IL = 32 bits
>> [  152.254905]   SET = 0, FnV = 0
>> [  152.261024]   EA = 0, S1PTW = 0
>> [  152.267320] Data abort info:
>> [  152.273091]   ISV = 0, ISS = 0x00000046
>> [  152.280784]   CM = 0, WnR = 1
>> [  152.286730] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>> [  152.300537] [ffff7dfffee0002e] pgd=000000000141c003, pud=000000000141d003, pmd=0000000000000000
>> [  152.318016] Internal error: Oops: 96000046 [#1] PREEMPT SMP
>> [  152.329199] Modules linked in: f71882fg(+)
>> [  152.337415] CPU: 8 PID: 2732 Comm: insmod Not tainted 5.1.0-rc1-00002-gab1a0e9200b8-dirty #102
>> [  152.354712] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [  152.373058] pstate: 80000005 (Nzcv daif -PAN -UAO)
>> [  152.382675] pc : logic_outb+0x54/0xb8
>> [  152.390017] lr : f71882fg_find+0x64/0x390 [f71882fg]
>> [  152.399977] sp : ffff000013393aa0
>> [  152.406618] x29: ffff000013393aa0 x28: ffff000008b98b10
>> [  152.417278] x27: ffff000013393df0 x26: 0000000000000100
>> [  152.427938] x25: ffff801f8c872d30 x24: ffff000011420000
>> [  152.438598] x23: ffff801fb49d2940 x22: ffff000011291000
>> [  152.449257] x21: 000000000000002e x20: 0000000000000087
>> [  152.459917] x19: ffff000013393b44 x18: ffffffffffffffff
>> [  152.470577] x17: 0000000000000000 x16: 0000000000000000
>> [  152.481236] x15: ffff00001127d6c8 x14: ffff801f8cfd691c
>> [  152.491896] x13: 0000000000000000 x12: 0000000000000000
>> [  152.502555] x11: 0000000000000003 x10: 0000801feace2000
>> [  152.513215] x9 : 0000000000000000 x8 : ffff841fa654f280
>> [  152.523874] x7 : 0000000000000000 x6 : 0000000000ffc0e3
>> [  152.534534] x5 : ffff000011291360 x4 : ffff801fb4949f00
>> [  152.545194] x3 : 0000000000ffbffe x2 : 76e767a63713d500
>> [  152.555853] x1 : ffff7dfffee0002e x0 : ffff7dfffee00000
>> [  152.566514] Process insmod (pid: 2732, stack limit = 0x(____ptrval____))
>> [  152.579968] Call trace:
>> [  152.584863]  logic_outb+0x54/0xb8
>> [  152.591506]  f71882fg_find+0x64/0x390 [f71882fg]
>> [  152.600768]  f71882fg_init+0x38/0xc70 [f71882fg]
>> [  152.610031]  do_one_initcall+0x5c/0x198
>> [  152.617723]  do_init_module+0x54/0x1b0
>> [  152.625237]  load_module+0x1dc4/0x2158
>> [  152.632752]  __se_sys_init_module+0x14c/0x1e8
>> [  152.641490]  __arm64_sys_init_module+0x18/0x20
>> [  152.650404]  el0_svc_common+0x5c/0x100
>> [  152.657919]  el0_svc_handler+0x2c/0x80
>> [  152.665433]  el0_svc+0x8/0xc
>> [  152.671202] Code: d2bfdc00 f2cfbfe0 f2ffffe0 8b000021 (39000034)
>> [  152.683434] ---[ end trace fd4f35b610829a48 ]---
>> Segmentation fault
>> root@(none)$
>
> Please remove the timestamps (because they don't contribute useful
> information) and indent the example a couple spaces (which is
> conventional for quoted material).

ok

>
>> Note that the f71882fg driver correctly calls request_muxed_region().
>>
>> This issue was originally reported in [1].
>>
>> This patch changes the functionality of request{muxed_}_region() to
>> request a region from a direct child descendent of the top
>> ioport_resource.
>>
>> In this, if the IO port region has not been mapped for a particular IO
>> region, the PCI IO resource would also not have been inserted, and so a
>> suitable child region will not exist. As such,
>> request_{muxed_}region() calls will fail.
>>
>> A side note: there are many drivers in the kernel which fail to even call
>> request_{muxed_}region() prior to IO port accesses, and they also need to
>> be fixed (to call request_{muxed_}region(), as appropriate) separately.
>>
>> [1] https://www.spinics.net/lists/linux-pci/msg49821.html
>
> Please use a https://lore.kernel.org/ URL instead of spinics.net.

ok, I hope that I can find this old thread.

>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---

Thanks!

>>  include/linux/ioport.h | 12 +++++++++---
>>  kernel/resource.c      | 28 ++++++++++++++++++++++++++++
>>  2 files changed, 37 insertions(+), 3 deletions(-)
>>

Leaving remaining text as a reference.

>> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
>> index da0ebaec25f0..d7b7e1e08291 100644
>> --- a/include/linux/ioport.h
>> +++ b/include/linux/ioport.h
>> @@ -217,19 +217,25 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
>>
>>
>>  /* Convenience shorthand with allocation */
>> -#define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
>> -#define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
>> +#define request_region(start,n,name)		__request_region_from_children(&ioport_resource, (start), (n), (name), 0)
>> +#define request_muxed_region(start,n,name)	__request_region_from_children(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
>>  #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl)
>>  #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0)
>>  #define request_mem_region_exclusive(start,n,name) \
>>  	__request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE)
>>  #define rename_region(region, newname) do { (region)->name = (newname); } while (0)
>>
>> -extern struct resource * __request_region(struct resource *,
>> +extern struct resource *__request_region(struct resource *,
>>  					resource_size_t start,
>>  					resource_size_t n,
>>  					const char *name, int flags);
>>
>> +extern struct resource *__request_region_from_children(struct resource *,
>> +					resource_size_t start,
>> +					resource_size_t n,
>> +					const char *name, int flags);
>> +
>> +
>>  /* Compatibility cruft */
>>  #define release_region(start,n)	__release_region(&ioport_resource, (start), (n))
>>  #define release_mem_region(start,n)	__release_region(&iomem_resource, (start), (n))
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index 92190f62ebc5..87ed200eda8b 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -1097,6 +1097,34 @@ resource_size_t resource_alignment(struct resource *res)
>>
>>  static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
>>
>> +/**
>> + * __request_region_from_children - create a new busy region from a child
>> + * @parent: parent resource descriptor
>> + * @start: resource start address
>> + * @n: resource region size
>> + * @name: reserving caller's ID string
>> + * @flags: IO resource flags
>> + */
>> +struct resource *__request_region_from_children(struct resource *parent,
>> +						resource_size_t start,
>> +						resource_size_t n,
>> +						const char *name, int flags)
>> +{
>> +	struct resource *res = __request_region(parent, start, n, name, flags);
>> +
>> +	if (res && res->parent == parent) {
>> +		/*
>> +		 * This is a direct descendent of the parent, which is
>> +		 * what we didn't want.
>> +		 */
>> +		__release_region(parent, start, n);
>> +		res = NULL;
>> +	}
>> +
>> +	return res;
>> +}
>> +EXPORT_SYMBOL(__request_region_from_children);
>> +
>>  /**
>>   * __request_region - create a new busy resource region
>>   * @parent: parent resource descriptor
>> --
>> 2.17.1
>>
>
> .
>
Bjorn Helgaas March 26, 2019, 10:48 p.m. UTC | #3
[+cc Catalin, Will, linux-arm-kernel]

On Tue, Mar 26, 2019 at 04:33:55PM +0000, John Garry wrote:
> On 25/03/2019 23:32, Bjorn Helgaas wrote:
> > On Thu, Mar 21, 2019 at 02:14:08AM +0800, John Garry wrote:
> > > Currently when we request an IO port region, the request is made directly
> > > to the top resource, ioport_resource.
> > 
> > Let's be explicit here, e.g.,
> > 
> >   Currently request_region() requests an IO port region directly from the
> >   top resource, ioport_resource.
> 
> ok
> 
> > > There is an issue here, in that drivers may successfully request an IO
> > > port region even if the IO port region has not even been mapped in
> > > (in pci_remap_iospace()).
> > > 
> > > This may lead to crashes when the system has no PCI host, or, has a host
> > > but it has failed enumeration, while drivers still attempt to access PCI
> > > IO ports, as below:
> > 
> > I don't understand the strategy here.  f71882fg is not a driver for a
> > PCI device, so it should work even if there is no PCI host in the
> > system.
> 
> From my checking, the f71882fg hwmon is accessed via the super-io interface
> on the PCH on x86. The super-io interface is at fixed addresses, those being
> 0x2e and 0x4e.
> 
> Please see the following:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/f71805f.c?h=v5.1-rc2#n1621
> 
> and
> 
> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf
> (Table 9.2).
> 
> On x86 systems, these PCH IO ports will be mapped on a PCI bus, like:
> 
> $more /proc/ioports
> 0000-0cf7 : PCI Bus 0000:00
>   0000-001f : dma1
>   0020-0021 : pic1
>   0040-0043 : timer0
>   0050-0053 : timer1
>   0060-0060 : keyboard
>   0064-0064 : keyboard
>   0070-0077 : rtc0
>   0080-008f : dma page reg
>   00a0-00a1 : pic2
>   00c0-00df : dma2
>   00f0-00ff : fpu
> 
> So, the idea in the patch is that if PCI Bus 0000:00 does not exist because
> of no PCI host, then we should fail a request to an IO port region.

I'm not convinced about this last sentence.

It's true that on most modern systems, including that Intel PCH, the
Super I/O controller is attached via an LPC bridge on a PCI bus.

But I don't think it's an actual requirement that PCI be involved.
There certainly once were systems, e.g., PC/104, that had ISA devices
but no PCI.  Maybe Super I/O attached via ISA is obsolete enough that
we don't care any more, but I really don't know.

> > On x86, I think inb/inw/inl from a port where nothing responds
> > probably just returns ~0, and outb/outw/outl just get dropped.
> > Shouldn't arm64 do the same, without crashing?
> 
> That would be ideal and we're doing something similar in patch 2/3.
> 
> So on ARM64 we have to IO remap the PCI IO resource. If this mapping is not
> done (due to no PCI host), then any inb/inw/inl calls will crash the system.

My take is that ARM64 is responsible for implementing inb/inw/inl in
such a way that they don't crash.  I don't think it's practical to
update all the old ISA drivers or even the core code to work around
that.

> So in patch 2/3, I am also making the change to the logical PIO inb/inw/inl
> accessors to discard accesses when no PCI MMIO regions are registered in
> logical PIO space.
> 
> This is really a second line of defense (this patch being the first).
> 
> > > root@(none)$root@(none)$ insmod f71882fg.ko
> > > [  152.215377] Unable to handle kernel paging request at virtual address ffff7dfffee0002e
> > > [  152.231299] Mem abort info:
> > > [  152.236898]   ESR = 0x96000046
> > > [  152.243019]   Exception class = DABT (current EL), IL = 32 bits
> > > [  152.254905]   SET = 0, FnV = 0
> > > [  152.261024]   EA = 0, S1PTW = 0
> > > [  152.267320] Data abort info:
> > > [  152.273091]   ISV = 0, ISS = 0x00000046
> > > [  152.280784]   CM = 0, WnR = 1
> > > [  152.286730] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> > > [  152.300537] [ffff7dfffee0002e] pgd=000000000141c003, pud=000000000141d003, pmd=0000000000000000
> > > [  152.318016] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > > [  152.329199] Modules linked in: f71882fg(+)
> > > [  152.337415] CPU: 8 PID: 2732 Comm: insmod Not tainted 5.1.0-rc1-00002-gab1a0e9200b8-dirty #102
> > > [  152.354712] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
> > > [  152.373058] pstate: 80000005 (Nzcv daif -PAN -UAO)
> > > [  152.382675] pc : logic_outb+0x54/0xb8
> > > [  152.390017] lr : f71882fg_find+0x64/0x390 [f71882fg]
> > > [  152.399977] sp : ffff000013393aa0
> > > [  152.406618] x29: ffff000013393aa0 x28: ffff000008b98b10
> > > [  152.417278] x27: ffff000013393df0 x26: 0000000000000100
> > > [  152.427938] x25: ffff801f8c872d30 x24: ffff000011420000
> > > [  152.438598] x23: ffff801fb49d2940 x22: ffff000011291000
> > > [  152.449257] x21: 000000000000002e x20: 0000000000000087
> > > [  152.459917] x19: ffff000013393b44 x18: ffffffffffffffff
> > > [  152.470577] x17: 0000000000000000 x16: 0000000000000000
> > > [  152.481236] x15: ffff00001127d6c8 x14: ffff801f8cfd691c
> > > [  152.491896] x13: 0000000000000000 x12: 0000000000000000
> > > [  152.502555] x11: 0000000000000003 x10: 0000801feace2000
> > > [  152.513215] x9 : 0000000000000000 x8 : ffff841fa654f280
> > > [  152.523874] x7 : 0000000000000000 x6 : 0000000000ffc0e3
> > > [  152.534534] x5 : ffff000011291360 x4 : ffff801fb4949f00
> > > [  152.545194] x3 : 0000000000ffbffe x2 : 76e767a63713d500
> > > [  152.555853] x1 : ffff7dfffee0002e x0 : ffff7dfffee00000
> > > [  152.566514] Process insmod (pid: 2732, stack limit = 0x(____ptrval____))
> > > [  152.579968] Call trace:
> > > [  152.584863]  logic_outb+0x54/0xb8
> > > [  152.591506]  f71882fg_find+0x64/0x390 [f71882fg]
> > > [  152.600768]  f71882fg_init+0x38/0xc70 [f71882fg]
> > > [  152.610031]  do_one_initcall+0x5c/0x198
> > > [  152.617723]  do_init_module+0x54/0x1b0
> > > [  152.625237]  load_module+0x1dc4/0x2158
> > > [  152.632752]  __se_sys_init_module+0x14c/0x1e8
> > > [  152.641490]  __arm64_sys_init_module+0x18/0x20
> > > [  152.650404]  el0_svc_common+0x5c/0x100
> > > [  152.657919]  el0_svc_handler+0x2c/0x80
> > > [  152.665433]  el0_svc+0x8/0xc
> > > [  152.671202] Code: d2bfdc00 f2cfbfe0 f2ffffe0 8b000021 (39000034)
> > > [  152.683434] ---[ end trace fd4f35b610829a48 ]---
> > > Segmentation fault
> > > root@(none)$
> > 
> > > Note that the f71882fg driver correctly calls request_muxed_region().
> > > 
> > > This issue was originally reported in [1].
> > > 
> > > This patch changes the functionality of request{muxed_}_region() to
> > > request a region from a direct child descendent of the top
> > > ioport_resource.
> > > 
> > > In this, if the IO port region has not been mapped for a particular IO
> > > region, the PCI IO resource would also not have been inserted, and so a
> > > suitable child region will not exist. As such,
> > > request_{muxed_}region() calls will fail.
> > > 
> > > A side note: there are many drivers in the kernel which fail to even call
> > > request_{muxed_}region() prior to IO port accesses, and they also need to
> > > be fixed (to call request_{muxed_}region(), as appropriate) separately.
> > > 
> > > [1] https://www.spinics.net/lists/linux-pci/msg49821.html
> > 
> > Please use a https://lore.kernel.org/ URL instead of spinics.net.
> 
> ok, I hope that I can find this old thread.

The beauty of lore.kernel.org is that the URL contains the Message-ID, so
it's easy to build the URL and it would contain useful information even if
lore.kernel.org disappeared:

https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com
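
Building such a URL can be sketched as below, assuming the list-name/Message-ID form shown in the example above. lore_url() is a hypothetical helper, and it does not escape characters such as '/' that some Message-IDs contain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "https://lore.kernel.org/<list>/<message-id>" into @buf.
 * Enclosing angle brackets on @msgid, as found in a raw Message-ID
 * header, are stripped. Returns 0 on success, -1 if @buf is too small.
 */
int lore_url(char *buf, size_t len, const char *list, const char *msgid)
{
	size_t idlen;
	int n;

	assert(buf && list && msgid);
	idlen = strlen(msgid);
	if (idlen >= 2 && msgid[0] == '<' && msgid[idlen - 1] == '>') {
		msgid++;
		idlen -= 2;
	}
	n = snprintf(buf, len, "https://lore.kernel.org/%s/%.*s",
		     list, (int)idlen, msgid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with the list "linux-pci" and the Message-ID <56F209A9.4040304@huawei.com>, it produces the URL quoted above.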

Bjorn

> > > Signed-off-by: John Garry <john.garry@huawei.com>
> > > ---
> 
> Thanks!
> 
> > >  include/linux/ioport.h | 12 +++++++++---
> > >  kernel/resource.c      | 28 ++++++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+), 3 deletions(-)
> > > 
> 
> Leaving remaing text as a reference.
> 
> > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > > index da0ebaec25f0..d7b7e1e08291 100644
> > > --- a/include/linux/ioport.h
> > > +++ b/include/linux/ioport.h
> > > @@ -217,19 +217,25 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
> > > 
> > > 
> > >  /* Convenience shorthand with allocation */
> > > -#define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
> > > -#define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
> > > +#define request_region(start,n,name)		__request_region_from_children(&ioport_resource, (start), (n), (name), 0)
> > > +#define request_muxed_region(start,n,name)	__request_region_from_children(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
> > >  #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl)
> > >  #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0)
> > >  #define request_mem_region_exclusive(start,n,name) \
> > >  	__request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE)
> > >  #define rename_region(region, newname) do { (region)->name = (newname); } while (0)
> > > 
> > > -extern struct resource * __request_region(struct resource *,
> > > +extern struct resource *__request_region(struct resource *,
> > >  					resource_size_t start,
> > >  					resource_size_t n,
> > >  					const char *name, int flags);
> > > 
> > > +extern struct resource *__request_region_from_children(struct resource *,
> > > +					resource_size_t start,
> > > +					resource_size_t n,
> > > +					const char *name, int flags);
> > > +
> > > +
> > >  /* Compatibility cruft */
> > >  #define release_region(start,n)	__release_region(&ioport_resource, (start), (n))
> > >  #define release_mem_region(start,n)	__release_region(&iomem_resource, (start), (n))
> > > diff --git a/kernel/resource.c b/kernel/resource.c
> > > index 92190f62ebc5..87ed200eda8b 100644
> > > --- a/kernel/resource.c
> > > +++ b/kernel/resource.c
> > > @@ -1097,6 +1097,34 @@ resource_size_t resource_alignment(struct resource *res)
> > > 
> > >  static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
> > > 
> > > +/**
> > > + * __request_region_from_children - create a new busy region from a child
> > > + * @parent: parent resource descriptor
> > > + * @start: resource start address
> > > + * @n: resource region size
> > > + * @name: reserving caller's ID string
> > > + * @flags: IO resource flags
> > > + */
> > > +struct resource *__request_region_from_children(struct resource *parent,
> > > +						resource_size_t start,
> > > +						resource_size_t n,
> > > +						const char *name, int flags)
> > > +{
> > > +	struct resource *res = __request_region(parent, start, n, name, flags);
> > > +
> > > +	if (res && res->parent == parent) {
> > > +		/*
> > > +		 * This is a direct descendent of the parent, which is
> > > +		 * what we didn't want.
> > > +		 */
> > > +		__release_region(parent, start, n);
> > > +		res = NULL;
> > > +	}
> > > +
> > > +	return res;
> > > +}
> > > +EXPORT_SYMBOL(__request_region_from_children);
> > > +
> > >  /**
> > >   * __request_region - create a new busy resource region
> > >   * @parent: parent resource descriptor
> > > --
> > > 2.17.1
> > > 
> > 
> > .
> > 
> 
>
John Garry March 27, 2019, 11:24 a.m. UTC | #4
On 26/03/2019 22:48, Bjorn Helgaas wrote:
> [+cc Catalin, Will, linux-arm-kernel]
>
>> From my checking, the f71882fg hwmon is accessed via the super-io interface
>> on the PCH on x86. The super-io interface is at fixed addresses, those being
>> 0x2e and 0x4e.
>>
>> Please see the following:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/f71805f.c?h=v5.1-rc2#n1621
>>
>> and
>>
>> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf
>> (Table 9.2).
>>
>> On x86 systems, these PCH IO ports will be mapped on a PCI bus, like:
>>
>> $more /proc/ioports
>> 0000-0cf7 : PCI Bus 0000:00
>>   0000-001f : dma1
>>   0020-0021 : pic1
>>   0040-0043 : timer0
>>   0050-0053 : timer1
>>   0060-0060 : keyboard
>>   0064-0064 : keyboard
>>   0070-0077 : rtc0
>>   0080-008f : dma page reg
>>   00a0-00a1 : pic2
>>   00c0-00df : dma2
>>   00f0-00ff : fpu
>>
>> So, the idea in the patch is that if PCI Bus 0000:00 does not exist because
>> of no PCI host, then we should fail a request to an IO port region.
>

Hi Bjorn,

> I'm not convinced about this last sentence.
>
> It's true that on most modern systems, including that Intel PCH, the
> Super I/O controller is attached via an LPC bridge on a PCI bus.
>
> But I don't think it's an actual requirement that PCI be involved.
> There certainly once were systems, e.g., PC/104, that had ISA devices
> but no PCI.  Maybe Super I/O attached via ISA is obsolete enough that
> we don't care any more, but I really don't know.

OK, fine. So if this is true, then this patch falls apart.

However, I don't know for sure either. I would still like to think that
these legacy ISA systems should still insert a bus resource under
ioport_resource, from which devices on that bus should request resources.

>
>>> On x86, I think inb/inw/inl from a port where nothing responds
>>> probably just returns ~0, and outb/outw/outl just get dropped.
>>> Shouldn't arm64 do the same, without crashing?
>>
>> That would be ideal and we're doing something similar in patch 2/3.
>>
>> So on ARM64 we have to IO remap the PCI IO resource. If this mapping is not
>> done (due to no PCI host), then any inb/inw/inl calls will crash the system.
>
> My take is that ARM64 is responsible for implementing inb/inw/inl in
> such a way that they don't crash.  I don't think it's practical to
> update all the old ISA drivers or even the core code to work around
> that.

As I mentioned below, I was actually also fixing up inb/inw/inl et al 
for arm64 such that they don't crash the system in this case. This was 
in patch 2/3.

So on arm64 - which defines PCI_IOBASE - we need to IO remap the PCI IO 
space resource. If this is not done and we access PCI IO space, then we 
crash.

However with the introduction of logical PIO space in commit 
031e3601869c, we can test this mapping by ensuring that we have a 
logical PIO region registered. If there is none, then we can discard the 
access.
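
The check described above could be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct pio_range,
model_register_range(), model_inb() and model_outb() are made-up names and
not the kernel's logical PIO API, and returning 0xff for an unbacked read
is an assumption borrowed from the x86 behaviour mentioned earlier in this
thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct pio_range {
	unsigned long start;	/* first logical port in the range */
	unsigned long size;	/* number of ports in the range */
	uint8_t *backing;	/* stands in for the remapped IO window */
};

#define MAX_RANGES 4
static struct pio_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Register a backed range; returns 0 on success, -1 if the table is full. */
int model_register_range(unsigned long start, unsigned long size,
			 uint8_t *backing)
{
	assert(backing && size);
	if (nr_ranges == MAX_RANGES)
		return -1;
	ranges[nr_ranges++] = (struct pio_range){ start, size, backing };
	return 0;
}

/* Look up the registered range containing @port, or NULL if none does. */
static struct pio_range *find_range(unsigned long port)
{
	for (size_t i = 0; i < nr_ranges; i++)
		if (port >= ranges[i].start &&
		    port - ranges[i].start < ranges[i].size)
			return &ranges[i];
	return NULL;
}

/* Reads from an unregistered port return all ones instead of faulting. */
uint8_t model_inb(unsigned long port)
{
	struct pio_range *r = find_range(port);

	return r ? r->backing[port - r->start] : 0xff;
}

/* Writes to an unregistered port are silently dropped. */
void model_outb(uint8_t val, unsigned long port)
{
	struct pio_range *r = find_range(port);

	if (r)
		r->backing[port - r->start] = val;
}
```

With nothing registered, an access to port 0x2e is discarded rather than
dereferencing an unmapped address; once a range covering it is
registered, the same access reaches the backing window.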

However this would only be for when INDIRECT_PIO is defined. Maybe I can 
make it work for when INDIRECT_PIO is not defined, or even drop 
!INDIRECT_PIO support.

A final note on hwmon f71882fg: even with the change in 2/3, this driver
still accesses IO ports 0x2e and 0x4e, which would not be PCH fixed IO
ports on !x86 systems, so this is still far from ideal.

I saw that in commit 746cdfbf01c0 ("hwmon: Avoid building drivers for 
powerpc that read/write ISA addresses"), PPC would not build these 
drivers, as, like arm, it has no native ISA.

>
>> So in patch 2/3, I am also making the change to the logical PIO inb/inw/inl
>> accessors to discard accesses when no PCI MMIO regions are registered in
>> logical PIO space.
>>
>> This is really a second line of defense (this patch being the first).
>>
>>>> root@(none)$root@(none)$ insmod f71882fg.ko
>>>> [  152.215377] Unable to handle kernel paging request at virtual address ffff7dfffee0002e
>>>> [  152.231299] Mem abort info:
>>>> [  152.236898]   ESR = 0x96000046
>>>> [  152.243019]   Exception class = DABT (current EL), IL = 32 bits
>>>> [  152.254905]   SET = 0, FnV = 0
>>>> [  152.261024]   EA = 0, S1PTW = 0
>>>> [  152.267320] Data abort info:
>>>> [  152.273091]   ISV = 0, ISS = 0x00000046
>>>> [  152.280784]   CM = 0, WnR = 1
>>>> [  152.286730] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>>> [  152.300537] [ffff7dfffee0002e] pgd=000000000141c003, pud=000000000141d003, pmd=0000000000000000
>>>> [  152.318016] Internal error: Oops: 96000046 [#1] PREEMPT SMP
>>>> [  152.329199] Modules linked in: f71882fg(+)
>>>> [  152.337415] CPU: 8 PID: 2732 Comm: insmod Not tainted 5.1.0-rc1-00002-gab1a0e9200b8-dirty #102
>>>> [  152.354712] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
>>>> [  152.373058] pstate: 80000005 (Nzcv daif -PAN -UAO)
>>>> [  152.382675] pc : logic_outb+0x54/0xb8
>>>> [  152.390017] lr : f71882fg_find+0x64/0x390 [f71882fg]
>>>> [  152.399977] sp : ffff000013393aa0
>>>> [  152.406618] x29: ffff000013393aa0 x28: ffff000008b98b10
>>>> [  152.417278] x27: ffff000013393df0 x26: 0000000000000100
>>>> [  152.427938] x25: ffff801f8c872d30 x24: ffff000011420000
>>>> [  152.438598] x23: ffff801fb49d2940 x22: ffff000011291000
>>>> [  152.449257] x21: 000000000000002e x20: 0000000000000087
>>>> [  152.459917] x19: ffff000013393b44 x18: ffffffffffffffff
>>>> [  152.470577] x17: 0000000000000000 x16: 0000000000000000
>>>> [  152.481236] x15: ffff00001127d6c8 x14: ffff801f8cfd691c
>>>> [  152.491896] x13: 0000000000000000 x12: 0000000000000000
>>>> [  152.502555] x11: 0000000000000003 x10: 0000801feace2000
>>>> [  152.513215] x9 : 0000000000000000 x8 : ffff841fa654f280
>>>> [  152.523874] x7 : 0000000000000000 x6 : 0000000000ffc0e3
>>>> [  152.534534] x5 : ffff000011291360 x4 : ffff801fb4949f00
>>>> [  152.545194] x3 : 0000000000ffbffe x2 : 76e767a63713d500
>>>> [  152.555853] x1 : ffff7dfffee0002e x0 : ffff7dfffee00000
>>>> [  152.566514] Process insmod (pid: 2732, stack limit = 0x(____ptrval____))
>>>> [  152.579968] Call trace:
>>>> [  152.584863]  logic_outb+0x54/0xb8
>>>> [  152.591506]  f71882fg_find+0x64/0x390 [f71882fg]
>>>> [  152.600768]  f71882fg_init+0x38/0xc70 [f71882fg]
>>>> [  152.610031]  do_one_initcall+0x5c/0x198
>>>> [  152.617723]  do_init_module+0x54/0x1b0
>>>> [  152.625237]  load_module+0x1dc4/0x2158
>>>> [  152.632752]  __se_sys_init_module+0x14c/0x1e8
>>>> [  152.641490]  __arm64_sys_init_module+0x18/0x20
>>>> [  152.650404]  el0_svc_common+0x5c/0x100
>>>> [  152.657919]  el0_svc_handler+0x2c/0x80
>>>> [  152.665433]  el0_svc+0x8/0xc
>>>> [  152.671202] Code: d2bfdc00 f2cfbfe0 f2ffffe0 8b000021 (39000034)
>>>> [  152.683434] ---[ end trace fd4f35b610829a48 ]---
>>>> Segmentation fault
>>>> root@(none)$
>>>
>>>> Note that the f71882fg driver correctly calls request_muxed_region().
>>>>
>>>> This issue was originally reported in [1].
>>>>
>>>> This patch changes the functionality of request{muxed_}_region() to
>>>> request a region from a direct child descendent of the top
>>>> ioport_resource.
>>>>
>>>> In this, if the IO port region has not been mapped for a particular IO
>>>> region, the PCI IO resource would also not have been inserted, and so a
>>>> suitable child region will not exist. As such,
>>>> request_{muxed_}region() calls will fail.
>>>>
>>>> A side note: there are many drivers in the kernel which fail to even call
>>>> request_{muxed_}region() prior to IO port accesses, and they also need to
>>>> be fixed (to call request_{muxed_}region(), as appropriate) separately.
>>>>
>>>> [1] https://www.spinics.net/lists/linux-pci/msg49821.html
>>>
>>> Please use a https://lore.kernel.org/ URL instead of spinics.net.
>>
>> ok, I hope that I can find this old thread.
>
> The beauty of lore.kernel.org is that the URL contains the Message-ID, so
> it's easy build the URL and it would contain useful information even if
> lore.kernel.org disappeared:
>
> https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com
>

ok, great.

Thanks again,
John

> Bjorn
>
Lorenzo Pieralisi March 28, 2019, 5:46 p.m. UTC | #5
On Tue, Mar 26, 2019 at 05:48:10PM -0500, Bjorn Helgaas wrote:

[...]

> I'm not convinced about this last sentence.
> 
> It's true that on most modern systems, including that Intel PCH, the
> Super I/O controller is attached via an LPC bridge on a PCI bus.
> 
> But I don't think it's an actual requirement that PCI be involved.
> There certainly once were systems, e.g., PC/104, that had ISA devices
> but no PCI.  Maybe Super I/O attached via ISA is obsolete enough that
> we don't care any more, but I really don't know.
> 
> > > On x86, I think inb/inw/inl from a port where nothing responds
> > > probably just returns ~0, and outb/outw/outl just get dropped.
> > > Shouldn't arm64 do the same, without crashing?
> > 
> > That would be ideal and we're doing something similar in patch 2/3.
> > 
> > So on ARM64 we have to IO remap the PCI IO resource. If this mapping is not
> > done (due to no PCI host), then any inb/inw/inl calls will crash the system.
> 
> My take is that ARM64 is responsible for implementing inb/inw/inl in
> such a way that they don't crash.  I don't think it's practical to
> update all the old ISA drivers or even the core code to work around
> that.

The problem is that those drivers are accessing a resource that does not
exist in practice. It is taken for granted on x86 systems (and on IA64)
because there it was a bus (actual or emulated) that was made part of
the architecture. The ISA space is not necessarily tied to PCI,
at least not always.

Side note: these drivers can't be compiled on PPC. It would be
good to understand why; I have a hunch it is related.

> > So in patch 2/3, I am also making the change to the logical PIO inb/inw/inl
> > accessors to discard accesses when no PCI MMIO regions are registered in
> > logical PIO space.
> > 
> > This is really a second line of defense (this patch being the first).
> > 
> > > > root@(none)$root@(none)$ insmod f71882fg.ko
> > > > [  152.215377] Unable to handle kernel paging request at virtual address ffff7dfffee0002e
> > > > [  152.231299] Mem abort info:
> > > > [  152.236898]   ESR = 0x96000046
> > > > [  152.243019]   Exception class = DABT (current EL), IL = 32 bits
> > > > [  152.254905]   SET = 0, FnV = 0
> > > > [  152.261024]   EA = 0, S1PTW = 0
> > > > [  152.267320] Data abort info:
> > > > [  152.273091]   ISV = 0, ISS = 0x00000046
> > > > [  152.280784]   CM = 0, WnR = 1
> > > > [  152.286730] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> > > > [  152.300537] [ffff7dfffee0002e] pgd=000000000141c003, pud=000000000141d003, pmd=0000000000000000
> > > > [  152.318016] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > > > [  152.329199] Modules linked in: f71882fg(+)
> > > > [  152.337415] CPU: 8 PID: 2732 Comm: insmod Not tainted 5.1.0-rc1-00002-gab1a0e9200b8-dirty #102
> > > > [  152.354712] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
> > > > [  152.373058] pstate: 80000005 (Nzcv daif -PAN -UAO)
> > > > [  152.382675] pc : logic_outb+0x54/0xb8
> > > > [  152.390017] lr : f71882fg_find+0x64/0x390 [f71882fg]
> > > > [  152.399977] sp : ffff000013393aa0
> > > > [  152.406618] x29: ffff000013393aa0 x28: ffff000008b98b10
> > > > [  152.417278] x27: ffff000013393df0 x26: 0000000000000100
> > > > [  152.427938] x25: ffff801f8c872d30 x24: ffff000011420000
> > > > [  152.438598] x23: ffff801fb49d2940 x22: ffff000011291000
> > > > [  152.449257] x21: 000000000000002e x20: 0000000000000087
> > > > [  152.459917] x19: ffff000013393b44 x18: ffffffffffffffff
> > > > [  152.470577] x17: 0000000000000000 x16: 0000000000000000
> > > > [  152.481236] x15: ffff00001127d6c8 x14: ffff801f8cfd691c
> > > > [  152.491896] x13: 0000000000000000 x12: 0000000000000000
> > > > [  152.502555] x11: 0000000000000003 x10: 0000801feace2000
> > > > [  152.513215] x9 : 0000000000000000 x8 : ffff841fa654f280
> > > > [  152.523874] x7 : 0000000000000000 x6 : 0000000000ffc0e3
> > > > [  152.534534] x5 : ffff000011291360 x4 : ffff801fb4949f00
> > > > [  152.545194] x3 : 0000000000ffbffe x2 : 76e767a63713d500
> > > > [  152.555853] x1 : ffff7dfffee0002e x0 : ffff7dfffee00000
> > > > [  152.566514] Process insmod (pid: 2732, stack limit = 0x(____ptrval____))
> > > > [  152.579968] Call trace:
> > > > [  152.584863]  logic_outb+0x54/0xb8
> > > > [  152.591506]  f71882fg_find+0x64/0x390 [f71882fg]
> > > > [  152.600768]  f71882fg_init+0x38/0xc70 [f71882fg]
> > > > [  152.610031]  do_one_initcall+0x5c/0x198
> > > > [  152.617723]  do_init_module+0x54/0x1b0
> > > > [  152.625237]  load_module+0x1dc4/0x2158
> > > > [  152.632752]  __se_sys_init_module+0x14c/0x1e8
> > > > [  152.641490]  __arm64_sys_init_module+0x18/0x20
> > > > [  152.650404]  el0_svc_common+0x5c/0x100
> > > > [  152.657919]  el0_svc_handler+0x2c/0x80
> > > > [  152.665433]  el0_svc+0x8/0xc
> > > > [  152.671202] Code: d2bfdc00 f2cfbfe0 f2ffffe0 8b000021 (39000034)
> > > > [  152.683434] ---[ end trace fd4f35b610829a48 ]---
> > > > Segmentation fault
> > > > root@(none)$
> > > 
> > > > Note that the f71882fg driver correctly calls request_muxed_region().
> > > > 
> > > > This issue was originally reported in [1].
> > > > 
> > > > This patch changes the functionality of request{muxed_}_region() to
> > > > request a region from a direct child descendent of the top
> > > > ioport_resource.
> > > > 
> > > > In this, if the IO port region has not been mapped for a particular IO
> > > > region, the PCI IO resource would also not have been inserted, and so a
> > > > suitable child region will not exist. As such,
> > > > request_{muxed_}region() calls will fail.
> > > > 
> > > > A side note: there are many drivers in the kernel which fail to even call
> > > > request_{muxed_}region() prior to IO port accesses, and they also need to
> > > > be fixed (to call request_{muxed_}region(), as appropriate) separately.
> > > > 
> > > > [1] https://www.spinics.net/lists/linux-pci/msg49821.html
> > > 
> > > Please use a https://lore.kernel.org/ URL instead of spinics.net.
> > 
> > ok, I hope that I can find this old thread.
> 
> The beauty of lore.kernel.org is that the URL contains the Message-ID, so
> it's easy build the URL and it would contain useful information even if
> lore.kernel.org disappeared:
> 
> https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com

Yes, the bottom line is what Arnd outlined in the thread above.

ISA IO port space is not necessarily PCI but it does not exist
architecturally on ARM systems.

Taking the example of IA64, the ISA space is memory mapped (like any
other arch except for x86) but IIUC the virtual mapping for the ISA
port space _always_ exists on IA64 so this issue won't happen.

Arnd pointed out a solution in the thread above but I need to check
if that's feasible.

Lorenzo
John Garry March 29, 2019, 10:42 a.m. UTC | #6
On 28/03/2019 17:46, Lorenzo Pieralisi wrote:
> On Tue, Mar 26, 2019 at 05:48:10PM -0500, Bjorn Helgaas wrote:
>
> [...]
>

Hi Lorenzo,


>> I'm not convinced about this last sentence.
>>
>> It's true that on most modern systems, including that Intel PCH, the
>> Super I/O controller is attached via an LPC bridge on a PCI bus.
>>
>> But I don't think it's an actual requirement that PCI be involved.
>> There certainly once were systems, e.g., PC/104, that had ISA devices
>> but no PCI.  Maybe Super I/O attached via ISA is obsolete enough that
>> we don't care any more, but I really don't know.
>>
>>>> On x86, I think inb/inw/inl from a port where nothing responds
>>>> probably just returns ~0, and outb/outw/outl just get dropped.
>>>> Shouldn't arm64 do the same, without crashing?
>>>
>>> That would be ideal and we're doing something similar in patch 2/3.
>>>
>>> So on ARM64 we have to IO remap the PCI IO resource. If this mapping is not
>>> done (due to no PCI host), then any inb/inw/inl calls will crash the system.
>>
>> My take is that ARM64 is responsible for implementing inb/inw/inl in
>> such a way that they don't crash.  I don't think it's practical to
>> update all the old ISA drivers or even the core code to work around
>> that.
>
> The problem is that those drivers are accessing a resource that does not
> exist in practice, it is taken for granted on x86 systems (and on IA64)
> because that was an actual bus (actual or emulated) and was made part of
> the architecture. The ISA space is not necessarily tied to PCI,
> at least not always.
>
> Side note: these drivers can't be compiled on PPC, it would be
> good to understand why, I have a hunch it can be related.

I mentioned this earlier:

I saw that in commits like 746cdfbf01c0 ("hwmon: Avoid building drivers
for powerpc that read/write ISA addresses"), PPC would not build these
drivers, as, like arm, it has no native ISA.

However I still don't think just avoiding compiling these drivers for 
certain archs solves the problem.

[...]

>>>>> [1] https://www.spinics.net/lists/linux-pci/msg49821.html
>>>>
>>>> Please use a https://lore.kernel.org/ URL instead of spinics.net.
>>>
>>> ok, I hope that I can find this old thread.
>>
>> The beauty of lore.kernel.org is that the URL contains the Message-ID, so
>> it's easy build the URL and it would contain useful information even if
>> lore.kernel.org disappeared:
>>
>> https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com
>
> Yes, the bottom line is what Arnd outlined in the thread above.
>
> ISA IO port space is not necessarily PCI but it does not exist
> architecturally on ARM systems.
>
> Taking the example of IA64, the ISA space is memory mapped (like any
> other arch except for x86) but IIUC the virtual mapping for the ISA
> port space _always_ exists on IA64 so this issue won't happen.
>
> Arnd pointed out a solution in the thread above but I need to check
> if that's feasible.

I doubt that it can work now.

Since we introduced the concept of logical PIO space, this IO space has
been sparsely populated by 2 regions - MMIO and indirect IO - so we
cannot grow it as we map in regions. I also don't think it works for
when we IO unmap regions.

Thanks,
John

>
> Lorenzo
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>
Lorenzo Pieralisi March 29, 2019, 12:22 p.m. UTC | #7
On Fri, Mar 29, 2019 at 10:42:17AM +0000, John Garry wrote:
> On 28/03/2019 17:46, Lorenzo Pieralisi wrote:
> >On Tue, Mar 26, 2019 at 05:48:10PM -0500, Bjorn Helgaas wrote:
> >
> >[...]
> >
> 
> Hi Lorenzo,
> 
> 
> >>I'm not convinced about this last sentence.
> >>
> >>It's true that on most modern systems, including that Intel PCH, the
> >>Super I/O controller is attached via an LPC bridge on a PCI bus.
> >>
> >>But I don't think it's an actual requirement that PCI be involved.
> >>There certainly once were systems, e.g., PC/104, that had ISA devices
> >>but no PCI.  Maybe Super I/O attached via ISA is obsolete enough that
> >>we don't care any more, but I really don't know.
> >>
> >>>>On x86, I think inb/inw/inl from a port where nothing responds
> >>>>probably just returns ~0, and outb/outw/outl just get dropped.
> >>>>Shouldn't arm64 do the same, without crashing?
> >>>
> >>>That would be ideal and we're doing something similar in patch 2/3.
> >>>
> >>>So on ARM64 we have to IO remap the PCI IO resource. If this mapping is not
> >>>done (due to no PCI host), then any inb/inw/inl calls will crash the system.
> >>
> >>My take is that ARM64 is responsible for implementing inb/inw/inl in
> >>such a way that they don't crash.  I don't think it's practical to
> >>update all the old ISA drivers or even the core code to work around
> >>that.
> >
> >The problem is that those drivers are accessing a resource that does not
> >exist in practice, it is taken for granted on x86 systems (and on IA64)
> >because that was an actual bus (actual or emulated) and was made part of
> >the architecture. The ISA space is not necessarily tied to PCI,
> >at least not always.
> >
> >Side note: these drivers can't be compiled on PPC, it would be
> >good to understand why, I have a hunch it can be related.
> 
> I mentioned this earlier:
> 
> I saw that in commits like 746cdfbf01c0 ("hwmon: Avoid building drivers
> forpowerpc that read/write ISA addresses"), PPC would not build these
> drivers, as, like arm, it has no native ISA.
> 
> However I still don't think just avoiding compiling these drivers for
> certain archs solves the problem.

No, it does not, but I would like to understand how relevant fixing
those drivers is (they should not use ISA IO space without first claiming
their resources, for the record), given that PPC did not even try and
apparently that's not a problem.

> 
> [...]
> 
> >>>>>[1] https://www.spinics.net/lists/linux-pci/msg49821.html
> >>>>
> >>>>Please use a https://lore.kernel.org/ URL instead of spinics.net.
> >>>
> >>>ok, I hope that I can find this old thread.
> >>
> >>The beauty of lore.kernel.org is that the URL contains the Message-ID, so
> >>it's easy build the URL and it would contain useful information even if
> >>lore.kernel.org disappeared:
> >>
> >>https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com
> >
> >Yes, the bottom line is what Arnd outlined in the thread above.
> >
> >ISA IO port space is not necessarily PCI but it does not exist
> >architecturally on ARM systems.
> >
> >Taking the example of IA64, the ISA space is memory mapped (like any
> >other arch except for x86) but IIUC the virtual mapping for the ISA
> >port space _always_ exists on IA64 so this issue won't happen.
> >
> >Arnd pointed out a solution in the thread above but I need to check
> >if that's feasible.
> 
> I doubt that it can work now.
> 
> Since we when introduced the concept of logical PIO space, this IO space
> became sparely populated by 2 regions - MMIO and indirect IO - so we cannot
> grow it as we map in regions. I also don't think it works for when we IO
> unmap regions.

I do not have the full picture, but I suspect that, apart from x86/IA64,
this is a common issue across architectures. I am trying to untangle
how ARM 32-bit deals with this (if it does).

Lorenzo
John Garry April 2, 2019, 9:46 a.m. UTC | #8
On 29/03/2019 12:22, Lorenzo Pieralisi wrote:
>>> > >Side note: these drivers can't be compiled on PPC, it would be
>>> > >good to understand why, I have a hunch it can be related.
>> >
>> > I mentioned this earlier:
>> >
>> > I saw that in commits like 746cdfbf01c0 ("hwmon: Avoid building drivers
>> > forpowerpc that read/write ISA addresses"), PPC would not build these
>> > drivers, as, like arm, it has no native ISA.
>> >
>> > However I still don't think just avoiding compiling these drivers for
>> > certain archs solves the problem.
> No it does not but I would like to understand how relevant is fixing
> those drivers (that should not use ISA IO space without first claiming
> their resources, for the records) given that PPC did not even try and
> apparently that's not a problem.
>

Hi Lorenzo,

Those drivers should still be fixed up separately.

The tricky part in this series is making the resource claim fail if 
there is no IO space mapped/accessible at the addresses requested.
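
To make the intended behaviour concrete, here is a simplified userspace
model of it. It is a sketch only: struct window, model_insert_window() and
model_request_region() are hypothetical names, it ignores busy tracking
and muxed regions, and it is not the __request_region_from_children()
implementation from this patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the children of ioport_resource. */
struct window {
	unsigned long start;
	unsigned long end;	/* inclusive, like struct resource */
};

#define MAX_WINDOWS 4
static struct window windows[MAX_WINDOWS];	/* e.g. mapped bus windows */
static size_t nr_windows;

/* Insert a bus window under the root; returns 0, or -1 if the table is full. */
int model_insert_window(unsigned long start, unsigned long end)
{
	assert(start <= end);
	if (nr_windows == MAX_WINDOWS)
		return -1;
	windows[nr_windows++] = (struct window){ start, end };
	return 0;
}

/*
 * Grant [start, start + n - 1] only if one child window fully contains it.
 * Returns the containing window on success, NULL otherwise.
 */
const struct window *model_request_region(unsigned long start, unsigned long n)
{
	unsigned long end;

	if (n == 0)
		return NULL;
	end = start + n - 1;
	if (end < start)	/* range wrapped around */
		return NULL;

	for (size_t i = 0; i < nr_windows; i++)
		if (start >= windows[i].start && end <= windows[i].end)
			return &windows[i];
	return NULL;
}
```

In this model a request for ports 0x2e-0x2f fails until a window such as
0000-0cf7 has been inserted, which is the behaviour this patch is after.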

However I would still like to fix up the low level IO port accessors to 
discard accesses when no IO space is mapped.

Thanks,
John

>> >
>> > [...]
>> >
>>>>>>> > >>>>>[1] https://www.spinics.net/lists/linux-pci

Patch

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..d7b7e1e08291 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -217,19 +217,25 @@  static inline bool resource_contains(struct resource *r1, struct resource *r2)
 
 
 /* Convenience shorthand with allocation */
-#define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
-#define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
+#define request_region(start,n,name)		__request_region_from_children(&ioport_resource, (start), (n), (name), 0)
+#define request_muxed_region(start,n,name)	__request_region_from_children(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
 #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl)
 #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0)
 #define request_mem_region_exclusive(start,n,name) \
 	__request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE)
 #define rename_region(region, newname) do { (region)->name = (newname); } while (0)
 
-extern struct resource * __request_region(struct resource *,
+extern struct resource *__request_region(struct resource *,
 					resource_size_t start,
 					resource_size_t n,
 					const char *name, int flags);
 
+extern struct resource *__request_region_from_children(struct resource *,
+					resource_size_t start,
+					resource_size_t n,
+					const char *name, int flags);
+
+
 /* Compatibility cruft */
 #define release_region(start,n)	__release_region(&ioport_resource, (start), (n))
 #define release_mem_region(start,n)	__release_region(&iomem_resource, (start), (n))
diff --git a/kernel/resource.c b/kernel/resource.c
index 92190f62ebc5..87ed200eda8b 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1097,6 +1097,34 @@  resource_size_t resource_alignment(struct resource *res)
 
 static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
 
+/**
+ * __request_region_from_children - create a new busy region from a child
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @n: resource region size
+ * @name: reserving caller's ID string
+ * @flags: IO resource flags
+ */
+struct resource *__request_region_from_children(struct resource *parent,
+						resource_size_t start,
+						resource_size_t n,
+						const char *name, int flags)
+{
+	struct resource *res = __request_region(parent, start, n, name, flags);
+
+	if (res && res->parent == parent) {
+		/*
+		 * This is a direct descendent of the parent, which is
+		 * what we didn't want.
+		 */
+		__release_region(parent, start, n);
+		res = NULL;
+	}
+
+	return res;
+}
+EXPORT_SYMBOL(__request_region_from_children);
+
 /**
  * __request_region - create a new busy resource region
  * @parent: parent resource descriptor