diff mbox

x86/PCI: setup data may be in highmem

Message ID 1369215782-32697-1-git-send-email-matt@console-pimps.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Matt Fleming May 22, 2013, 9:43 a.m. UTC
From: Matt Fleming <matt.fleming@intel.com>

pcibios_add_device() assumes that the physical addresses stored in
setup_data are accessible via the direct kernel mapping, and that
calling phys_to_virt() is valid. This isn't guaranteed to be true on x86
where the direct mapping range is much smaller than on x86-64.

Calling phys_to_virt() on a highmem address results in the following,

 BUG: unable to handle kernel paging request at 39a3c198
 IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
 *pde = 00000000
 Oops: 0000 [#1] SMP
 Modules linked in:
 Pid: 1, comm: swapper/0 Tainted: G        W I  3.9.0-rc2+ #280
 EIP: 0060:[<c262be0f>] EFLAGS: 00010206 CPU: 1
 EIP is at pcibios_add_device+0x2f/0x90
 EAX: f6258800 EBX: f6258800 ECX: 79a3c190 EDX: 39a3c190
 ESI: f62d9814 EDI: f6258864 EBP: f60add38 ESP: f60add2c
  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 CR0: 8005003b CR2: 39a3c198 CR3: 02b91000 CR4: 001007d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
 Process swapper/0 (pid: 1, ti=f60ac000 task=f60b0000 task.ti=f60ac000)
 Stack:
  f6258800 f62d9814 f6258864 f60add4c c2370c73 00000000 f62d9800 00000000
  f60add6c c274640b 0000ea60 f6258800 0f008086 f62d9800 f62d9800 00000000
  f60add84 c2370d08 00000000 00000008 f62d9800 00000000 f60adda4 c2371904
 Call Trace:
  [<c2370c73>] pci_device_add+0xe3/0x130
  [<c274640b>] pci_scan_single_device+0x8b/0xb0
  [<c2370d08>] pci_scan_slot+0x48/0x100
  [<c2371904>] pci_scan_child_bus+0x24/0xc0
  [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
  [<c23b7203>] acpi_pci_root_add+0x312/0x42f
  [<c23b29d7>] ? acpi_device_notify_fixed+0x1d/0x1d
  [<c23b36a8>] acpi_bus_device_attach+0x77/0xdd
  [<c23cb6be>] acpi_ns_walk_namespace+0xb1/0x163
  [<c23b3631>] ? acpi_bus_type_and_status+0x82/0x82
  [<c23cbd4e>] acpi_walk_namespace+0x7e/0xa8
  [<c23b3631>] ? acpi_bus_type_and_status+0x82/0x82
  [<c23b46e0>] acpi_bus_scan+0x9a/0xa6
  [<c23b3631>] ? acpi_bus_type_and_status+0x82/0x82
  [<c2b17ec9>] acpi_scan_init+0x51/0x144
  [<c2b252a2>] ? pci_mmcfg_late_init+0x49/0x4b
  [<c2b17cdc>] acpi_init+0x224/0x28c
  [<c2001144>] do_one_initcall+0x34/0x170
  [<c2b17ab8>] ? acpi_sleep_proc_init+0x2e/0x2e
  [<c2aeeb83>] kernel_init_freeable+0x119/0x1b6
  [<c2aee4da>] ? do_early_param+0x74/0x74
  [<c2743f10>] kernel_init+0x10/0xd0
  [<c2765697>] ret_from_kernel_thread+0x1b/0x28
  [<c2743f00>] ? rest_init+0x60/0x60

The most reliable way to trigger this crash seems to be booting a 32-bit
kernel via the EFI boot stub.

The solution is to use early_ioremap() instead of phys_to_virt() to map
the setup data into the kernel address space.

Tested-by: Jani Nikula <jani.nikula@intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/pci/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

H. Peter Anvin May 23, 2013, 10:46 p.m. UTC | #1
On 05/22/2013 02:43 AM, Matt Fleming wrote:
> From: Matt Fleming <matt.fleming@intel.com>
> 
> pcibios_add_device() assumes that the physical addresses stored in
> setup_data are accessible via the direct kernel mapping, and that
> calling phys_to_virt() is valid. This isn't guaranteed to be true on x86
> where the direct mapping range is much smaller than on x86-64.
> 
> Calling phys_to_virt() on a highmem address results in the following,
> 

Bjorn: yours or mine?

	-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 24, 2013, 2:38 p.m. UTC | #2
On Wed, May 22, 2013 at 3:43 AM, Matt Fleming <matt@console-pimps.org> wrote:
> From: Matt Fleming <matt.fleming@intel.com>
>
> pcibios_add_device() assumes that the physical addresses stored in
> setup_data are accessible via the direct kernel mapping, and that
> calling phys_to_virt() is valid. This isn't guaranteed to be true on x86
> where the direct mapping range is much smaller than on x86-64.
>
> Calling phys_to_virt() on a highmem address results in the following,
>
>  BUG: unable to handle kernel paging request at 39a3c198
>  IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
>  *pde = 00000000
>  Oops: 0000 [#1] SMP
>  Modules linked in:
>  Pid: 1, comm: swapper/0 Tainted: G        W I  3.9.0-rc2+ #280
>  EIP: 0060:[<c262be0f>] EFLAGS: 00010206 CPU: 1
>  EIP is at pcibios_add_device+0x2f/0x90
>  EAX: f6258800 EBX: f6258800 ECX: 79a3c190 EDX: 39a3c190
>  ESI: f62d9814 EDI: f6258864 EBP: f60add38 ESP: f60add2c
>   DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>  CR0: 8005003b CR2: 39a3c198 CR3: 02b91000 CR4: 001007d0
>  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>  DR6: ffff0ff0 DR7: 00000400
>  Process swapper/0 (pid: 1, ti=f60ac000 task=f60b0000 task.ti=f60ac000)
>  Stack:
>   f6258800 f62d9814 f6258864 f60add4c c2370c73 00000000 f62d9800 00000000
>   f60add6c c274640b 0000ea60 f6258800 0f008086 f62d9800 f62d9800 00000000
>   f60add84 c2370d08 00000000 00000008 f62d9800 00000000 f60adda4 c2371904
>  Call Trace:
>   [<c2370c73>] pci_device_add+0xe3/0x130
>   [<c274640b>] pci_scan_single_device+0x8b/0xb0
>   [<c2370d08>] pci_scan_slot+0x48/0x100
>   [<c2371904>] pci_scan_child_bus+0x24/0xc0
>   [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
>   [<c23b7203>] acpi_pci_root_add+0x312/0x42f
>   [<c23b29d7>] ? acpi_device_notify_fixed+0x1d/0x1d
>   [<c23b36a8>] acpi_bus_device_attach+0x77/0xdd
>   [<c23cb6be>] acpi_ns_walk_namespace+0xb1/0x163
>   [<c23b3631>] ? acpi_bus_type_and_status+0x82/0x82
>   [<c23cbd4e>] acpi_walk_namespace+0x7e/0xa8
>   [<c23b3631>] ? acpi_bus_type_and_status+0x82/0x82
>   [<c23b46e0>] acpi_bus_scan+0x9a/0xa6
>   [<c23b3631>] ? acpi_bus_type_and_status+0x82/0x82
>   [<c2b17ec9>] acpi_scan_init+0x51/0x144
>   [<c2b252a2>] ? pci_mmcfg_late_init+0x49/0x4b
>   [<c2b17cdc>] acpi_init+0x224/0x28c
>   [<c2001144>] do_one_initcall+0x34/0x170
>   [<c2b17ab8>] ? acpi_sleep_proc_init+0x2e/0x2e
>   [<c2aeeb83>] kernel_init_freeable+0x119/0x1b6
>   [<c2aee4da>] ? do_early_param+0x74/0x74
>   [<c2743f10>] kernel_init+0x10/0xd0
>   [<c2765697>] ret_from_kernel_thread+0x1b/0x28
>   [<c2743f00>] ? rest_init+0x60/0x60
>
> The most reliable way to trigger this crash seems to be booting a 32-bit
> kernel via the EFI boot stub.
>
> The solution is to use early_ioremap() instead of phys_to_virt() to map
> the setup data into the kernel address space.
>
> Tested-by: Jani Nikula <jani.nikula@intel.com>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Seth Forshee <seth.forshee@canonical.com>
> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
> ---
>  arch/x86/pci/common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 305c68b..7ae6671 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -628,7 +628,7 @@ int pcibios_add_device(struct pci_dev *dev)
>
>         pa_data = boot_params.hdr.setup_data;
>         while (pa_data) {
> -               data = phys_to_virt(pa_data);
> +               data = early_ioremap(pa_data, sizeof(*rom));

pcibios_add_device() is mostly called at boot-time, when
early_ioremap() probably works well.  But it's also called when we
hot-add devices later, and it looks like early_ioremap() will then
generate warnings because "system_state != SYSTEM_BOOTING".

>                 if (data->type == SETUP_PCI) {
>                         rom = (struct pci_setup_rom *)data;
> @@ -645,6 +645,7 @@ int pcibios_add_device(struct pci_dev *dev)
>                         }
>                 }
>                 pa_data = data->next;
> +               early_iounmap(data, sizeof(*rom));
>         }
>         return 0;
>  }
> --
> 1.8.1.4
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Fleming May 28, 2013, 11:36 a.m. UTC | #3
On Fri, 24 May, at 08:38:10AM, Bjorn Helgaas wrote:
> pcibios_add_device() is mostly called at boot-time, when
> early_ioremap() probably works well.  But it's also called when we
> hot-add devices later, and it looks like early_ioremap() will then
> generate warnings because "system_state != SYSTEM_BOOTING".

Oops. Good point. Is there any reason we can't use ioremap()?
Bjorn Helgaas May 28, 2013, 4:31 p.m. UTC | #4
On Tue, May 28, 2013 at 5:36 AM, Matt Fleming <matt@console-pimps.org> wrote:
> On Fri, 24 May, at 08:38:10AM, Bjorn Helgaas wrote:
>> pcibios_add_device() is mostly called at boot-time, when
>> early_ioremap() probably works well.  But it's also called when we
>> hot-add devices later, and it looks like early_ioremap() will then
>> generate warnings because "system_state != SYSTEM_BOOTING".
>
> Oops. Good point. Is there any reason we can't use ioremap()?

I assume Matthew had some reason for avoiding that in the first place,
so I hope he'll chime in.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett May 28, 2013, 4:48 p.m. UTC | #5
On Tue, May 28, 2013 at 10:31:51AM -0600, Bjorn Helgaas wrote:
> On Tue, May 28, 2013 at 5:36 AM, Matt Fleming <matt@console-pimps.org> wrote:
> > On Fri, 24 May, at 08:38:10AM, Bjorn Helgaas wrote:
> >> pcibios_add_device() is mostly called at boot-time, when
> >> early_ioremap() probably works well.  But it's also called when we
> >> hot-add devices later, and it looks like early_ioremap() will then
> >> generate warnings because "system_state != SYSTEM_BOOTING".
> >
> > Oops. Good point. Is there any reason we can't use ioremap()?
> 
> I assume Matthew had some reason for avoiding that in the first place,
> so I hope he'll chime in.

I have no recollection of why I did it that way. If this is all 
happening late enough for ioremap() to work then that sounds fine.
Bjorn Helgaas May 28, 2013, 5:28 p.m. UTC | #6
On Tue, May 28, 2013 at 10:48 AM, Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> On Tue, May 28, 2013 at 10:31:51AM -0600, Bjorn Helgaas wrote:
>> On Tue, May 28, 2013 at 5:36 AM, Matt Fleming <matt@console-pimps.org> wrote:
>> > On Fri, 24 May, at 08:38:10AM, Bjorn Helgaas wrote:
>> >> pcibios_add_device() is mostly called at boot-time, when
>> >> early_ioremap() probably works well.  But it's also called when we
>> >> hot-add devices later, and it looks like early_ioremap() will then
>> >> generate warnings because "system_state != SYSTEM_BOOTING".
>> >
>> > Oops. Good point. Is there any reason we can't use ioremap()?
>>
>> I assume Matthew had some reason for avoiding that in the first place,
>> so I hope he'll chime in.
>
> I have no recollection of why I did it that way. If this is all
> happening late enough for ioremap() to work then that sounds fine.

OK, hopefully somebody will do the analysis to verify that ioremap()
is appropriate in both cases.  It seems likely, but I haven't looked
in detail.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bityutskiy, Artem June 8, 2013, 11:14 a.m. UTC | #7
T24gU2F0LCAyMDEzLTA2LTA4IGF0IDE0OjE1ICswMzAwLCBBcnRlbSBCaXR5dXRza2l5IHdyb3Rl
Og0KPiBPbiBXZWQsIDIwMTMtMDUtMjIgYXQgMTA6NDMgKzAxMDAsIE1hdHQgRmxlbWluZyB3cm90
ZToNCj4gPiBGcm9tOiBNYXR0IEZsZW1pbmcgPG1hdHQuZmxlbWluZ0BpbnRlbC5jb20+DQo+ID4g
DQo+ID4gcGNpYmlvc19hZGRfZGV2aWNlKCkgYXNzdW1lcyB0aGF0IHRoZSBwaHlzaWNhbCBhZGRy
ZXNzZXMgc3RvcmVkIGluDQo+ID4gc2V0dXBfZGF0YSBhcmUgYWNjZXNzaWJsZSB2aWEgdGhlIGRp
cmVjdCBrZXJuZWwgbWFwcGluZywgYW5kIHRoYXQNCj4gPiBjYWxsaW5nIHBoeXNfdG9fdmlydCgp
IGlzIHZhbGlkLiBUaGlzIGlzbid0IGd1YXJhbnRlZWQgdG8gYmUgdHJ1ZSBvbiB4ODYNCj4gPiB3
aGVyZSB0aGUgZGlyZWN0IG1hcHBpbmcgcmFuZ2UgaXMgbXVjaCBzbWFsbGVyIHRoYW4gb24geDg2
LTY0Lg0KPiA+IA0KPiA+IENhbGxpbmcgcGh5c190b192aXJ0KCkgb24gYSBoaWdobWVtIGFkZHJl
c3MgcmVzdWx0cyBpbiB0aGUgZm9sbG93aW5nLA0KPiANCj4gRml4ZXMgcGFnaW5nIHJlcXVlc3Qg
b29wcyBmb3IgbWUgYXMgd2VsbCwgdGhhbmtzISBXb3VsZCB5b3UgcGxlYXNlIGFkZCBhDQo+IC1z
dGFibGUgdGFnIHRvIHRoaXMgcGF0Y2g/DQoNCk9oLCBwYXJkb24sIHlvdSBkaWQgQ0MgLXN0YWJs
ZS4NCg0KLS0gDQpCZXN0IFJlZ2FyZHMsDQpBcnRlbSBCaXR5dXRza2l5DQotLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0K
SW50ZWwgRmlubGFuZCBPeQpSZWdpc3RlcmVkIEFkZHJlc3M6IFBMIDI4MSwgMDAxODEgSGVsc2lu
a2kgCkJ1c2luZXNzIElkZW50aXR5IENvZGU6IDAzNTc2MDYgLSA0IApEb21pY2lsZWQgaW4gSGVs
c2lua2kgCgpUaGlzIGUtbWFpbCBhbmQgYW55IGF0dGFjaG1lbnRzIG1heSBjb250YWluIGNvbmZp
ZGVudGlhbCBtYXRlcmlhbCBmb3IKdGhlIHNvbGUgdXNlIG9mIHRoZSBpbnRlbmRlZCByZWNpcGll
bnQocykuIEFueSByZXZpZXcgb3IgZGlzdHJpYnV0aW9uCmJ5IG90aGVycyBpcyBzdHJpY3RseSBw
cm9oaWJpdGVkLiBJZiB5b3UgYXJlIG5vdCB0aGUgaW50ZW5kZWQKcmVjaXBpZW50LCBwbGVhc2Ug
Y29udGFjdCB0aGUgc2VuZGVyIGFuZCBkZWxldGUgYWxsIGNvcGllcy4K

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Artem Bityutskiy June 8, 2013, 11:15 a.m. UTC | #8
On Wed, 2013-05-22 at 10:43 +0100, Matt Fleming wrote:
> From: Matt Fleming <matt.fleming@intel.com>
> 
> pcibios_add_device() assumes that the physical addresses stored in
> setup_data are accessible via the direct kernel mapping, and that
> calling phys_to_virt() is valid. This isn't guaranteed to be true on x86
> where the direct mapping range is much smaller than on x86-64.
> 
> Calling phys_to_virt() on a highmem address results in the following,

Fixes paging request oops for me as well, thanks! Would you please add a
-stable tag to this patch?
diff mbox

Patch

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 305c68b..7ae6671 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -628,7 +628,7 @@  int pcibios_add_device(struct pci_dev *dev)
 
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
-		data = phys_to_virt(pa_data);
+		data = early_ioremap(pa_data, sizeof(*rom));
 
 		if (data->type == SETUP_PCI) {
 			rom = (struct pci_setup_rom *)data;
@@ -645,6 +645,7 @@  int pcibios_add_device(struct pci_dev *dev)
 			}
 		}
 		pa_data = data->next;
+		early_iounmap(data, sizeof(*rom));
 	}
 	return 0;
 }