Message ID | 20230209072220.6836-1-jgross@suse.com (mailing list archive) |
---|---|
Series | x86/mtrr: fix handling with PAT but without MTRR |
On Thu, 2023-02-09 at 08:22 +0100, Juergen Gross wrote:
> This series tries to fix the rather special case of PAT being available
> without having MTRRs (either due to CONFIG_MTRR being not set, or
> because the feature has been disabled e.g. by a hypervisor).

debug_vm_pgtable fails in a KVM guest with CONFIG_MTRR=y. CONFIG_MTRR=n
succeeds.

[ 0.830280] debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
[ 0.831906] ------------[ cut here ]------------
[ 0.832711] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:461 debug_vm_pgtable+0xb9a/0xe16
[ 0.833998] Modules linked in:
[ 0.834450] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc7+ #2366
[ 0.835462] RIP: 0010:debug_vm_pgtable+0xb9a/0xe16
[ 0.836217] Code: e2 3a 73 4a 48 c7 00 00 00 00 00 48 8b b4 24 a0 00 00 00 48 8b 54 24 60 48 8b 7c 24 20 48 c4
[ 0.839068] RSP: 0000:ffffc90000013de0 EFLAGS: 00010246
[ 0.839735] RAX: 0000000000000000 RBX: ffff888100048868 RCX: bffffffffffffff0
[ 0.840646] RDX: 0000000000000000 RSI: 0000000040000000 RDI: 0000000000000000
[ 0.841661] RBP: ffff88810004d140 R08: 0000000000000000 R09: ffff888100280880
[ 0.842625] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888103810298
[ 0.843574] R13: ffff888100048780 R14: ffffffff8282e099 R15: 0000000000000000
[ 0.844524] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[ 0.845706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.846499] CR2: ffff88813ffff000 CR3: 000000000222d001 CR4: 0000000000370ef0
[ 0.847464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.848432] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.849371] Call Trace:
[ 0.849699] <TASK>
[ 0.849997] ? destroy_args+0x131/0x131
[ 0.850487] do_one_initcall+0x61/0x250
[ 0.850983] ? rdinit_setup+0x2c/0x2c
[ 0.851451] kernel_init_freeable+0x18e/0x1d8
[ 0.852033] ? rest_init+0x130/0x130
[ 0.852533] kernel_init+0x16/0x120
[ 0.853035] ret_from_fork+0x1f/0x30
[ 0.853507] </TASK>
[ 0.853803] ---[ end trace 0000000000000000 ]---
[ 0.854421] ------------[ cut here ]------------
[ 0.855027] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:462 debug_vm_pgtable+0xbaa/0xe16
[ 0.856115] Modules linked in:
[ 0.856517] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc7+ #2366
[ 0.857562] RIP: 0010:debug_vm_pgtable+0xbaa/0xe16
[ 0.858186] Code: 00 00 00 48 8b 54 24 60 48 8b 7c 24 20 48 c1 e6 0c e8 79 18 7f fe 85 c0 75 02 0f 0b 48 8b 7b
[ 0.860778] RSP: 0000:ffffc90000013de0 EFLAGS: 00010246
[ 0.861519] RAX: 0000000000000000 RBX: ffff888100048868 RCX: bffffffffffffff0
[ 0.862530] RDX: 0000000000000000 RSI: 0000000040000000 RDI: ffff88810380e7f8
[ 0.863522] RBP: ffff88810004d140 R08: 0000000000000000 R09: ffff888100280880
[ 0.864449] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888103810298
[ 0.865454] R13: ffff888100048780 R14: ffffffff8282e099 R15: 0000000000000000
[ 0.866401] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[ 0.867438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.868181] CR2: ffff88813ffff000 CR3: 000000000222d001 CR4: 0000000000370ef0
[ 0.869097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.870026] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.870943] Call Trace:
[ 0.871259] <TASK>
[ 0.871537] ? destroy_args+0x131/0x131
[ 0.872030] do_one_initcall+0x61/0x250
[ 0.872521] ? rdinit_setup+0x2c/0x2c
[ 0.873005] kernel_init_freeable+0x18e/0x1d8
[ 0.873607] ? rest_init+0x130/0x130
[ 0.874116] kernel_init+0x16/0x120
[ 0.874618] ret_from_fork+0x1f/0x30
[ 0.875123] </TASK>
[ 0.875411] ---[ end trace 0000000000000000 ]---
On 11.02.23 01:06, Edgecombe, Rick P wrote:
> On Thu, 2023-02-09 at 08:22 +0100, Juergen Gross wrote:
>> This series tries to fix the rather special case of PAT being available
>> without having MTRRs (either due to CONFIG_MTRR being not set, or
>> because the feature has been disabled e.g. by a hypervisor).
>
> debug_vm_pgtable fails in a KVM guest with CONFIG_MTRR=y. CONFIG_MTRR=n
> succeeds.
>
> [ 0.830280] debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
> [...]

Thanks for the report.

I'll have a look. Probably I'll need to re-add the check for WB in
patch 7.

Juergen
On Mon, 2023-02-13 at 07:12 +0100, Juergen Gross wrote:
> Thanks for the report.
>
> I'll have a look. Probably I'll need to re-add the check for WB in
> patch 7.

Sure, let me know if you need any more details about my setup.
On 13.02.23 19:21, Edgecombe, Rick P wrote:
> On Mon, 2023-02-13 at 07:12 +0100, Juergen Gross wrote:
>> Thanks for the report.
>>
>> I'll have a look. Probably I'll need to re-add the check for WB in
>> patch 7.
>
> Sure, let me know if you need any more details about my setup.

I have reproduced the issue.

Adding back the test for WB will fix it, but I'm not sure this is really
what I should do.

The problem arises in case a large mapping is spanning multiple MTRRs,
even if they define the same caching type (uniform is set to 0 in this
case).

So the basic question for me is: shouldn't the semantics of uniform be
adapted? Today it means "the range is covered by only one MTRR or by
none". Looking at the use cases I'm wondering whether it shouldn't be
"the whole range has the same caching type".

Thoughts?

Juergen
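The two readings of "uniform" discussed above can be contrasted in a small stand-alone sketch. This is not the kernel's MTRR code: `struct mtrr_range`, `type_at()`, `uniform_single_mtrr()` and `uniform_same_type()` are hypothetical names, and the model assumes non-overlapping ranges with exclusive end addresses. Only the numeric memory type encodings follow the architectural MTRR values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MTRR memory type encodings. */
enum mem_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Hypothetical, simplified variable-range MTRR covering [base, end). */
struct mtrr_range {
	uint64_t base;
	uint64_t end;		/* exclusive */
	enum mem_type type;
};

/* Type of one address: first matching range, else the default type.
 * Assumption of this sketch: ranges do not overlap. */
enum mem_type type_at(const struct mtrr_range *r, size_t n,
		      enum mem_type def, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].base && addr < r[i].end)
			return r[i].type;
	return def;
}

/* Reading 1: "the range is covered by only one MTRR or by none". */
bool uniform_single_mtrr(const struct mtrr_range *r, size_t n,
			 uint64_t start, uint64_t end)
{
	size_t hits = 0;
	bool contained = true;

	assert(start < end);
	for (size_t i = 0; i < n; i++) {
		if (r[i].base < end && r[i].end > start) {
			hits++;
			/* The MTRR must cover the whole request. */
			if (r[i].base > start || r[i].end < end)
				contained = false;
		}
	}
	return hits == 0 || (hits == 1 && contained);
}

/* Reading 2: "the whole range has the same caching type". The type can
 * only change at a range boundary, so checking boundaries suffices. */
bool uniform_same_type(const struct mtrr_range *r, size_t n,
		       enum mem_type def, uint64_t start, uint64_t end)
{
	enum mem_type t0 = type_at(r, n, def, start);

	assert(start < end);
	for (size_t i = 0; i < n; i++) {
		if (r[i].base > start && r[i].base < end &&
		    type_at(r, n, def, r[i].base) != t0)
			return false;
		if (r[i].end > start && r[i].end < end &&
		    type_at(r, n, def, r[i].end) != t0)
			return false;
	}
	return true;
}
```

With two adjacent WB ranges and a mapping that straddles their boundary, the first reading reports non-uniform while the second reports uniform, which is the case described in the report.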
On Wed, Feb 15, 2023 at 12:25 AM Juergen Gross <jgross@suse.com> wrote:
>
> The problem arises in case a large mapping is spanning multiple MTRRs,
> even if they define the same caching type (uniform is set to 0 in this
> case).

Oh, I think then you should fix uniform to be 1.

IOW, we should not think "multiple MTRRs" means "non-uniform". Only
"different actual memory types" should mean non-uniformity.

If I remember correctly, there were good reasons to have overlapping
MTRR's. In fact, you can generate a single MTRR that described a
memory type that wasn't even contiguous if you had odd memory setups.

Intel definitely defines how overlapping MTRR's work, and "same types
overlaps" is documented as a real thing.

Linus
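As a rough illustration of the overlap rules referred to above, the sketch below encodes the precedence for overlapping variable-range MTRRs as described in the Intel SDM: identical types yield that type, UC wins over everything, WT wins over WB, and other mixes are undefined. The names `mtrr_effective_type()` and `MTRR_OVERLAP_UNDEFINED` are made up for this example and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Architectural MTRR memory type encodings. */
enum mtrr_mem_type {
	MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6
};

/* Hypothetical marker for combinations the architecture leaves undefined. */
#define MTRR_OVERLAP_UNDEFINED (-1)

/* Effective type for an address matched by n variable-range MTRRs. */
int mtrr_effective_type(const enum mtrr_mem_type *types, size_t n)
{
	bool all_same = true;
	bool only_wt_wb = true;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		if (types[i] == MTRR_UC)
			return MTRR_UC;		/* UC always takes precedence */
		if (types[i] != types[0])
			all_same = false;
		if (types[i] != MTRR_WT && types[i] != MTRR_WB)
			only_wt_wb = false;
	}
	if (all_same)
		return types[0];		/* "same types overlap" case */
	if (only_wt_wb)
		return MTRR_WT;			/* mix of WT and WB gives WT */
	return MTRR_OVERLAP_UNDEFINED;
}
```

Under these rules two overlapping WB ranges simply yield WB, so counting the MTRRs that touch a range says nothing about whether its memory type is uniform.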
On 16.02.23 00:22, Linus Torvalds wrote:
> On Wed, Feb 15, 2023 at 12:25 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> The problem arises in case a large mapping is spanning multiple MTRRs,
>> even if they define the same caching type (uniform is set to 0 in this
>> case).
>
> Oh, I think then you should fix uniform to be 1.
>
> IOW, we should not think "multiple MTRRs" means "non-uniform". Only
> "different actual memory types" should mean non-uniformity.

Thanks for the confirmation. I completely agree.

> If I remember correctly, there were good reasons to have overlapping
> MTRR's. In fact, you can generate a single MTRR that described a
> memory type that wasn't even contiguous if you had odd memory setups.
>
> Intel definitely defines how overlapping MTRR's work, and "same types
> overlaps" is documented as a real thing.

Yes. And it is handled incorrectly in the current code.

Handling it correctly will require quite some reworking of the code,
which I've already started to work on. I will defer the
pud_set_huge()/pmd_set_huge() modifying patch to after this rework.

Juergen