Message ID | 20160817203238.GA9408@char.us.oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 17.08.16 at 22:32, <konrad.wilk@oracle.com> wrote: > One of the interesting things about XSA 154 fix ("x86: enforce consistent > cachability of MMIO mappings") is that when certain applications (mcelog) > are trying to map /dev/mmap and lurk in ISA regions - we get: DYM /dev/mem ? Most accesses to which are bogus in PV guests (often including Dom0) anyway. > [ 49.399053] WARNING: CPU: 0 PID: 2471 at arch/x86/mm/pat.c:913 untrack_pfn+0x93/0xc0() What Linux version is this? untrack_pfn() doesn't span line 913 in 4.8-rc2. And follow_phys() appears to only check whether the write flag is set as expected; I can't see any cachability checks. Plus it gets called only when both incoming address and size are zero. > Anyhow what I am wondering: > > a) Should we add a edge case in the hypervisor to allow multiple mappings > for this region? I am thinking no.. but it sounds like mapping ISA region > as WB is safe even in baremetal? We should never allow multiple mappings with different cachability. And I don't understand what makes you think the ISA region is safe to map WB? There might be ROMs, MMIO regions, or simply nothing there, neither of which is safe to map WB. ROMs certainly could be WP, but that would require a way to reliably size not only ISA extension ROMs, but also main and video BIOS. > b) Or would it be better to let Linux do its thing and treat 640KB->1MB > as uncached instead of writeback? According to what you wrote earlier the two parts of the sentence read contradictory to me. > Looking at the kernel it assumes that WB is ok for 640KB->1MB. > The comment says: > " /* Low ISA region is always mapped WB in page table. No need to track *" As per above it's not clear to me what this comment is backed by. Jan
On 18/08/16 11:06, Jan Beulich wrote: >>>> On 17.08.16 at 22:32, <konrad.wilk@oracle.com> wrote: >> One of the interesting things about XSA 154 fix ("x86: enforce consistent >> cachability of MMIO mappings") is that when certain applications (mcelog) >> are trying to map /dev/mmap and lurk in ISA regions - we get: > DYM /dev/mem ? Most accesses to which are bogus in PV guests > (often including Dom0) anyway. > >> [ 49.399053] WARNING: CPU: 0 PID: 2471 at arch/x86/mm/pat.c:913 untrack_pfn+0x93/0xc0() > What Linux version is this? untrack_pfn() doesn't span line 913 in > 4.8-rc2. And follow_phys() appears to only check whether the write > flag is set as expected; I can't see any cachability checks. Plus it > gets called only when both incoming address and size are zero. > >> Anyhow what I am wondering: >> >> a) Should we add a edge case in the hypervisor to allow multiple mappings >> for this region? I am thinking no.. but it sounds like mapping ISA region >> as WB is safe even in baremetal? > We should never allow multiple mappings with different cachability. > And I don't understand what makes you think the ISA region is safe > to map WB? There might be ROMs, MMIO regions, or simply nothing > there, neither of which is safe to map WB. ROMs certainly could be > WP, but that would require a way to reliably size not only ISA > extension ROMs, but also main and video BIOS. > >> b) Or would it be better to let Linux do its thing and treat 640KB->1MB >> as uncached instead of writeback? > According to what you wrote earlier the two parts of the sentence > read contradictory to me. > >> Looking at the kernel it assumes that WB is ok for 640KB->1MB. >> The comment says: >> " /* Low ISA region is always mapped WB in page table. No need to track *" > As per above it's not clear to me what this comment is backed by. This states what is in the pagetables. Not the combined result with MTRRs. WB in the pagetables and WC/UB in the MTRRs is a legal combination which functions correctly. ~Andrew
>>> On 18.08.16 at 12:16, <andrew.cooper3@citrix.com> wrote: > On 18/08/16 11:06, Jan Beulich wrote: >>>>> On 17.08.16 at 22:32, <konrad.wilk@oracle.com> wrote: >>> Looking at the kernel it assumes that WB is ok for 640KB->1MB. >>> The comment says: >>> " /* Low ISA region is always mapped WB in page table. No need to track > *" >> As per above it's not clear to me what this comment is backed by. > > This states what is in the pagetables. Not the combined result with MTRRs. > > WB in the pagetables and WC/UB in the MTRRs is a legal combination which > functions correctly. True, but then again - haven't I been told multiple times that Linux nowadays prefers to run without using MTRRs? Jan
On Thu, 18 Aug 2016 05:12:54 -0600 "Jan Beulich" <JBeulich@suse.com> wrote: > >>> On 18.08.16 at 12:16, <andrew.cooper3@citrix.com> wrote: > > On 18/08/16 11:06, Jan Beulich wrote: > >>>>> On 17.08.16 at 22:32, <konrad.wilk@oracle.com> wrote: > >>> Looking at the kernel it assumes that WB is ok for 640KB->1MB. > >>> The comment says: > >>> " /* Low ISA region is always mapped WB in page table. No need to track > > *" > >> As per above it's not clear to me what this comment is backed by. > > > > This states what is in the pagetables. Not the combined result with MTRRs. > > > > WB in the pagetables and WC/UB in the MTRRs is a legal combination which > > functions correctly. > > True, but then again - haven't I been told multiple times that Linux > nowadays prefers to run without using MTRRs? The BIOS sets up the fixed MTRR registers for the 640K-1MB window. Those are separate to the variable range MTRR registers used for main memory with specific mappings for segments A000 to BFFF then C000-C7FF / C800-CFFF / etc up to FFFF. Alan
On Thu, Aug 18, 2016 at 04:06:33AM -0600, Jan Beulich wrote: > >>> On 17.08.16 at 22:32, <konrad.wilk@oracle.com> wrote: > > One of the interesting things about XSA 154 fix ("x86: enforce consistent > > cachability of MMIO mappings") is that when certain applications (mcelog) > > are trying to map /dev/mmap and lurk in ISA regions - we get: > > DYM /dev/mem ? Most accesses to which are bogus in PV guests > (often including Dom0) anyway. Yes. > > > [ 49.399053] WARNING: CPU: 0 PID: 2471 at arch/x86/mm/pat.c:913 untrack_pfn+0x93/0xc0() > > What Linux version is this? untrack_pfn() doesn't span line 913 in 4.1 > 4.8-rc2. And follow_phys() appears to only check whether the write > flag is set as expected; I can't see any cachability checks. Plus it > gets called only when both incoming address and size are zero. The error that happens is much sooner - that is when the VMA is setup with the incorrect page attributes. Specifically: reserve_memtype which 548 /* Low ISA region is always mapped WB in page table. No need to track */ 549 if (x86_platform.is_untracked_pat_range(start, end)) { 550 if (new_type) 551 *new_type = _PAGE_CACHE_MODE_WB; 552 return 0; 553 } (And this for a change is v4.8-rc2) > > > Anyhow what I am wondering: > > > > a) Should we add a edge case in the hypervisor to allow multiple mappings > > for this region? I am thinking no.. but it sounds like mapping ISA region > > as WB is safe even in baremetal? > > We should never allow multiple mappings with different cachability. > And I don't understand what makes you think the ISA region is safe > to map WB? There might be ROMs, MMIO regions, or simply nothing > there, neither of which is safe to map WB. ROMs certainly could be > WP, but that would require a way to reliably size not only ISA > extension ROMs, but also main and video BIOS. > > > b) Or would it be better to let Linux do its thing and treat 640KB->1MB > > as uncached instead of writeback? > > According to what you wrote earlier the two parts of the sentence > read contradictory to me. > > > Looking at the kernel it assumes that WB is ok for 640KB->1MB. > > The comment says: > > " /* Low ISA region is always mapped WB in page table. No need to track *" > > As per above it's not clear to me what this comment is backed by. I was hoping you would know :-) Ah, commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> Date: Tue Mar 18 17:00:14 2008 -0700 x86: PAT infrastructure patch Sets up pat_init() infrastructure. which sets the MTRR for that region. > > Jan >
>>> On 19.08.16 at 16:52, <konrad.wilk@oracle.com> wrote: > On Thu, Aug 18, 2016 at 04:06:33AM -0600, Jan Beulich wrote: >> >>> On 17.08.16 at 22:32, <konrad.wilk@oracle.com> wrote: >> > Looking at the kernel it assumes that WB is ok for 640KB->1MB. >> > The comment says: >> > " /* Low ISA region is always mapped WB in page table. No need to track *" >> >> As per above it's not clear to me what this comment is backed by. > > I was hoping you would know :-) > > Ah, commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 > Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> > Date: Tue Mar 18 17:00:14 2008 -0700 > > x86: PAT infrastructure patch > > Sets up pat_init() infrastructure. > > > which sets the MTRR for that region. Hmm, that's the commit which introduced pat.c years ago. I can't see it writing an MTRRs though, nor can I see it backing the comment in adds in any way. I guess I'm confused... Jan
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 70a38c1..e5ff5a5 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -288,6 +287,12 @@ static void __init xen_banner(void) version >> 16, version & 0xffff, extra.extraversion, xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : ""); } + +static bool xen_ignore(u64 s, u64 e) +{ + return false; +} + /* Check if running on Xen version (major, minor) or later */ bool xen_running_on_version_or_later(unsigned int major, unsigned int minor) @@ -1563,7 +1570,7 @@ asmlinkage __visible void __init xen_start_kernel(void) x86_init.resources.memory_setup = xen_memory_setup; x86_init.oem.arch_setup = xen_arch_setup; x86_init.oem.banner = xen_banner; - + x86_platform.is_untracked_pat_range = xen_ignore; xen_init_time_ops(); /*