[v10,11/12] x86, mm, pat: Refactor !pat_enabled handling

Message ID 1432739944-22633-12-git-send-email-toshi.kani@hp.com (mailing list archive)
State Superseded

Commit Message

Toshi Kani May 27, 2015, 3:19 p.m. UTC
This patch refactors the !pat_enabled code paths and integrates
them into the PAT abstraction code.  The PAT table is emulated by
corresponding to the two cache attribute bits, PWT (Write Through)
and PCD (Cache Disable).  The emulated PAT table is the same as the
BIOS default setup when the system has PAT but the "nopat" boot
option is specified.  The emulated PAT table is also used when
MSR_IA32_CR_PAT returns 0 (9d34cfdf4).

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/mm/init.c     |    6 ++--
 arch/x86/mm/iomap_32.c |   12 ++++---
 arch/x86/mm/ioremap.c  |   10 +-----
 arch/x86/mm/pageattr.c |    6 ----
 arch/x86/mm/pat.c      |   77 +++++++++++++++++++++++++++++-------------------
 5 files changed, 57 insertions(+), 54 deletions(-)
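
As a rough illustration of the emulated table described in the commit message, the sketch below packs the eight PAT slots into a 64-bit value the way the `PAT(x, y)` macro in the patch does, and selects a slot from the PAT/PCD/PWT bits. The memory-type encodings are the architectural ones; `emulated_pat()`, `pat_slot()` and `pat_type_at()` are hypothetical helpers written for this example, not kernel functions.

```c
#include <stdint.h>

/* Architectural PAT memory-type encodings. */
enum pat_type {
	PAT_UC       = 0,
	PAT_WC       = 1,
	PAT_WT       = 4,
	PAT_WP       = 5,
	PAT_WB       = 6,
	PAT_UC_MINUS = 7,
};

/* Place memory type y in PAT slot x; each slot occupies one byte. */
#define PAT(x, y)	((uint64_t)PAT_ ## y << ((x) * 8))

/* Hypothetical helper: the table the patch emulates when PAT is disabled. */
static uint64_t emulated_pat(void)
{
	return PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
	       PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
}

/* Hypothetical helper: the slot index is PAT:PCD:PWT, with PWT as bit 0. */
static unsigned int pat_slot(int pat_bit, int pcd, int pwt)
{
	return ((unsigned)!!pat_bit << 2) | ((unsigned)!!pcd << 1) |
	       (unsigned)!!pwt;
}

/* Hypothetical helper: the memory type stored in a given slot. */
static enum pat_type pat_type_at(uint64_t pat, unsigned int slot)
{
	return (enum pat_type)((pat >> (slot * 8)) & 7);
}
```

Under these assumptions `emulated_pat()` evaluates to 0x0007040600070406, the architectural power-on default of the PAT MSR, and PCD=1 with PWT=0 selects slot 2 (UC-).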

Comments

Borislav Petkov May 29, 2015, 8:58 a.m. UTC | #1
On Wed, May 27, 2015 at 09:19:03AM -0600, Toshi Kani wrote:
> This patch refactors the !pat_enabled code paths and integrates

Please refrain from using such empty phrases like "This patch does this
and that" in your commit messages - it is implicitly obvious that it is
"this patch" when one reads it.

> them into the PAT abstraction code.  The PAT table is emulated by
> corresponding to the two cache attribute bits, PWT (Write Through)
> and PCD (Cache Disable).  The emulated PAT table is the same as the
> BIOS default setup when the system has PAT but the "nopat" boot
> option is specified.  The emulated PAT table is also used when
> MSR_IA32_CR_PAT returns 0 (9d34cfdf4).

9d34cfdf4 - what is that thing? A commit message? If so, we quote them
like this:

  9d34cfdf4796 ("x86: Don't rely on VMWare emulating PAT MSR correctly")

note the 12 chars length of the commit id.

> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/mm/init.c     |    6 ++--
>  arch/x86/mm/iomap_32.c |   12 ++++---
>  arch/x86/mm/ioremap.c  |   10 +-----
>  arch/x86/mm/pageattr.c |    6 ----
>  arch/x86/mm/pat.c      |   77 +++++++++++++++++++++++++++++-------------------
>  5 files changed, 57 insertions(+), 54 deletions(-)

So I started applying your pile and everything was ok-ish until I came
about this trainwreck. You have a lot of changes in here, the commit
message is certainly lacking sufficient explanation as to why and this
patch is changing stuff which the previous one adds.

So a lot of unnecessary code movement.

Then you have stuff like this:

	+       } else if (!cpu_has_pat && pat_enabled) {

How can a CPU not have PAT but have it enabled?!?

So this is not how we do patchsets.

Please do the cleanups *first*. Do them in small, self-contained changes
explaining *why* you're doing them.

*Then* add the new functionality, i.e. the WT.

Oh, and when you do your next version, do the patches against tip/master
because there are a bunch of changes in the PAT code already.

Thanks.

Toshi Kani May 29, 2015, 2:27 p.m. UTC | #2
On Fri, 2015-05-29 at 10:58 +0200, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 09:19:03AM -0600, Toshi Kani wrote:
> > This patch refactors the !pat_enabled code paths and integrates
> 
> Please refrain from using such empty phrases like "This patch does this
> and that" in your commit messages - it is implicitly obvious that it is
> "this patch" when one reads it.
> 
> > them into the PAT abstraction code.  The PAT table is emulated by
> > corresponding to the two cache attribute bits, PWT (Write Through)
> > and PCD (Cache Disable).  The emulated PAT table is the same as the
> > BIOS default setup when the system has PAT but the "nopat" boot
> > option is specified.  The emulated PAT table is also used when
> > MSR_IA32_CR_PAT returns 0 (9d34cfdf4).
> 
> 9d34cfdf4 - what is that thing? A commit message? If so, we quote them
> like this:
> 
>   9d34cfdf4796 ("x86: Don't rely on VMWare emulating PAT MSR correctly")
> 
> note the 12 chars length of the commit id.

Yes, it refers to the commit above.

> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > Reviewed-by: Juergen Gross <jgross@suse.com>
> > ---
> >  arch/x86/mm/init.c     |    6 ++--
> >  arch/x86/mm/iomap_32.c |   12 ++++---
> >  arch/x86/mm/ioremap.c  |   10 +-----
> >  arch/x86/mm/pageattr.c |    6 ----
> >  arch/x86/mm/pat.c      |   77 +++++++++++++++++++++++++++++-------------------
> >  5 files changed, 57 insertions(+), 54 deletions(-)
> 
> So I started applying your pile and everything was ok-ish until I came
> about this trainwreck. You have a lot of changes in here, the commit
> message is certainly lacking sufficient explanation as to why and this
> patch is changing stuff which the previous one adds.

This !pat_enabled path cleanup was suggested during review and is
independent from the WT enablement.  So, I thought it'd be better to
place it as an additional change on top of the WT set, so that it'd be
easier to bisect when there is any issue found in the !pat_enabled path.

> So a lot of unnecessary code movement.
>
> Then you have stuff like this:
> 
> 	+       } else if (!cpu_has_pat && pat_enabled) {
> 
> How can a CPU not have PAT but have it enabled?!?

This simply preserves the original error check in the code.  This error
check makes sure that all CPUs have the PAT feature supported when PAT
is enabled.  This error can only happen when heterogeneous CPUs are
installed/emulated on the system/guest.  This check may be paranoid, but
this cleanup is not meant to modify such an error check.

> So this is not how we do patchsets.
> 
> Please do the cleanups *first*. Do them in small, self-contained changes
> explaining *why* you're doing them.
> 
> *Then* add the new functionality, i.e. the WT.

Can you consider the patch 10/12-11/12 as a separate patchset from the
WT series?  If that is OK, I will resubmit 10/12 (BUG->panic) and 11/12
(commit log update). 

> Oh, and when you do your next version, do the patches against tip/master
> because there are a bunch of changes in the PAT code already.

Thanks,
-Toshi
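
For readers following the dispute over `!cpu_has_pat && pat_enabled`, here is a minimal user-space sketch of the control flow in question. It is a simplification that assumes only the CPU-feature check matters (the zero-MSR case is left out), and `pat_init_cpu()` and `enum pat_result` are hypothetical names, not kernel code.

```c
#include <stdbool.h>

/* Possible results of the simplified per-CPU PAT setup below. */
enum pat_result { PAT_PROGRAMMED, PAT_EMULATED, PAT_MISMATCH };

/* Hypothetical stand-in for the kernel's global pat_enabled flag. */
static bool pat_enabled = true;

/* Hypothetical model of pat_init() for one CPU coming online. */
static enum pat_result pat_init_cpu(bool is_boot_cpu, bool cpu_has_pat)
{
	if (is_boot_cpu) {
		/* A boot CPU without PAT turns the feature off globally. */
		if (!cpu_has_pat)
			pat_enabled = false;
	} else if (!cpu_has_pat && pat_enabled) {
		/* Secondary CPU lacks PAT although the boot CPU enabled it. */
		return PAT_MISMATCH;	/* the kernel code calls BUG() here */
	}

	return pat_enabled ? PAT_PROGRAMMED : PAT_EMULATED;
}
```

In this model the mismatch branch is reached only when the boot CPU reports PAT and a later CPU does not, which is the heterogeneous case described in the reply above.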

Borislav Petkov May 29, 2015, 3:13 p.m. UTC | #3
On Fri, May 29, 2015 at 08:27:08AM -0600, Toshi Kani wrote:
> This simply preserves the original error check in the code.  This error
> check makes sure that all CPUs have the PAT feature supported when PAT
> is enabled.  This error can only happen when heterogeneous CPUs are
> installed/emulated on the system/guest.  This check may be paranoid, but
> this cleanup is not meant to modify such an error check.

No, this is a ridiculous attempt to justify crazy code. Please do it
right. If the cleanup makes the code more insane than it is, then don't
do it in the first place.

> Can you consider the patch 10/12-11/12 as a separate patchset from the
> WT series?  If that is OK, I will resubmit 10/12 (BUG->panic) and 11/12
> (commit log update).

That's not enough. 11/12 is a convoluted mess which needs splitting and
more detailed explanations in the commit messages.

So no. Read what I said: do the cleanup *first* , *then* add the new
functionality.

The WT patches shouldn't change all too much from what you have now.
Also, 11/12 changes stuff which you add in 1/12. This churn is useless
and shouldn't be there at all.

So you should be able to do the cleanup first and have the WT stuff
on top just fine.

Toshi Kani May 29, 2015, 3:17 p.m. UTC | #4
On Fri, 2015-05-29 at 17:13 +0200, Borislav Petkov wrote:
> On Fri, May 29, 2015 at 08:27:08AM -0600, Toshi Kani wrote:
> > This simply preserves the original error check in the code.  This error
> > check makes sure that all CPUs have the PAT feature supported when PAT
> > is enabled.  This error can only happen when heterogeneous CPUs are
> > installed/emulated on the system/guest.  This check may be paranoid, but
> > this cleanup is not meant to modify such an error check.
> 
> No, this is a ridiculous attempt to justify crazy code. Please do it
> right. If the cleanup makes the code more insane than it is, then don't
> do it in the first place.

Well, the change is based on this review comment.  So, I am not sure
what would be the right thing to do.  I am not 100% certain that this
check can be removed, either.
https://lkml.org/lkml/2015/5/22/148

> > Can you consider the patch 10/12-11/12 as a separate patchset from the
> > WT series?  If that is OK, I will resubmit 10/12 (BUG->panic) and 11/12
> > (commit log update).
> 
> That's not enough. 11/12 is a convoluted mess which needs splitting and
> more detailed explanations in the commit messages.
> 
> So no. Read what I said: do the cleanup *first* , *then* add the new
> functionality.
> 
> The WT patches shouldn't change all too much from what you have now.
> Also, 11/12 changes stuff which you add in 1/12. This churn is useless
> and shouldn't be there at all.
> 
> So you should be able to do the cleanup first and have the WT stuff
> on top just fine.

OK, I will do the cleanup first and resubmit the patchset based on
tip/master.

Thanks,
-Toshi

Patch

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 1d55318..8533b46 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -40,7 +40,7 @@ 
  */
 uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
 	[_PAGE_CACHE_MODE_WB      ]	= 0         | 0        ,
-	[_PAGE_CACHE_MODE_WC      ]	= _PAGE_PWT | 0        ,
+	[_PAGE_CACHE_MODE_WC      ]	= 0         | _PAGE_PCD,
 	[_PAGE_CACHE_MODE_UC_MINUS]	= 0         | _PAGE_PCD,
 	[_PAGE_CACHE_MODE_UC      ]	= _PAGE_PWT | _PAGE_PCD,
 	[_PAGE_CACHE_MODE_WT      ]	= 0         | _PAGE_PCD,
@@ -50,11 +50,11 @@  EXPORT_SYMBOL(__cachemode2pte_tbl);
 
 uint8_t __pte2cachemode_tbl[8] = {
 	[__pte2cm_idx( 0        | 0         | 0        )] = _PAGE_CACHE_MODE_WB,
-	[__pte2cm_idx(_PAGE_PWT | 0         | 0        )] = _PAGE_CACHE_MODE_WC,
+	[__pte2cm_idx(_PAGE_PWT | 0         | 0        )] = _PAGE_CACHE_MODE_UC_MINUS,
 	[__pte2cm_idx( 0        | _PAGE_PCD | 0        )] = _PAGE_CACHE_MODE_UC_MINUS,
 	[__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | 0        )] = _PAGE_CACHE_MODE_UC,
 	[__pte2cm_idx( 0        | 0         | _PAGE_PAT)] = _PAGE_CACHE_MODE_WB,
-	[__pte2cm_idx(_PAGE_PWT | 0         | _PAGE_PAT)] = _PAGE_CACHE_MODE_WC,
+	[__pte2cm_idx(_PAGE_PWT | 0         | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
 	[__pte2cm_idx(0         | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
 	[__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC,
 };
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc..2c51a2b 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -77,13 +77,13 @@  void __iomem *
 iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
 {
 	/*
-	 * For non-PAT systems, promote PAGE_KERNEL_WC to PAGE_KERNEL_UC_MINUS.
-	 * PAGE_KERNEL_WC maps to PWT, which translates to uncached if the
-	 * MTRR is UC or WC.  UC_MINUS gets the real intention, of the
-	 * user, which is "WC if the MTRR is WC, UC if you can't do that."
+	 * For non-PAT systems, translate non-WB request to UC- just in
+	 * case the caller set the PWT bit to prot directly without using
+	 * pgprot_writecombine(). UC- translates to uncached if the MTRR
+	 * is UC or WC. UC- gets the real intention, of the user, which is
+	 * "WC if the MTRR is WC, UC if you can't do that."
 	 */
-	if (!pat_enabled && pgprot_val(prot) ==
-	    (__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
+	if (!pat_enabled && pgprot2cachemode(prot) != _PAGE_CACHE_MODE_WB)
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
 
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 7e702dc..f966129 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -265,11 +265,8 @@  EXPORT_SYMBOL(ioremap_nocache);
  */
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
-	if (pat_enabled)
-		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
+	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
 					__builtin_return_address(0));
-	else
-		return ioremap_nocache(phys_addr, size);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -285,11 +282,8 @@  EXPORT_SYMBOL(ioremap_wc);
  */
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
-	if (pat_enabled)
-		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
+	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
 					__builtin_return_address(0));
-	else
-		return ioremap_nocache(phys_addr, size);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6427273..5a25e95 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1578,9 +1578,6 @@  int set_memory_wc(unsigned long addr, int numpages)
 {
 	int ret;
 
-	if (!pat_enabled)
-		return set_memory_uc(addr, numpages);
-
 	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
 		_PAGE_CACHE_MODE_WC, NULL);
 	if (ret)
@@ -1604,9 +1601,6 @@  int set_memory_wt(unsigned long addr, int numpages)
 {
 	int ret;
 
-	if (!pat_enabled)
-		return set_memory_uc(addr, numpages);
-
 	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
 			      _PAGE_CACHE_MODE_WT, NULL);
 	if (ret)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 7cfd995..0533867 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -186,7 +186,11 @@  void pat_init_cache_modes(void)
 	char pat_msg[33];
 	u64 pat;
 
-	rdmsrl(MSR_IA32_CR_PAT, pat);
+	if (pat_enabled)
+		rdmsrl(MSR_IA32_CR_PAT, pat);
+	else
+		pat = boot_pat_state;
+
 	pat_msg[32] = 0;
 	for (i = 7; i >= 0; i--) {
 		cache = pat_get_cache_mode((pat >> (i * 8)) & 7,
@@ -204,21 +208,16 @@  void pat_init(void)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	static bool boot_cpu_done;
 
-	if (!pat_enabled)
-		return;
-
 	if (!boot_cpu_done) {
-		if (!cpu_has_pat) {
+		if (!cpu_has_pat)
 			pat_disable("PAT not supported by CPU.");
-			return;
-		}
 
-		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
-		if (!boot_pat_state) {
-			pat_disable("PAT read returns always zero, disabled.");
-			return;
+		if (pat_enabled) {
+			rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+			if (!boot_pat_state)
+				pat_disable("PAT read returns always zero, disabled.");
 		}
-	} else if (!cpu_has_pat) {
+	} else if (!cpu_has_pat && pat_enabled) {
 		/*
 		 * If this happens we are on a secondary CPU, but
 		 * switched to PAT on the boot CPU. We have no way to
@@ -228,9 +227,35 @@  void pat_init(void)
 		BUG();
 	}
 
-	if ((c->x86_vendor == X86_VENDOR_INTEL) &&
-	    (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
-	     ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
+	if (!pat_enabled) {
+		/*
+		 * No PAT. Emulate the PAT table that corresponds to the two
+		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
+		 * This setup is the same as the BIOS default setup when the
+		 * system has PAT but the "nopat" boot option is specified.
+		 * This emulated PAT table is also used when MSR_IA32_CR_PAT
+		 * returns 0.
+		 *
+		 *  PTE encoding used in Linux:
+		 *       PCD
+		 *       |PWT  PAT
+		 *       ||    slot
+		 *       00    0    WB : _PAGE_CACHE_MODE_WB
+		 *       01    1    WT : _PAGE_CACHE_MODE_WT
+		 *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
+		 *       11    3    UC : _PAGE_CACHE_MODE_UC
+		 *
+		 * NOTE: When WC or WP is used, it is redirected to UC- per
+		 * the default setup in __cachemode2pte_tbl[].
+		 */
+		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
+		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+		if (!boot_pat_state)
+			boot_pat_state = pat;
+
+	} else if ((c->x86_vendor == X86_VENDOR_INTEL) &&
+		   (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
+		    ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
 		/*
 		 * PAT support with the lower four entries. Intel Pentium 2,
 		 * 3, M, and 4 are affected by PAT errata, which makes the
@@ -284,7 +309,8 @@  void pat_init(void)
 		      PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
 	}
 
-	wrmsrl(MSR_IA32_CR_PAT, pat);
+	if (pat_enabled)
+		wrmsrl(MSR_IA32_CR_PAT, pat);
 
 	if (!boot_cpu_done) {
 		pat_init_cache_modes();
@@ -457,13 +483,8 @@  int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 	BUG_ON(start >= end); /* end is exclusive */
 
 	if (!pat_enabled) {
-		/* WB and UC- are the only types supported without PAT */
-		if (new_type) {
-			if (req_type == _PAGE_CACHE_MODE_WB)
-				*new_type = _PAGE_CACHE_MODE_WB;
-			else
-				*new_type = _PAGE_CACHE_MODE_UC_MINUS;
-		}
+		if (new_type)
+			*new_type = req_type;
 		return 0;
 	}
 
@@ -962,21 +983,15 @@  void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
 {
-	if (pat_enabled)
-		return __pgprot(pgprot_val(prot) |
+	return __pgprot(pgprot_val(prot) |
 				cachemode2protval(_PAGE_CACHE_MODE_WC));
-	else
-		return pgprot_noncached(prot);
 }
 EXPORT_SYMBOL_GPL(pgprot_writecombine);
 
 pgprot_t pgprot_writethrough(pgprot_t prot)
 {
-	if (pat_enabled)
-		return __pgprot(pgprot_val(prot) |
+	return __pgprot(pgprot_val(prot) |
 				cachemode2protval(_PAGE_CACHE_MODE_WT));
-	else
-		return pgprot_noncached(prot);
 }
 EXPORT_SYMBOL_GPL(pgprot_writethrough);
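
The callers above can drop their `pat_enabled` branches because the default translation tables in init.c now do the redirection. The following stand-alone sketch mirrors the post-patch initializers shown in the init.c hunk. The shortened names (`MODE_*`, `PAGE_*`, `pte2cm_idx()`, `effective_mode()`) are hypothetical, the WP entry is left out because its line is not visible in the hunk, and only the compile-time defaults are modelled, not any later update at boot.

```c
#include <stdint.h>

/* x86 PTE cache attribute bits for 4K pages. */
#define PAGE_PWT	(1u << 3)
#define PAGE_PCD	(1u << 4)
#define PAGE_PAT	(1u << 7)

enum page_cache_mode {
	MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_WT, MODE_NUM
};

/* Hypothetical helper: compress the PAT, PCD and PWT bits into a 3-bit index. */
static unsigned int pte2cm_idx(unsigned int cb)
{
	return ((cb & PAGE_PAT) ? 4u : 0u) |
	       ((cb & PAGE_PCD) ? 2u : 0u) |
	       ((cb & PAGE_PWT) ? 1u : 0u);
}

/* Cache mode -> PTE bits, following the post-patch defaults. */
static const uint16_t cachemode2pte[MODE_NUM] = {
	[MODE_WB]       = 0,
	[MODE_WC]       = PAGE_PCD,
	[MODE_UC_MINUS] = PAGE_PCD,
	[MODE_UC]       = PAGE_PWT | PAGE_PCD,
	[MODE_WT]       = PAGE_PCD,
};

/* PTE bits -> cache mode, following the post-patch defaults. */
static const uint8_t pte2cachemode[8] = {
	[0] = MODE_WB,		/* no bits set */
	[1] = MODE_UC_MINUS,	/* PWT */
	[2] = MODE_UC_MINUS,	/* PCD */
	[3] = MODE_UC,		/* PWT | PCD */
	[4] = MODE_WB,		/* PAT */
	[5] = MODE_UC_MINUS,	/* PAT | PWT */
	[6] = MODE_UC_MINUS,	/* PAT | PCD */
	[7] = MODE_UC,		/* PAT | PWT | PCD */
};

/* Hypothetical helper: the mode a request ends up with after a round trip. */
static enum page_cache_mode effective_mode(enum page_cache_mode req)
{
	return (enum page_cache_mode)pte2cachemode[pte2cm_idx(cachemode2pte[req])];
}
```

With these defaults a WC or WT request round-trips to UC-, while WB and UC are preserved, which matches the behaviour the removed `ioremap_nocache()` and `pgprot_noncached()` fallbacks were approximating.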