diff mbox

parisc: report inequivalent aliases only for writeable mappings

Message ID alpine.LRH.2.02.1401221402190.12739@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State Rejected, archived
Headers show

Commit Message

Mikulas Patocka Jan. 22, 2014, 7:11 p.m. UTC
Hi

Here I'm sending a fix for parisc for Debian 5 userspace.

I'm just curious - why are those kmap_atomic and kunmap_atomic overrides 
needed while other architectures with virtually indexed caches (such as 
sparc) don't override these functions? It is that the other architectures 
flush cache at differnet points where parisc doesn't flush it?

Mikulas


From: Mikulas Patocka <mpatocka@redhat.com>

The patch f8dae00684d678afa13041ef170cecfd1297ed40 breaks Debian 5
userspace. After application of the patch, you get a lot of INEQUIVALENT
ALIASES messages on various dynamic libraries - so many that the system is
unbootable.

This patch changes it so that INEQUIVALENT ALIASES are only reported for
writeable mappings. PA-RISC specification allows inequivalent aliases for
read-only mappings, so there's no need to report them as an error.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 arch/parisc/kernel/cache.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

James Bottomley Jan. 22, 2014, 7:24 p.m. UTC | #1
On Wed, 2014-01-22 at 14:11 -0500, Mikulas Patocka wrote:
[no comment on the merits of the patch, just the wording of the change
log]
> This patch changes it so that INEQUIVALENT ALIASES are only reported for
> writeable mappings. PA-RISC specification allows inequivalent aliases for
> read-only mappings, so there's no need to report them as an error.

No, it doesn't.  The spec says no inequivalent aliases at all for
certain types of CPU (that was the cause of the inability to boot on the
CPU with the combined PIPT/VIPT cache) ... we believe we skirted the
requirements with some judicious flushing but we can't say it was
supported by the docs.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mikulas Patocka Jan. 22, 2014, 8:39 p.m. UTC | #2
On Wed, 22 Jan 2014, James Bottomley wrote:

> On Wed, 2014-01-22 at 14:11 -0500, Mikulas Patocka wrote:
> [no comment on the merits of the patch, just the wording of the change
> log]
> > This patch changes it so that INEQUIVALENT ALIASES are only reported for
> > writeable mappings. PA-RISC specification allows inequivalent aliases for
> > read-only mappings, so there's no need to report them as an error.
> 
> No, it doesn't.  The spec says no inequivalent aliases at all for
> certain types of CPU (that was the cause of the inability to boot on the
> CPU with the combined PIPT/VIPT cache) ... we believe we skirted the
> requirements with some judicious flushing but we can't say it was
> supported by the docs.
> 
> James

A citation from Parisc 2.0 specification, Appendix F, section Address 
Aliasing:

"Software is allowed to have any number of read-only non-equivalently 
aliased translations to a physical page, as long as there are no other 
translations to the page. This is referred to as read-only non-equivalent 
aliasing."

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Jan. 22, 2014, 8:57 p.m. UTC | #3
On Wed, 2014-01-22 at 15:39 -0500, Mikulas Patocka wrote:
> 
> On Wed, 22 Jan 2014, James Bottomley wrote:
> 
> > On Wed, 2014-01-22 at 14:11 -0500, Mikulas Patocka wrote:
> > [no comment on the merits of the patch, just the wording of the change
> > log]
> > > This patch changes it so that INEQUIVALENT ALIASES are only reported for
> > > writeable mappings. PA-RISC specification allows inequivalent aliases for
> > > read-only mappings, so there's no need to report them as an error.
> > 
> > No, it doesn't.  The spec says no inequivalent aliases at all for
> > certain types of CPU (that was the cause of the inability to boot on the
> > CPU with the combined PIPT/VIPT cache) ... we believe we skirted the
> > requirements with some judicious flushing but we can't say it was
> > supported by the docs.
> > 
> > James
> 
> A citation from Parisc 2.0 specification, Appendix F, section Address 
> Aliasing:
> 
> "Software is allowed to have any number of read-only non-equivalently 
> aliased translations to a physical page, as long as there are no other 
> translations to the page. This is referred to as read-only non-equivalent 
> aliasing."

The kernel alias is read/write.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mikulas Patocka Jan. 22, 2014, 9:31 p.m. UTC | #4
On Wed, 22 Jan 2014, James Bottomley wrote:

> On Wed, 2014-01-22 at 15:39 -0500, Mikulas Patocka wrote:
> > 
> > On Wed, 22 Jan 2014, James Bottomley wrote:
> > 
> > > On Wed, 2014-01-22 at 14:11 -0500, Mikulas Patocka wrote:
> > > [no comment on the merits of the patch, just the wording of the change
> > > log]
> > > > This patch changes it so that INEQUIVALENT ALIASES are only reported for
> > > > writeable mappings. PA-RISC specification allows inequivalent aliases for
> > > > read-only mappings, so there's no need to report them as an error.
> > > 
> > > No, it doesn't.  The spec says no inequivalent aliases at all for
> > > certain types of CPU (that was the cause of the inability to boot on the
> > > CPU with the combined PIPT/VIPT cache) ... we believe we skirted the
> > > requirements with some judicious flushing but we can't say it was
> > > supported by the docs.
> > > 
> > > James
> > 
> > A citation from Parisc 2.0 specification, Appendix F, section Address 
> > Aliasing:
> > 
> > "Software is allowed to have any number of read-only non-equivalently 
> > aliased translations to a physical page, as long as there are no other 
> > translations to the page. This is referred to as read-only non-equivalent 
> > aliasing."
> 
> The kernel alias is read/write.
> 
> James

But the kernel alias shouldn't be loaded in the TLB for dynamic libraries 
that are mapped read-only.

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John David Anglin Jan. 22, 2014, 9:45 p.m. UTC | #5
On 1/22/2014 4:31 PM, Mikulas Patocka wrote:
>
> On Wed, 22 Jan 2014, James Bottomley wrote:
>
>> On Wed, 2014-01-22 at 15:39 -0500, Mikulas Patocka wrote:
>>> On Wed, 22 Jan 2014, James Bottomley wrote:
>>>
>>>> On Wed, 2014-01-22 at 14:11 -0500, Mikulas Patocka wrote:
>>>> [no comment on the merits of the patch, just the wording of the change
>>>> log]
>>>>> This patch changes it so that INEQUIVALENT ALIASES are only reported for
>>>>> writeable mappings. PA-RISC specification allows inequivalent aliases for
>>>>> read-only mappings, so there's no need to report them as an error.
>>>> No, it doesn't.  The spec says no inequivalent aliases at all for
>>>> certain types of CPU (that was the cause of the inability to boot on the
>>>> CPU with the combined PIPT/VIPT cache) ... we believe we skirted the
>>>> requirements with some judicious flushing but we can't say it was
>>>> supported by the docs.
>>>>
>>>> James
>>> A citation from Parisc 2.0 specification, Appendix F, section Address
>>> Aliasing:
>>>
>>> "Software is allowed to have any number of read-only non-equivalently
>>> aliased translations to a physical page, as long as there are no other
>>> translations to the page. This is referred to as read-only non-equivalent
>>> aliasing."
>> The kernel alias is read/write.
>>
>> James
> But the kernel alias shouldn't be loaded in the TLB for dynamic libraries
> that are mapped read-only.
The majority messages occur because of a binutils bug that was fixed 
several years ago.
There's a non equivalent mapping between the last code page and the 
start of writable
data in almost every application and shared library in Debian 5. This is 
fixed in the current
Debian unstable and Gentoo.  So, I recommend updating.

When the user aliases are re-enabled, we have the following situation 
when  non equivalent
aliases exist:

"All other uses of non-equivalent aliasing (including simultaneously 
enabling multiple non-equivalently
aliased translations where one or more allow for write access) are 
prohibited, and can cause machine
checks or silent data corruption, including data corruption of unrelated 
memory on unrelated pages."

I'm not sure that we handle correctly handle the case where there are 
only equivalent user aliases.
Calling flush_dcache_page() was a step in this direction but 
unfortunately Helge and I have found
a side effect (zombies run by expect in gcc/gdb testsuites).  I've also 
found another situation where
non equivalent aliases are generated.

I tend to think message should be a debug message.

Dave
Mikulas Patocka Jan. 22, 2014, 10:27 p.m. UTC | #6
> The majority messages occur because of a binutils bug that was fixed several
> years ago.
> There's a non equivalent mapping between the last code page and the start of
> writable
> data in almost every application and shared library in Debian 5. This is fixed
> in the current
> Debian unstable and Gentoo.  So, I recommend updating.

So far I haven't had any problems with Debian 5, so I prefer it to the 
constantly changing unstable.

Anyway, the kernel should work with Debian 5 - the only way how to install 
a new parisc system is to install Debian 5 and then switch to 
debian-ports.

> When the user aliases are re-enabled, we have the following situation when
> non equivalent
> aliases exist:
> 
> "All other uses of non-equivalent aliasing (including simultaneously enabling
> multiple non-equivalently
> aliased translations where one or more allow for write access) are prohibited,
> and can cause machine
> checks or silent data corruption, including data corruption of unrelated
> memory on unrelated pages."
> 
> I'm not sure that we handle correctly handle the case where there are only
> equivalent user aliases.
> Calling flush_dcache_page() was a step in this direction but unfortunately
> Helge and I have found
> a side effect (zombies run by expect in gcc/gdb testsuites).  I've also found
> another situation where
> non equivalent aliases are generated.
> 
> I tend to think message should be a debug message.
> 
> Dave
> 
> -- 
> John David Anglin    dave.anglin@bell.net


There is another problem - flushing the cache in kmap_atomic doesn't fix 
inequivalent aliasing because there may be other threads on other CPUs 
touching that page from userspace simultaneously.


I got an idea that it could be possible to implement kmap_atomic without 
flushing the cache - currently, 64-bit pagetables map 2^41 bytes of 
memory. You can hack the kernel tlb handler, so that the addresses above 
2^41 map to the same memory as base kernel space, just shifted by a few 
pages.

Suppose that the following ranges in the kernel address space map to the 
same memory:

0 ... 2^41-1	(the original kernel mapping)
2^41 + 4096 ... 2*2^41 + 4095 (an alias shifted by 4k)
2*2^41 + 8192 ... 3*2^41 + 8191 (an alias shifted by 8k)
3*2^41 + 12288 ... 4*2^41 + 12287 (an alias shifted by 12k)
... etc for all 1024 page aliasings.
1023*2^41 + 4190208 ... 1024*2^41 + 4190207 (an alias shifted by 4M-4k)

Then, kmap_atomic could select a kernel mapping that has the same 
cache-equivalence as the existing userspace mapping and simply return it 
to kernelspace without flushing the cache.

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John David Anglin Jan. 23, 2014, 2:26 a.m. UTC | #7
On 22-Jan-14, at 5:27 PM, Mikulas Patocka wrote:

>> The majority messages occur because of a binutils bug that was  
>> fixed several
>> years ago.
>> There's a non equivalent mapping between the last code page and the  
>> start of
>> writable
>> data in almost every application and shared library in Debian 5.  
>> This is fixed
>> in the current
>> Debian unstable and Gentoo.  So, I recommend updating.
>
> So far I haven't had any problems with Debian 5, so I prefer it to the
> constantly changing unstable.
>
> Anyway, the kernel should work with Debian 5 - the only way how to  
> install
> a new parisc system is to install Debian 5 and then switch to
> debian-ports.

Actually, Helge has a lifimage that allows a new Debian system to be  
setup with debootstrap.
Helge is working toward new installer.  There's documentation on wiki  
on how to do it.

At the moment, we are stuck with the constantly changing unstable.  I  
want to say though
that even if your suggestion below works, there are non equivalent   
aliases in most Debian 5
applications and libraries.  So, even if kernel updates to pages are  
done equivalently,
this might cause issues.

>
>> When the user aliases are re-enabled, we have the following  
>> situation when
>> non equivalent
>> aliases exist:
>>
>> "All other uses of non-equivalent aliasing (including  
>> simultaneously enabling
>> multiple non-equivalently
>> aliased translations where one or more allow for write access) are  
>> prohibited,
>> and can cause machine
>> checks or silent data corruption, including data corruption of  
>> unrelated
>> memory on unrelated pages."
>>
>> I'm not sure that we handle correctly handle the case where there  
>> are only
>> equivalent user aliases.
>> Calling flush_dcache_page() was a step in this direction but  
>> unfortunately
>> Helge and I have found
>> a side effect (zombies run by expect in gcc/gdb testsuites).  I've  
>> also found
>> another situation where
>> non equivalent aliases are generated.
>>
>> I tend to think message should be a debug message.
>>
>> Dave
>>
>> -- 
>> John David Anglin    dave.anglin@bell.net
>
>
> There is another problem - flushing the cache in kmap_atomic doesn't  
> fix
> inequivalent aliasing because there may be other threads on other CPUs
> touching that page from userspace simultaneously.

That is the fundamental issue.  In part, it may be the assumptions  
surrounding how
COW is implemented.  I know reverting the kmap part of the change  
works better.
In that implementation, copy_user_page() flushes the from page itself.

I know the above is pretty solid as I ran with it for two weeks  
without any obvious cache issues.

What led us to the kmap flush is that the aio code reads and writes  
the kernel pages.
Is it possible that user access isn't involved there or there's a user  
flush before the aio
operation?

In some sense, this would seem to be a Linux core design problem if  
access
to shared pages isn't controlled.  I imagine various arm variants  
would also break
from these issues.


>
>
> I got an idea that it could be possible to implement kmap_atomic  
> without
> flushing the cache - currently, 64-bit pagetables map 2^41 bytes of
> memory. You can hack the kernel tlb handler, so that the addresses  
> above
> 2^41 map to the same memory as base kernel space, just shifted by a  
> few
> pages.
>
> Suppose that the following ranges in the kernel address space map to  
> the
> same memory:
>
> 0 ... 2^41-1	(the original kernel mapping)
> 2^41 + 4096 ... 2*2^41 + 4095 (an alias shifted by 4k)
> 2*2^41 + 8192 ... 3*2^41 + 8191 (an alias shifted by 8k)
> 3*2^41 + 12288 ... 4*2^41 + 12287 (an alias shifted by 12k)
> ... etc for all 1024 page aliasings.
> 1023*2^41 + 4190208 ... 1024*2^41 + 4190207 (an alias shifted by  
> 4M-4k)
>
> Then, kmap_atomic could select a kernel mapping that has the same
> cache-equivalence as the existing userspace mapping and simply  
> return it
> to kernelspace without flushing the cache.

This is a very interesting suggestion.  I wasn't a aware that the  
kernel mapping
could be controlled in this way.

Dave
--
John David Anglin	dave.anglin@bell.net



--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller Jan. 23, 2014, 7:21 p.m. UTC | #8
Hi Mikulas,

On 01/23/2014 03:26 AM, John David Anglin wrote:
> On 22-Jan-14, at 5:27 PM, Mikulas Patocka wrote:
>> Anyway, the kernel should work with Debian 5 - the only way how to install
>> a new parisc system is to install Debian 5 and then switch to
>> debian-ports.
> 
> Actually, Helge has a lifimage that allows a new Debian system to be setup with debootstrap.
> Helge is working toward new installer.  There's documentation on wiki on how to do it.

Installing Debian unstable from debian-ports is fully described here:
https://parisc.wiki.kernel.org/index.php/Debian_Ports_Installation
(You will see some inequivalent aliases messages because the lifimage is still based on old binaries)

That's how I installed the debian buildd servers:
http://unstable.buildd.net/index-hppa.html

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-3.13/arch/parisc/kernel/cache.c
===================================================================
--- linux-3.13.orig/arch/parisc/kernel/cache.c	2014-01-20 21:40:18.000000000 +0100
+++ linux-3.13/arch/parisc/kernel/cache.c	2014-01-20 21:43:23.000000000 +0100
@@ -325,7 +325,7 @@  void flush_dcache_page(struct page *page
 		flush_tlb_page(mpnt, addr);
 		if (old_addr == 0 || (old_addr & (SHMLBA - 1)) != (addr & (SHMLBA - 1))) {
 			__flush_cache_page(mpnt, addr, page_to_phys(page));
-			if (old_addr)
+			if (old_addr && unlikely(mapping->i_mmap_writable != 0))
 				printk(KERN_ERR "INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %s\n", old_addr, addr, mpnt->vm_file ? (char *)mpnt->vm_file->f_path.dentry->d_name.name : "(null)");
 			old_addr = addr;
 		}