diff mbox series

[v2] io_uring: Adjust mapping wrt architecture aliasing requirements

Message ID Y+3kwh8BokobVl6o@p100 (mailing list archive)
State New, archived
Headers show
Series [v2] io_uring: Adjust mapping wrt architecture aliasing requirements | expand

Commit Message

Helge Deller Feb. 16, 2023, 8:09 a.m. UTC
Some architectures have memory cache aliasing requirements (e.g. parisc)
if memory is shared between userspace and kernel. This patch fixes the
kernel to return an aliased address when asked by userspace via mmap().

Signed-off-by: Helge Deller <deller@gmx.de>
---
v2: Do not allow to map to a user-provided addresss. This forces
programs to write portable code, as usually on x86 mapping to any
address will succeed, while it will fail for most provided address if
used on stricter architectures.

Comments

Jens Axboe Feb. 16, 2023, 4:11 p.m. UTC | #1
On 2/16/23 1:09?AM, Helge Deller wrote:
> Some architectures have memory cache aliasing requirements (e.g. parisc)
> if memory is shared between userspace and kernel. This patch fixes the
> kernel to return an aliased address when asked by userspace via mmap().
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> ---
> v2: Do not allow to map to a user-provided addresss. This forces
> programs to write portable code, as usually on x86 mapping to any
> address will succeed, while it will fail for most provided address if
> used on stricter architectures.
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 862e05e6691d..01fe7437a071 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -72,6 +72,7 @@
>  #include <linux/io_uring.h>
>  #include <linux/audit.h>
>  #include <linux/security.h>
> +#include <asm/shmparam.h>
> 
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/io_uring.h>
> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>  	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>  }
> 
> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> +			unsigned long addr, unsigned long len,
> +			unsigned long pgoff, unsigned long flags)
> +{
> +	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
> +	struct vm_unmapped_area_info info;
> +	void *ptr;
> +
> +	/*
> +	 * Do not allow to map to user-provided address to avoid breaking the
> +	 * aliasing rules. Userspace is not able to guess the offset address of
> +	 * kernel kmalloc()ed memory area.
> +	 */
> +	if (addr)
> +		return -EINVAL;

Can we relax this so that if the address is correctly aligned, it will
allow it? The reported issue with sqpoll-cancel-hang.t is due to it
crashing because it's a weird syzbot thing that does mmap() with
MAP_FIXED and an address given.
Helge Deller Feb. 16, 2023, 4:33 p.m. UTC | #2
On 2/16/23 17:11, Jens Axboe wrote:
> On 2/16/23 1:09?AM, Helge Deller wrote:
>> Some architectures have memory cache aliasing requirements (e.g. parisc)
>> if memory is shared between userspace and kernel. This patch fixes the
>> kernel to return an aliased address when asked by userspace via mmap().
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> ---
>> v2: Do not allow to map to a user-provided addresss. This forces
>> programs to write portable code, as usually on x86 mapping to any
>> address will succeed, while it will fail for most provided address if
>> used on stricter architectures.
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 862e05e6691d..01fe7437a071 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -72,6 +72,7 @@
>>   #include <linux/io_uring.h>
>>   #include <linux/audit.h>
>>   #include <linux/security.h>
>> +#include <asm/shmparam.h>
>>
>>   #define CREATE_TRACE_POINTS
>>   #include <trace/events/io_uring.h>
>> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>   	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>   }
>>
>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>> +			unsigned long addr, unsigned long len,
>> +			unsigned long pgoff, unsigned long flags)
>> +{
>> +	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>> +	struct vm_unmapped_area_info info;
>> +	void *ptr;
>> +
>> +	/*
>> +	 * Do not allow to map to user-provided address to avoid breaking the
>> +	 * aliasing rules. Userspace is not able to guess the offset address of
>> +	 * kernel kmalloc()ed memory area.
>> +	 */
>> +	if (addr)
>> +		return -EINVAL;
>
> Can we relax this so that if the address is correctly aligned, it will
> allow it?

My previous patch had it relaxed, but after some more thoughts I removed
it in this v2-version again.

The idea behind it is good, but I see a huge disadvantage in allowing
correctly aligned addresses: People develop their code usually on x86
which has no such alignment requirements, as it just needs to be PAGE_SIZE aligned.
So their code will always work fine on x86, but as soon as the same code
is built on other platforms it will break. As you know, on parisc it's pure luck
if the program chooses an address which is correctly aligned.
I'm one of the debian maintainers for parisc, and I've seen similiar
mmap-issues in other programs as well. Everytime I've found it to be wrong,
you have to explain to the developers what's wrong and sometimes it's
not easy to fix it.
So, if we can educate people from assuming their code to be correct, I think
we can save a lot of additional work afterwards.
That said, I think it's better to be strict now, unless someone comes
up with a really good reason why it needs to be less strict.

> The reported issue with sqpoll-cancel-hang.t is due to it
> crashing because it's a weird syzbot thing that does mmap() with
> MAP_FIXED and an address given.

Ok, but nevertheless I think it's better to be strict.

Helge
Jens Axboe Feb. 16, 2023, 4:46 p.m. UTC | #3
On 2/16/23 9:33?AM, Helge Deller wrote:
> On 2/16/23 17:11, Jens Axboe wrote:
>> On 2/16/23 1:09?AM, Helge Deller wrote:
>>> Some architectures have memory cache aliasing requirements (e.g. parisc)
>>> if memory is shared between userspace and kernel. This patch fixes the
>>> kernel to return an aliased address when asked by userspace via mmap().
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> ---
>>> v2: Do not allow to map to a user-provided addresss. This forces
>>> programs to write portable code, as usually on x86 mapping to any
>>> address will succeed, while it will fail for most provided address if
>>> used on stricter architectures.
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 862e05e6691d..01fe7437a071 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -72,6 +72,7 @@
>>>   #include <linux/io_uring.h>
>>>   #include <linux/audit.h>
>>>   #include <linux/security.h>
>>> +#include <asm/shmparam.h>
>>>
>>>   #define CREATE_TRACE_POINTS
>>>   #include <trace/events/io_uring.h>
>>> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>>       return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>>   }
>>>
>>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>> +            unsigned long addr, unsigned long len,
>>> +            unsigned long pgoff, unsigned long flags)
>>> +{
>>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>>> +    struct vm_unmapped_area_info info;
>>> +    void *ptr;
>>> +
>>> +    /*
>>> +     * Do not allow to map to user-provided address to avoid breaking the
>>> +     * aliasing rules. Userspace is not able to guess the offset address of
>>> +     * kernel kmalloc()ed memory area.
>>> +     */
>>> +    if (addr)
>>> +        return -EINVAL;
>>
>> Can we relax this so that if the address is correctly aligned, it will
>> allow it?
> 
> My previous patch had it relaxed, but after some more thoughts I removed
> it in this v2-version again.
> 
> The idea behind it is good, but I see a huge disadvantage in allowing
> correctly aligned addresses: People develop their code usually on x86
> which has no such alignment requirements, as it just needs to be PAGE_SIZE aligned.
> So their code will always work fine on x86, but as soon as the same code
> is built on other platforms it will break. As you know, on parisc it's pure luck
> if the program chooses an address which is correctly aligned.
> I'm one of the debian maintainers for parisc, and I've seen similiar
> mmap-issues in other programs as well. Everytime I've found it to be wrong,
> you have to explain to the developers what's wrong and sometimes it's
> not easy to fix it.
> So, if we can educate people from assuming their code to be correct, I think
> we can save a lot of additional work afterwards.
> That said, I think it's better to be strict now, unless someone comes
> up with a really good reason why it needs to be less strict.

I don't disagree with the reasoning at all, but the problem is that it
may introduce breakage if someone IS doing the right thing. Is it
guaranteed to be true? No, certainly not. But someone could very well be
writing perfectly portable code and mapping a ring into a specific
address, and this will now break.

AFAICT, this is actually the case with the syzbot case. In fact, with
the patch applied, it'll obviously start crashing on all archs as the
mmaps will now return -EINVAL rather than work.
Helge Deller Feb. 16, 2023, 5:52 p.m. UTC | #4
On 2/16/23 17:46, Jens Axboe wrote:
> On 2/16/23 9:33?AM, Helge Deller wrote:
>> On 2/16/23 17:11, Jens Axboe wrote:
>>> On 2/16/23 1:09?AM, Helge Deller wrote:
>>>> Some architectures have memory cache aliasing requirements (e.g. parisc)
>>>> if memory is shared between userspace and kernel. This patch fixes the
>>>> kernel to return an aliased address when asked by userspace via mmap().
>>>>
>>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>>> ---
>>>> v2: Do not allow to map to a user-provided addresss. This forces
>>>> programs to write portable code, as usually on x86 mapping to any
>>>> address will succeed, while it will fail for most provided address if
>>>> used on stricter architectures.
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index 862e05e6691d..01fe7437a071 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -72,6 +72,7 @@
>>>>    #include <linux/io_uring.h>
>>>>    #include <linux/audit.h>
>>>>    #include <linux/security.h>
>>>> +#include <asm/shmparam.h>
>>>>
>>>>    #define CREATE_TRACE_POINTS
>>>>    #include <trace/events/io_uring.h>
>>>> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>>>        return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>>>    }
>>>>
>>>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>>> +            unsigned long addr, unsigned long len,
>>>> +            unsigned long pgoff, unsigned long flags)
>>>> +{
>>>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>>>> +    struct vm_unmapped_area_info info;
>>>> +    void *ptr;
>>>> +
>>>> +    /*
>>>> +     * Do not allow to map to user-provided address to avoid breaking the
>>>> +     * aliasing rules. Userspace is not able to guess the offset address of
>>>> +     * kernel kmalloc()ed memory area.
>>>> +     */
>>>> +    if (addr)
>>>> +        return -EINVAL;
>>>
>>> Can we relax this so that if the address is correctly aligned, it will
>>> allow it?
>>
>> My previous patch had it relaxed, but after some more thoughts I removed
>> it in this v2-version again.
>>
>> The idea behind it is good, but I see a huge disadvantage in allowing
>> correctly aligned addresses: People develop their code usually on x86
>> which has no such alignment requirements, as it just needs to be PAGE_SIZE aligned.
>> So their code will always work fine on x86, but as soon as the same code
>> is built on other platforms it will break. As you know, on parisc it's pure luck
>> if the program chooses an address which is correctly aligned.
>> I'm one of the debian maintainers for parisc, and I've seen similiar
>> mmap-issues in other programs as well. Everytime I've found it to be wrong,
>> you have to explain to the developers what's wrong and sometimes it's
>> not easy to fix it.
>> So, if we can educate people from assuming their code to be correct, I think
>> we can save a lot of additional work afterwards.
>> That said, I think it's better to be strict now, unless someone comes
>> up with a really good reason why it needs to be less strict.
>
> I don't disagree with the reasoning at all, but the problem is that it
> may introduce breakage if someone IS doing the right thing. Is it
> guaranteed to be true? No, certainly not. But someone could very well be
> writing perfectly portable code and mapping a ring into a specific
> address, and this will now break.

We will find out if there are such users if we keep it strict now and
open it up if it's really necessary.
If you open it up now, you won't be able to turn it stricter later.

> AFAICT, this is actually the case with the syzbot case. In fact, with
> the patch applied, it'll obviously start crashing on all archs as the
> mmaps will now return -EINVAL rather than work.

Yes, but it's not a real user and just a (invalid) testcase.
For that I think it's OK to just disable it.

Helge
Jens Axboe Feb. 16, 2023, 6 p.m. UTC | #5
On 2/16/23 10:52?AM, Helge Deller wrote:
> On 2/16/23 17:46, Jens Axboe wrote:
>> On 2/16/23 9:33?AM, Helge Deller wrote:
>>> On 2/16/23 17:11, Jens Axboe wrote:
>>>> On 2/16/23 1:09?AM, Helge Deller wrote:
>>>>> Some architectures have memory cache aliasing requirements (e.g. parisc)
>>>>> if memory is shared between userspace and kernel. This patch fixes the
>>>>> kernel to return an aliased address when asked by userspace via mmap().
>>>>>
>>>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>>>> ---
>>>>> v2: Do not allow to map to a user-provided addresss. This forces
>>>>> programs to write portable code, as usually on x86 mapping to any
>>>>> address will succeed, while it will fail for most provided address if
>>>>> used on stricter architectures.
>>>>>
>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>> index 862e05e6691d..01fe7437a071 100644
>>>>> --- a/io_uring/io_uring.c
>>>>> +++ b/io_uring/io_uring.c
>>>>> @@ -72,6 +72,7 @@
>>>>>    #include <linux/io_uring.h>
>>>>>    #include <linux/audit.h>
>>>>>    #include <linux/security.h>
>>>>> +#include <asm/shmparam.h>
>>>>>
>>>>>    #define CREATE_TRACE_POINTS
>>>>>    #include <trace/events/io_uring.h>
>>>>> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>>>>        return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>>>>    }
>>>>>
>>>>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>>>> +            unsigned long addr, unsigned long len,
>>>>> +            unsigned long pgoff, unsigned long flags)
>>>>> +{
>>>>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>>>>> +    struct vm_unmapped_area_info info;
>>>>> +    void *ptr;
>>>>> +
>>>>> +    /*
>>>>> +     * Do not allow to map to user-provided address to avoid breaking the
>>>>> +     * aliasing rules. Userspace is not able to guess the offset address of
>>>>> +     * kernel kmalloc()ed memory area.
>>>>> +     */
>>>>> +    if (addr)
>>>>> +        return -EINVAL;
>>>>
>>>> Can we relax this so that if the address is correctly aligned, it will
>>>> allow it?
>>>
>>> My previous patch had it relaxed, but after some more thoughts I removed
>>> it in this v2-version again.
>>>
>>> The idea behind it is good, but I see a huge disadvantage in allowing
>>> correctly aligned addresses: People develop their code usually on x86
>>> which has no such alignment requirements, as it just needs to be PAGE_SIZE aligned.
>>> So their code will always work fine on x86, but as soon as the same code
>>> is built on other platforms it will break. As you know, on parisc it's pure luck
>>> if the program chooses an address which is correctly aligned.
>>> I'm one of the debian maintainers for parisc, and I've seen similiar
>>> mmap-issues in other programs as well. Everytime I've found it to be wrong,
>>> you have to explain to the developers what's wrong and sometimes it's
>>> not easy to fix it.
>>> So, if we can educate people from assuming their code to be correct, I think
>>> we can save a lot of additional work afterwards.
>>> That said, I think it's better to be strict now, unless someone comes
>>> up with a really good reason why it needs to be less strict.
>>
>> I don't disagree with the reasoning at all, but the problem is that it
>> may introduce breakage if someone IS doing the right thing. Is it
>> guaranteed to be true? No, certainly not. But someone could very well be
>> writing perfectly portable code and mapping a ring into a specific
>> address, and this will now break.
> 
> We will find out if there are such users if we keep it strict now and
> open it up if it's really necessary. If you open it up now, you won't
> be able to turn it stricter later.

But it has been open up until now, that's the issue. And you're now
trying to make it stricter, which is indeed later...

>> AFAICT, this is actually the case with the syzbot case. In fact, with
>> the patch applied, it'll obviously start crashing on all archs as the
>> mmaps will now return -EINVAL rather than work.
> 
> Yes, but it's not a real user and just a (invalid) testcase.
> For that I think it's OK to just disable it.

Totally agree, and I did just disable it, but that part of the test is
not invalid. I don't care about this particular test case, it's more of
a general concern.
Jiri Slaby June 27, 2023, 2:14 p.m. UTC | #6
On 16. 02. 23, 9:09, Helge Deller wrote:
> Some architectures have memory cache aliasing requirements (e.g. parisc)
> if memory is shared between userspace and kernel. This patch fixes the
> kernel to return an aliased address when asked by userspace via mmap().
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> ---
> v2: Do not allow to map to a user-provided addresss. This forces
> programs to write portable code, as usually on x86 mapping to any
> address will succeed, while it will fail for most provided address if
> used on stricter architectures.
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 862e05e6691d..01fe7437a071 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -72,6 +72,7 @@
>   #include <linux/io_uring.h>
>   #include <linux/audit.h>
>   #include <linux/security.h>
> +#include <asm/shmparam.h>
> 
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/io_uring.h>
> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>   	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>   }
> 
> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> +			unsigned long addr, unsigned long len,
> +			unsigned long pgoff, unsigned long flags)
> +{
> +	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
> +	struct vm_unmapped_area_info info;
> +	void *ptr;
> +
> +	/*
> +	 * Do not allow to map to user-provided address to avoid breaking the
> +	 * aliasing rules. Userspace is not able to guess the offset address of
> +	 * kernel kmalloc()ed memory area.
> +	 */
> +	if (addr)
> +		return -EINVAL;
> +
> +	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
> +	if (IS_ERR(ptr))
> +		return -ENOMEM;
> +
> +	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> +	info.length = len;
> +	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> +	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);

Hi,

this breaks compat (x86_32) on x86_64 in 6.4. When you run most liburing 
tests, you'll get ENOMEM, as this high_limit is something in 64-bit space...

> +#ifdef SHM_COLOUR
> +	info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
> +#else
> +	info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
> +#endif
> +	info.align_offset = (unsigned long) ptr;
> +
> +	/*
> +	 * A failed mmap() very likely causes application failure,
> +	 * so fall back to the bottom-up function here. This scenario
> +	 * can happen with large stack limits and large mmap()
> +	 * allocations.
> +	 */
> +	addr = vm_unmapped_area(&info);

So the found addr here is > TASK_SIZE - len for 32-bit bins. And 
get_unmapped_area() returns ENOMEM.

> +	if (offset_in_page(addr)) {
> +		info.flags = 0;
> +		info.low_limit = TASK_UNMAPPED_BASE;
> +		info.high_limit = mmap_end;
> +		addr = vm_unmapped_area(&info);
> +	}
> +
> +	return addr;
> +}

Reverting the whole commit helps of course. Even this completely 
incorrect hack helps:
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3398,7 +3398,7 @@ static unsigned long 
io_uring_mmu_get_unmapped_area(struct file *filp,
                         unsigned long addr, unsigned long len,
                         unsigned long pgoff, unsigned long flags)
  {
-       const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+       const unsigned long mmap_end = in_32bit_syscall() ? 
task_size_32bit() : arch_get_mmap_end(addr, len, flags);
         struct vm_unmapped_area_info info;
         void *ptr;

@@ -3417,7 +3417,7 @@ static unsigned long 
io_uring_mmu_get_unmapped_area(struct file *filp,
         info.flags = VM_UNMAPPED_AREA_TOPDOWN;
         info.length = len;
         info.low_limit = max(PAGE_SIZE, mmap_min_addr);
-       info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
+       info.high_limit = in_32bit_syscall() ? task_size_32bit() : 
arch_get_mmap_base(addr, current->mm->mmap_base);
  #ifdef SHM_COLOUR
         info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
  #else


Any ideas? Note that the compat mmap apparently uses bottomup expansion. 
See:
         if (!in_32bit_syscall() && (flags & MAP_32BIT))
                 goto bottomup;

in arch_get_unmapped_area_topdown().

thanks,
Helge Deller June 27, 2023, 7:24 p.m. UTC | #7
On 6/27/23 16:14, Jiri Slaby wrote:
> On 16. 02. 23, 9:09, Helge Deller wrote:
>> Some architectures have memory cache aliasing requirements (e.g. parisc)
>> if memory is shared between userspace and kernel. This patch fixes the
>> kernel to return an aliased address when asked by userspace via mmap().
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> ---
>> v2: Do not allow to map to a user-provided addresss. This forces
>> programs to write portable code, as usually on x86 mapping to any
>> address will succeed, while it will fail for most provided address if
>> used on stricter architectures.
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 862e05e6691d..01fe7437a071 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -72,6 +72,7 @@
>>   #include <linux/io_uring.h>
>>   #include <linux/audit.h>
>>   #include <linux/security.h>
>> +#include <asm/shmparam.h>
>>
>>   #define CREATE_TRACE_POINTS
>>   #include <trace/events/io_uring.h>
>> @@ -3059,6 +3060,54 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>       return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>   }
>>
>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>> +            unsigned long addr, unsigned long len,
>> +            unsigned long pgoff, unsigned long flags)
>> +{
>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>> +    struct vm_unmapped_area_info info;
>> +    void *ptr;
>> +
>> +    /*
>> +     * Do not allow to map to user-provided address to avoid breaking the
>> +     * aliasing rules. Userspace is not able to guess the offset address of
>> +     * kernel kmalloc()ed memory area.
>> +     */
>> +    if (addr)
>> +        return -EINVAL;
>> +
>> +    ptr = io_uring_validate_mmap_request(filp, pgoff, len);
>> +    if (IS_ERR(ptr))
>> +        return -ENOMEM;
>> +
>> +    info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>> +    info.length = len;
>> +    info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>> +    info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
>
> Hi,
>
> this breaks compat (x86_32) on x86_64 in 6.4. When you run most liburing tests, you'll get ENOMEM, as this high_limit is something in 64-bit space...
>
>> +#ifdef SHM_COLOUR
>> +    info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
>> +#else
>> +    info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
>> +#endif
>> +    info.align_offset = (unsigned long) ptr;
>> +
>> +    /*
>> +     * A failed mmap() very likely causes application failure,
>> +     * so fall back to the bottom-up function here. This scenario
>> +     * can happen with large stack limits and large mmap()
>> +     * allocations.
>> +     */
>> +    addr = vm_unmapped_area(&info);
>
> So the found addr here is > TASK_SIZE - len for 32-bit bins. And get_unmapped_area() returns ENOMEM.
>
>> +    if (offset_in_page(addr)) {
>> +        info.flags = 0;
>> +        info.low_limit = TASK_UNMAPPED_BASE;
>> +        info.high_limit = mmap_end;
>> +        addr = vm_unmapped_area(&info);
>> +    }
>> +
>> +    return addr;
>> +}
>
> Reverting the whole commit helps of course. Even this completely incorrect hack helps:
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3398,7 +3398,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>                          unsigned long addr, unsigned long len,
>                          unsigned long pgoff, unsigned long flags)
>   {
> -       const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
> +       const unsigned long mmap_end = in_32bit_syscall() ? task_size_32bit() : arch_get_mmap_end(addr, len, flags);
>          struct vm_unmapped_area_info info;
>          void *ptr;
>
> @@ -3417,7 +3417,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>          info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>          info.length = len;
>          info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> -       info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
> +       info.high_limit = in_32bit_syscall() ? task_size_32bit() : arch_get_mmap_base(addr, current->mm->mmap_base);

I think your "incorrect hack" is actually correct.
If it's the compat case which breaks, then task_size_32bit() might be right.
Maybe adding arch_get_mmap_base() and arch_get_mmap_end() macros to handle
the compat case in to arch/x86/include/asm/* does work, e.g.
#define arch_get_mmap_base(addr, base) \
	(in_32bit_syscall() ? task_size_32bit() : base)

?
Helge
diff mbox series

Patch

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 862e05e6691d..01fe7437a071 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -72,6 +72,7 @@ 
 #include <linux/io_uring.h>
 #include <linux/audit.h>
 #include <linux/security.h>
+#include <asm/shmparam.h>

 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -3059,6 +3060,54 @@  static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }

+static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+	struct vm_unmapped_area_info info;
+	void *ptr;
+
+	/*
+	 * Do not allow to map to user-provided address to avoid breaking the
+	 * aliasing rules. Userspace is not able to guess the offset address of
+	 * kernel kmalloc()ed memory area.
+	 */
+	if (addr)
+		return -EINVAL;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
+#ifdef SHM_COLOUR
+	info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
+#else
+	info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
+#endif
+	info.align_offset = (unsigned long) ptr;
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	addr = vm_unmapped_area(&info);
+	if (offset_in_page(addr)) {
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */

 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3273,6 +3322,8 @@  static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif
 	.poll		= io_uring_poll,
 #ifdef CONFIG_PROC_FS