[RFC,1/2] arm: cacheflush syscall: process only pages that are in the memory

Message ID 20180126111441.29353-1-m.szyprowski@samsung.com (mailing list archive)
State New, archived
Commit Message

Marek Szyprowski Jan. 26, 2018, 11:14 a.m. UTC
glibc calls the cacheflush syscall on the whole textrels section of
relocated binaries. However, relocation usually doesn't touch every page
of that section, so not all of them have been read into memory when the
syscall is made. flush_cache_user_range() nevertheless touches every
page in the provided range unconditionally, which adds the overhead of
reading in all the clean pages. Optimize this by calling
flush_cache_user_range() only on pages that are already in memory.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/kernel/traps.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)
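
Note for readers: the premise of the commit message, that only some pages
of a mapping are resident at a given moment, can be observed from
userspace with mincore(2). The following is a minimal sketch and not part
of the patch; count_resident_pages() is a hypothetical helper name, and
Linux with glibc is assumed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of [addr, addr + length)
 * are currently resident, as reported by mincore(2).
 * addr must be page-aligned. Returns -1 on error.
 */
static long count_resident_pages(void *addr, size_t length)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	size_t npages, i;
	long resident = 0;

	if (page <= 0 || length == 0)
		return -1;

	/* mincore() rejects unaligned start addresses. */
	assert(((uintptr_t)addr % (size_t)page) == 0);

	/* mincore() fills one byte per page, rounding length up. */
	npages = (length + (size_t)page - 1) / (size_t)page;
	vec = malloc(npages);
	if (!vec)
		return -1;

	if (mincore(addr, length, vec) != 0) {
		free(vec);
		return -1;
	}

	/* Only the least significant bit of each byte is defined. */
	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;

	free(vec);
	return resident;
}
```

Calling this on a freshly mapped region before and after touching a few
pages should show the count rising only for the touched pages, which is
the situation the patch tries to exploit.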

Comments

Russell King (Oracle) Jan. 26, 2018, 11:32 a.m. UTC | #1
On Fri, Jan 26, 2018 at 12:14:40PM +0100, Marek Szyprowski wrote:
> glibc in calls cacheflush syscall on the whole textrels section of the
> relocated binaries. However, relocation usually doesn't touch all pages
> of that section, so not all of them are read to memory when calling this
> syscall. However flush_cache_user_range() function will unconditionally
> touch all pages from the provided range, resulting additional overhead
> related to reading all clean pages. Optimize this by calling
> flush_cache_user_range() only on the pages that are already in the
> memory.

What ensures that another CPU doesn't remove a page while we're
flushing it?  That will trigger a data abort, which will want to
take the mmap_sem, causing a deadlock.

> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>  arch/arm/kernel/traps.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 5e3633c24e63..a5ec262ab30e 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -564,23 +564,36 @@ static int bad_syscall(int n, struct pt_regs *regs)
>  static inline int
>  __do_cache_op(unsigned long start, unsigned long end)
>  {
> -	int ret;
> +	struct vm_area_struct *vma = NULL;
> +	int ret = 0;
>  
> +	down_read(&current->mm->mmap_sem);
>  	do {
>  		unsigned long chunk = min(PAGE_SIZE, end - start);
>  
> +		if (!vma || vma->vm_end <= start) {
> +			vma = find_vma(current->mm, start);
> +			if (!vma) {
> +				ret = -EFAULT;
> +				goto done;
> +			}
> +		}
> +
>  		if (fatal_signal_pending(current))
>  			return 0;
>  
> -		ret = flush_cache_user_range(start, start + chunk);
> -		if (ret)
> -			return ret;
> +		if (follow_page(vma, start, 0)) {
> +			ret = flush_cache_user_range(start, start + chunk);
> +			if (ret)
> +				goto done;
> +		}
>  
>  		cond_resched();
>  		start += chunk;
>  	} while (start < end);
> -
> -	return 0;
> +done:
> +	up_read(&current->mm->mmap_sem);
> +	return ret;
>  }
>  
>  static inline int
> -- 
> 2.15.0
>
Marek Szyprowski Jan. 26, 2018, 1:30 p.m. UTC | #2
Hi Russell,

On 2018-01-26 12:32, Russell King - ARM Linux wrote:
> On Fri, Jan 26, 2018 at 12:14:40PM +0100, Marek Szyprowski wrote:
>> glibc in calls cacheflush syscall on the whole textrels section of the
>> relocated binaries. However, relocation usually doesn't touch all pages
>> of that section, so not all of them are read to memory when calling this
>> syscall. However flush_cache_user_range() function will unconditionally
>> touch all pages from the provided range, resulting additional overhead
>> related to reading all clean pages. Optimize this by calling
>> flush_cache_user_range() only on the pages that are already in the
>> memory.
> What ensures that another CPU doesn't remove a page while we're
> flushing it?  That will trigger a data abort, which will want to
> take the mmap_sem, causing a deadlock.

I thought that taking mmap_sem would prevent pages from being removed.
mmap_sem was already taken in the previous implementation of that
syscall, until the code simplification done by commit 97c72d89ce0e ("ARM:
cacheflush: don't bother rounding to nearest vma").

>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
>>   arch/arm/kernel/traps.c | 25 +++++++++++++++++++------
>>   1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
>> index 5e3633c24e63..a5ec262ab30e 100644
>> --- a/arch/arm/kernel/traps.c
>> +++ b/arch/arm/kernel/traps.c
>> @@ -564,23 +564,36 @@ static int bad_syscall(int n, struct pt_regs *regs)
>>   static inline int
>>   __do_cache_op(unsigned long start, unsigned long end)
>>   {
>> -	int ret;
>> +	struct vm_area_struct *vma = NULL;
>> +	int ret = 0;
>>   
>> +	down_read(&current->mm->mmap_sem);
>>   	do {
>>   		unsigned long chunk = min(PAGE_SIZE, end - start);
>>   
>> +		if (!vma || vma->vm_end <= start) {
>> +			vma = find_vma(current->mm, start);
>> +			if (!vma) {
>> +				ret = -EFAULT;
>> +				goto done;
>> +			}
>> +		}
>> +
>>   		if (fatal_signal_pending(current))
>>   			return 0;
>>   
>> -		ret = flush_cache_user_range(start, start + chunk);
>> -		if (ret)
>> -			return ret;
>> +		if (follow_page(vma, start, 0)) {
>> +			ret = flush_cache_user_range(start, start + chunk);
>> +			if (ret)
>> +				goto done;
>> +		}
>>   
>>   		cond_resched();
>>   		start += chunk;
>>   	} while (start < end);
>> -
>> -	return 0;
>> +done:
>> +	up_read(&current->mm->mmap_sem);
>> +	return ret;
>>   }
>>   
>>   static inline int
>> -- 
>> 2.15.0
>>

Best regards
Russell King (Oracle) Jan. 26, 2018, 9:39 p.m. UTC | #3
On Fri, Jan 26, 2018 at 02:30:47PM +0100, Marek Szyprowski wrote:
> Hi Russell,
> 
> On 2018-01-26 12:32, Russell King - ARM Linux wrote:
> >On Fri, Jan 26, 2018 at 12:14:40PM +0100, Marek Szyprowski wrote:
> >>glibc in calls cacheflush syscall on the whole textrels section of the
> >>relocated binaries. However, relocation usually doesn't touch all pages
> >>of that section, so not all of them are read to memory when calling this
> >>syscall. However flush_cache_user_range() function will unconditionally
> >>touch all pages from the provided range, resulting additional overhead
> >>related to reading all clean pages. Optimize this by calling
> >>flush_cache_user_range() only on the pages that are already in the
> >>memory.
> >What ensures that another CPU doesn't remove a page while we're
> >flushing it?  That will trigger a data abort, which will want to
> >take the mmap_sem, causing a deadlock.
> 
> I thought that taking mmap_sem will prevent pages from being removed.
> mmap_sem has been already taken in the previous implementation of that
> syscall, until code simplification done by commit 97c72d89ce0e ("ARM:
> cacheflush: don't bother rounding to nearest vma").

No, you're not reading the previous code state correctly.  Take a closer
look at that commit.

find_vma() requires that mmap_sem is held across the call as the VMA
list is not stable without that semaphore held.  However, more
importantly, notice that it drops the semaphore _before_ calling the
cache flushing function (__do_cache_op()).

The point is that if __do_cache_op() faults, it will enter
do_page_fault(), which will try to take the mmap_sem again, causing
a deadlock.
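
Note for readers: the ordering described above (hold the semaphore for
the VMA lookup, drop it, and only then flush) can be modelled in
userspace. This is a rough sketch under stated assumptions, not the
kernel code: every model_* name is hypothetical, the semaphore is reduced
to a reader count, and the flush is a stub that only records whether it
was entered with the semaphore still held.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

static struct model_vma model_vmas[] = {
	{ 0x1000, 0x5000 },
	{ 0x8000, 0x9000 },
};
static int model_sem_readers;     /* models read-side holds of mmap_sem */
static int model_flush_under_sem; /* flushes entered with the sem held */

static void model_down_read(void) { model_sem_readers++; }
static void model_up_read(void)   { model_sem_readers--; }

/* Like find_vma(): first VMA whose end lies above addr, or NULL. */
static struct model_vma *model_find_vma(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(model_vmas) / sizeof(model_vmas[0]); i++)
		if (addr < model_vmas[i].vm_end)
			return &model_vmas[i];
	return NULL;
}

/* Stub for the flush; a real flush may fault and need the semaphore. */
static int model_flush_range(unsigned long start, unsigned long end)
{
	assert(start <= end);
	if (model_sem_readers > 0)
		model_flush_under_sem++;
	return 0;
}

static int model_do_cache_op(unsigned long start, unsigned long end)
{
	struct model_vma *vma;
	int found = 0;

	if (end < start)
		return -EINVAL;

	/* The lookup needs the semaphore: the VMA list is unstable without it. */
	model_down_read();
	vma = model_find_vma(start);
	if (vma && vma->vm_start < end) {
		if (start < vma->vm_start)
			start = vma->vm_start;
		if (end > vma->vm_end)
			end = vma->vm_end;
		found = 1;
	}
	/* Drop it before flushing, so a fault in the flush can take it. */
	model_up_read();

	return found ? model_flush_range(start, end) : -EINVAL;
}
```

In this model, model_flush_under_sem stays at zero for any sequence of
model_do_cache_op() calls. A variant that kept the semaphore held across
model_flush_range() would increment it on every flush, which corresponds
to the situation in which a fault handler would have to take the same
semaphore again.
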
Inki Dae Jan. 31, 2018, 6:03 a.m. UTC | #4
Hi Russell,

On 2018-01-27 06:39, Russell King - ARM Linux wrote:
> On Fri, Jan 26, 2018 at 02:30:47PM +0100, Marek Szyprowski wrote:
>> Hi Russell,
>>
>> On 2018-01-26 12:32, Russell King - ARM Linux wrote:
>>> On Fri, Jan 26, 2018 at 12:14:40PM +0100, Marek Szyprowski wrote:
>>>> glibc in calls cacheflush syscall on the whole textrels section of the
>>>> relocated binaries. However, relocation usually doesn't touch all pages
>>>> of that section, so not all of them are read to memory when calling this
>>>> syscall. However flush_cache_user_range() function will unconditionally
>>>> touch all pages from the provided range, resulting additional overhead
>>>> related to reading all clean pages. Optimize this by calling
>>>> flush_cache_user_range() only on the pages that are already in the
>>>> memory.
>>> What ensures that another CPU doesn't remove a page while we're
>>> flushing it?  That will trigger a data abort, which will want to
>>> take the mmap_sem, causing a deadlock.
>>
>> I thought that taking mmap_sem will prevent pages from being removed.
>> mmap_sem has been already taken in the previous implementation of that
>> syscall, until code simplification done by commit 97c72d89ce0e ("ARM:
>> cacheflush: don't bother rounding to nearest vma").
> 
> No, you're not reading the previous code state correctly.  Take a closer
> look at that commit.
> 
> find_vma() requires that mmap_sem is held across the call as the VMA
> list is not stable without that semaphore held.  However, more
> importantly, notice that it drops the semaphore _before_ calling the
> cache flushing function (__do_cache_op()).
> 
> The point is that if __do_cache_op() faults, it will enter
> do_page_fault(), which will try to take the mmap_sem again, causing
> a deadlock.

I'm not sure, but it seems this patch tries to flush the cache only for in-memory pages.
So I think the page fault wouldn't happen, because the flush_cache_user_range function always returns 0.

Thanks,
Inki Dae

>

Patch

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 5e3633c24e63..a5ec262ab30e 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -564,23 +564,36 @@  static int bad_syscall(int n, struct pt_regs *regs)
 static inline int
 __do_cache_op(unsigned long start, unsigned long end)
 {
-	int ret;
+	struct vm_area_struct *vma = NULL;
+	int ret = 0;
 
+	down_read(&current->mm->mmap_sem);
 	do {
 		unsigned long chunk = min(PAGE_SIZE, end - start);
 
+		if (!vma || vma->vm_end <= start) {
+			vma = find_vma(current->mm, start);
+			if (!vma) {
+				ret = -EFAULT;
+				goto done;
+			}
+		}
+
 		if (fatal_signal_pending(current))
 			return 0;
 
-		ret = flush_cache_user_range(start, start + chunk);
-		if (ret)
-			return ret;
+		if (follow_page(vma, start, 0)) {
+			ret = flush_cache_user_range(start, start + chunk);
+			if (ret)
+				goto done;
+		}
 
 		cond_resched();
 		start += chunk;
 	} while (start < end);
-
-	return 0;
+done:
+	up_read(&current->mm->mmap_sem);
+	return ret;
 }
 
 static inline int