drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page

Message ID 1482139186-17140-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin Dec. 19, 2016, 9:19 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

For some reason GCC 6.2.1 here unrolls the memcpy calls to and from the
on-stack temp buffer in per-byte fashion, repeatedly reloading offset
constants. It looks horrible, for example:

      ...
     fdc:       48 b8 41 00 00 00 00    movabs rax,0xffff880000000041
     fe3:       88 ff ff
     fe6:       44 88 74 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r14b
     feb:       48 b8 42 00 00 00 00    movabs rax,0xffff880000000042
     ff2:       88 ff ff
     ff5:       44 88 6c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r13b
     ffa:       48 b8 43 00 00 00 00    movabs rax,0xffff880000000043
    1001:       88 ff ff
    1004:       44 88 64 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r12b
    1009:       48 b8 44 00 00 00 00    movabs rax,0xffff880000000044
    1010:       88 ff ff
    1013:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1017:       48 b8 45 00 00 00 00    movabs rax,0xffff880000000045
    101e:       88 ff ff
    1021:       44 88 5c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r11b
    1026:       48 b8 46 00 00 00 00    movabs rax,0xffff880000000046
    102d:       88 ff ff
    1030:       44 88 54 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r10b
    1035:       48 b8 47 00 00 00 00    movabs rax,0xffff880000000047
    103c:       88 ff ff
    103f:       44 88 4c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r9b
    1044:       0f b6 5d d0             movzx  ebx,BYTE PTR [rbp-0x30]
    1048:       48 b8 48 00 00 00 00    movabs rax,0xffff880000000048
    104f:       88 ff ff
    1052:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1056:       48 b8 49 00 00 00 00    movabs rax,0xffff880000000049
    105d:       88 ff ff
    1060:       40 88 7c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],dil
    1065:       0f b6 5d cf             movzx  ebx,BYTE PTR [rbp-0x31]
    1069:       48 b8 4a 00 00 00 00    movabs rax,0xffff88000000004a
    1070:       88 ff ff
    1073:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1077:       0f b6 7d ce             movzx  edi,BYTE PTR [rbp-0x32]
    107b:       48 b8 4b 00 00 00 00    movabs rax,0xffff88000000004b
      ...

So change the code a bit, which makes GCC generate more reasonable
code like:
  ...
 bf1:   48 89 78 b8             mov    QWORD PTR [rax-0x48],rdi
 bf5:   4c 89 60 c0             mov    QWORD PTR [rax-0x40],r12
 bf9:   48 89 58 c8             mov    QWORD PTR [rax-0x38],rbx
 bfd:   4c 89 58 d0             mov    QWORD PTR [rax-0x30],r11
 c01:   4c 89 50 d8             mov    QWORD PTR [rax-0x28],r10
  ...

This saves 2087 bytes of code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Joonas Lahtinen Dec. 19, 2016, 9:47 a.m. UTC | #1
On ma, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>  	vaddr = kmap(page);
>  
>  	for (i = 0; i < PAGE_SIZE; i += 128) {
> -		memcpy(temp, &vaddr[i], 64);
> +		memcpy(&temp[0], &vaddr[i], 64);
>  		memcpy(&vaddr[i], &vaddr[i + 64], 64);
> -		memcpy(&vaddr[i + 64], temp, 64);
> +		memcpy(&vaddr[i + 64], &temp[0], 64);

This reeks badly of a GCC bug. So I would not apply it, as next time the
bug could go in the other direction.

Regards, Joonas

Jani Nikula Dec. 19, 2016, 10:32 a.m. UTC | #2
On Mon, 19 Dec 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> On ma, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>>  	vaddr = kmap(page);
>>  
>>  	for (i = 0; i < PAGE_SIZE; i += 128) {
>> -		memcpy(temp, &vaddr[i], 64);
>> +		memcpy(&temp[0], &vaddr[i], 64);
>>  		memcpy(&vaddr[i], &vaddr[i + 64], 64);
>> -		memcpy(&vaddr[i + 64], temp, 64);
>> +		memcpy(&vaddr[i + 64], &temp[0], 64);
>
> This reeks badly of a GCC bug. So I would not apply it, as next time the
> bug could go in the other direction.

Agreed. Please file a bug over at https://gcc.gnu.org/bugs/

BR,
Jani.

Tvrtko Ursulin Dec. 20, 2016, 9:48 a.m. UTC | #3
On 19/12/2016 10:32, Jani Nikula wrote:
> On Mon, 19 Dec 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
>> On ma, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
>>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>>> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>>>  	vaddr = kmap(page);
>>>
>>>  	for (i = 0; i < PAGE_SIZE; i += 128) {
>>> -		memcpy(temp, &vaddr[i], 64);
>>> +		memcpy(&temp[0], &vaddr[i], 64);
>>>  		memcpy(&vaddr[i], &vaddr[i + 64], 64);
>>> -		memcpy(&vaddr[i + 64], temp, 64);
>>> +		memcpy(&vaddr[i + 64], &temp[0], 64);
>>
>> This reeks badly of a GCC bug. So I would not apply it, as next time the
>> bug could go in the other direction.
>
> Agreed. Please file a bug over at https://gcc.gnu.org/bugs/

Bug filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78869

Potentially useful code generation explorer picked up from #gcc: 
https://godbolt.org/g/XNioHs

Regards,

Tvrtko
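
For context on why the reviewers suspect the compiler: when passed to
memcpy, the array name `temp` converts to a pointer to its first element,
so `temp` and `&temp[0]` should be interchangeable. A minimal userspace
sketch of that equivalence follows. The function name
decay_matches_elem0 and the buffers src, out_a and out_b are hypothetical
and not part of the patch; only `temp` and the 64-byte size come from it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not from the patch: returns 1 when copying through
 * `temp` and through `&temp[0]` produces identical bytes.
 */
int decay_matches_elem0(void)
{
	char src[64], temp[64], out_a[64], out_b[64];
	size_t i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (char)i;

	/* The array name decays to the address of element 0. */
	assert(temp == &temp[0]);

	/* Spelling used before the patch. */
	memcpy(temp, src, 64);
	memcpy(out_a, temp, 64);

	/* Spelling used after the patch. */
	memcpy(&temp[0], src, 64);
	memcpy(out_b, &temp[0], 64);

	return memcmp(out_a, out_b, 64) == 0 &&
	       memcmp(out_a, src, 64) == 0;
}
```

Since both spellings give the same pointer, any difference in generated
code would come from the compiler rather than from the source semantics.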

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e03983973252..d665d2e74641 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -631,9 +631,9 @@  i915_gem_swizzle_page(struct page *page)
 	vaddr = kmap(page);
 
 	for (i = 0; i < PAGE_SIZE; i += 128) {
-		memcpy(temp, &vaddr[i], 64);
+		memcpy(&temp[0], &vaddr[i], 64);
 		memcpy(&vaddr[i], &vaddr[i + 64], 64);
-		memcpy(&vaddr[i + 64], temp, 64);
+		memcpy(&vaddr[i + 64], &temp[0], 64);
 	}
 
 	kunmap(page);
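
The loop in the hunk above can be modelled in userspace to check that
both spellings swap the same bytes. This is only a sketch under stated
assumptions: SKETCH_PAGE_SIZE (4096) stands in for the kernel's
PAGE_SIZE, a plain char buffer replaces the kmap()ed page, `temp` is
assumed to be a 64-byte char array, and all function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a 4 KiB page; stands in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* Pre-patch spelling: swap the two 64-byte halves of each 128-byte block. */
static void swizzle_page_decay(char *vaddr)
{
	char temp[64];
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i += 128) {
		memcpy(temp, &vaddr[i], 64);
		memcpy(&vaddr[i], &vaddr[i + 64], 64);
		memcpy(&vaddr[i + 64], temp, 64);
	}
}

/* Post-patch spelling: identical loop, with &temp[0] instead of temp. */
static void swizzle_page_elem0(char *vaddr)
{
	char temp[64];
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i += 128) {
		memcpy(&temp[0], &vaddr[i], 64);
		memcpy(&vaddr[i], &vaddr[i + 64], 64);
		memcpy(&vaddr[i + 64], &temp[0], 64);
	}
}

/* Returns 1 when both spellings agree and a second swizzle restores the page. */
int swizzle_variants_agree(void)
{
	static char a[SKETCH_PAGE_SIZE], b[SKETCH_PAGE_SIZE];
	static char orig[SKETCH_PAGE_SIZE];
	size_t i;

	/* Deterministic pattern in which adjacent 64-byte halves differ. */
	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		orig[i] = (char)((i * 31 + 7) & 0x7f);
	memcpy(a, orig, SKETCH_PAGE_SIZE);
	memcpy(b, orig, SKETCH_PAGE_SIZE);

	swizzle_page_decay(a);
	swizzle_page_elem0(b);

	/* Both spellings must leave identical bytes behind. */
	assert(memcmp(a, b, SKETCH_PAGE_SIZE) == 0);
	/* The halves of the first 128-byte block have traded places. */
	assert(memcmp(&a[0], &orig[64], 64) == 0);
	assert(memcmp(&a[64], &orig[0], 64) == 0);

	/* Swapping twice is the identity. */
	swizzle_page_elem0(a);
	return memcmp(a, orig, SKETCH_PAGE_SIZE) == 0;
}
```

Calling swizzle_variants_agree() from a test main should return 1. The
sketch only checks functional equivalence and says nothing about the
code size difference reported in the commit message.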