diff mbox series

[2/2,v2] scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM

Message ID 20200618131632.32748-3-bstroesser@ts.fujitsu.com (mailing list archive)
State Mainlined
Commit 3145550a7f8b08356c8ff29feaa6c56aca12901d
Headers show
Series scsi: target: tcmu: fix crashes on ARM | expand

Commit Message

Bodo Stroesser June 18, 2020, 1:16 p.m. UTC
This patch fixes the following crash
(see https://bugzilla.kernel.org/show_bug.cgi?id=208045)

 Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
 CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
        #202004230533
 Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
 pstate: 80400005 (Nzcv daif +PAN -UAO)
 pc : flush_dcache_page+0x18/0x40
 lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
 sp : ffff000015123a80
 x29: ffff000015123a80 x28: 0000000000000000
 x27: 0000000000001000 x26: ffff000023ea5000
 x25: ffffcfa25bbe08b8 x24: 0000000000000078
 x23: ffff7e0000000000 x22: ffff000023ea5001
 x21: ffffcfa24b79c000 x20: 0000000000000fff
 x19: ffff7e00008fa940 x18: 0000000000000000
 x17: 0000000000000000 x16: ffff2d047e709138
 x15: 0000000000000000 x14: 0000000000000000
 x13: 0000000000000000 x12: ffff2d047fbd0a40
 x11: 0000000000000000 x10: 0000000000000030
 x9 : 0000000000000000 x8 : ffffc9a254820a00
 x7 : 00000000000013b0 x6 : 000000000000003f
 x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
 x3 : 0000000000001000 x2 : 0000000000000078
 x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
 Call trace:
  flush_dcache_page+0x18/0x40
  is_ring_space_avail+0x68/0x2f8 [target_core_user]
  queue_cmd_ring+0x1f8/0x680 [target_core_user]
  tcmu_queue_cmd+0xe4/0x158 [target_core_user]
  __target_execute_cmd+0x30/0xf0 [target_core_mod]
  target_execute_cmd+0x294/0x390 [target_core_mod]
  transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
  transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
  iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
  iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
  iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
  iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
  iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
  kthread+0x130/0x138
  ret_from_fork+0x10/0x18
 Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
 ---[ end trace 1e451c73f4266776 ]---

The solution is based on patch:

  "scsi: target: tcmu: Optimize use of flush_dcache_page"

which restricts the use of tcmu_flush_dcache_range() to
addresses from vmalloc'ed areas only.

This patch now replaces the virt_to_page() call in
tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
by vmalloc_to_page().

The patch was tested on ARM with kernel 4.19.118 and 5.7.2

Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Tested-by: JiangYu <lnsyyj@hotmail.com>
Tested-by: Daniel Meyerholt <dxm523@gmail.com>
---
 drivers/target/target_core_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mike Christie June 18, 2020, 3 p.m. UTC | #1
On 6/18/20 8:16 AM, Bodo Stroesser wrote:
> This patch fixes the following crash
> (see https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=208045__;!!GqivPVa7Brio!O7JgZIL3VPAzIqwJfZjL48y8M90K3HXv3pUVeoCzZ-vXovCpx5g7xMg-z5aAiVXVtkfE$ )
> 
>   Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
>   CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
>          #202004230533
>   Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
>   pstate: 80400005 (Nzcv daif +PAN -UAO)
>   pc : flush_dcache_page+0x18/0x40
>   lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
>   sp : ffff000015123a80
>   x29: ffff000015123a80 x28: 0000000000000000
>   x27: 0000000000001000 x26: ffff000023ea5000
>   x25: ffffcfa25bbe08b8 x24: 0000000000000078
>   x23: ffff7e0000000000 x22: ffff000023ea5001
>   x21: ffffcfa24b79c000 x20: 0000000000000fff
>   x19: ffff7e00008fa940 x18: 0000000000000000
>   x17: 0000000000000000 x16: ffff2d047e709138
>   x15: 0000000000000000 x14: 0000000000000000
>   x13: 0000000000000000 x12: ffff2d047fbd0a40
>   x11: 0000000000000000 x10: 0000000000000030
>   x9 : 0000000000000000 x8 : ffffc9a254820a00
>   x7 : 00000000000013b0 x6 : 000000000000003f
>   x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
>   x3 : 0000000000001000 x2 : 0000000000000078
>   x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
>   Call trace:
>    flush_dcache_page+0x18/0x40
>    is_ring_space_avail+0x68/0x2f8 [target_core_user]
>    queue_cmd_ring+0x1f8/0x680 [target_core_user]
>    tcmu_queue_cmd+0xe4/0x158 [target_core_user]
>    __target_execute_cmd+0x30/0xf0 [target_core_mod]
>    target_execute_cmd+0x294/0x390 [target_core_mod]
>    transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
>    transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
>    iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
>    iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
>    iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
>    iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
>    iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
>    kthread+0x130/0x138
>    ret_from_fork+0x10/0x18
>   Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
>   ---[ end trace 1e451c73f4266776 ]---
> 
> The solution is based on patch:
> 
>    "scsi: target: tcmu: Optimize use of flush_dcache_page"
> 
> which restricts the use of tcmu_flush_dcache_range() to
> addresses from vmalloc'ed areas only.
> 
> This patch now replaces the virt_to_page() call in
> tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
> by vmalloc_to_page().
> 
> The patch was tested on ARM with kernel 4.19.118 and 5.7.2
> 
> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> Tested-by: JiangYu <lnsyyj@hotmail.com>
> Tested-by: Daniel Meyerholt <dxm523@gmail.com>
> ---
>   drivers/target/target_core_user.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> index a65e9671ae7a..835d3001cb0e 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -601,7 +601,7 @@ static inline void tcmu_flush_dcache_range(void *vaddr, size_t size)
>   	size = round_up(size+offset, PAGE_SIZE);
>   
>   	while (size) {
> -		flush_dcache_page(virt_to_page(start));
> +		flush_dcache_page(vmalloc_to_page(start));
>   		start += PAGE_SIZE;
>   		size -= PAGE_SIZE;
>   	}

For this bug we only need to change the flush in is_ring_space_avail 
right? It's what is accessing the mb which is vmalloc'd.

Is it reccommended to call vmalloc_to_page on a page we get with 
alloc_page and is the mm then always going to do the right thing for us 
(I do not know the mm code well and just quickly scanned the 
Documentation and comments, but could not find anything), or could we 
hit similar issues where we are using the wrong call on different types 
of allocated memory down the line?
Bodo Stroesser June 18, 2020, 3:41 p.m. UTC | #2
On 2020-06-18 17:00, Mike Christie wrote:
> On 6/18/20 8:16 AM, Bodo Stroesser wrote:
>> This patch fixes the following crash
>> (see 
>> https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=208045__;!!GqivPVa7Brio!O7JgZIL3VPAzIqwJfZjL48y8M90K3HXv3pUVeoCzZ-vXovCpx5g7xMg-z5aAiVXVtkfE$ 
>> )
>>
>>   Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
>>   CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
>>          #202004230533
>>   Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
>>   pstate: 80400005 (Nzcv daif +PAN -UAO)
>>   pc : flush_dcache_page+0x18/0x40
>>   lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
>>   sp : ffff000015123a80
>>   x29: ffff000015123a80 x28: 0000000000000000
>>   x27: 0000000000001000 x26: ffff000023ea5000
>>   x25: ffffcfa25bbe08b8 x24: 0000000000000078
>>   x23: ffff7e0000000000 x22: ffff000023ea5001
>>   x21: ffffcfa24b79c000 x20: 0000000000000fff
>>   x19: ffff7e00008fa940 x18: 0000000000000000
>>   x17: 0000000000000000 x16: ffff2d047e709138
>>   x15: 0000000000000000 x14: 0000000000000000
>>   x13: 0000000000000000 x12: ffff2d047fbd0a40
>>   x11: 0000000000000000 x10: 0000000000000030
>>   x9 : 0000000000000000 x8 : ffffc9a254820a00
>>   x7 : 00000000000013b0 x6 : 000000000000003f
>>   x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
>>   x3 : 0000000000001000 x2 : 0000000000000078
>>   x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
>>   Call trace:
>>    flush_dcache_page+0x18/0x40
>>    is_ring_space_avail+0x68/0x2f8 [target_core_user]
>>    queue_cmd_ring+0x1f8/0x680 [target_core_user]
>>    tcmu_queue_cmd+0xe4/0x158 [target_core_user]
>>    __target_execute_cmd+0x30/0xf0 [target_core_mod]
>>    target_execute_cmd+0x294/0x390 [target_core_mod]
>>    transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
>>    transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
>>    iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
>>    iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
>>    iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
>>    iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
>>    iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
>>    kthread+0x130/0x138
>>    ret_from_fork+0x10/0x18
>>   Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
>>   ---[ end trace 1e451c73f4266776 ]---
>>
>> The solution is based on patch:
>>
>>    "scsi: target: tcmu: Optimize use of flush_dcache_page"
>>
>> which restricts the use of tcmu_flush_dcache_range() to
>> addresses from vmalloc'ed areas only.
>>
>> This patch now replaces the virt_to_page() call in
>> tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
>> by vmalloc_to_page().
>>
>> The patch was tested on ARM with kernel 4.19.118 and 5.7.2
>>
>> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
>> Tested-by: JiangYu <lnsyyj@hotmail.com>
>> Tested-by: Daniel Meyerholt <dxm523@gmail.com>
>> ---
>>   drivers/target/target_core_user.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/target/target_core_user.c 
>> b/drivers/target/target_core_user.c
>> index a65e9671ae7a..835d3001cb0e 100644
>> --- a/drivers/target/target_core_user.c
>> +++ b/drivers/target/target_core_user.c
>> @@ -601,7 +601,7 @@ static inline void tcmu_flush_dcache_range(void 
>> *vaddr, size_t size)
>>       size = round_up(size+offset, PAGE_SIZE);
>>       while (size) {
>> -        flush_dcache_page(virt_to_page(start));
>> +        flush_dcache_page(vmalloc_to_page(start));
>>           start += PAGE_SIZE;
>>           size -= PAGE_SIZE;
>>       }
> 
> For this bug we only need to change the flush in is_ring_space_avail 
> right? It's what is accessing the mb which is vmalloc'd.
No, is_ring_space_avail was just the first caller of
tcmu_flush_dcache_range for vmalloc'ed pages, so it crashed there.

The entiere address range exposed to userspace via uio consists of
two parts:
1) mb + command ring are vmalloc'ed in a single vzalloc call
    during initialization of a tcmu device.
2) the data area which is allocated page by page calling
    alloc_page()

The second part is handled by (scatter|gather)_data_area. For the
calls from these routines I think usage of virt_to_page in
tcmu_flush_dcache_range was fine. But patch number 1 of this
series replaced these called with direct calls to flush_dcache_page.

So all remaining calls to tcmu_flush_dcache_range after patch 1
are calls for vmalloc'ed addresses. Therefore after patch 1 we can
safely replace virt_to_page with vmalloc_to_page.

So it turned out that patch 1, which in the beginning was thought
to be an optimization, now also simplifies the crash fix.

> 
> Is it reccommended to call vmalloc_to_page on a page we get with 
> alloc_page and is the mm then always going to do the right thing for us 
> (I do not know the mm code well and just quickly scanned the 
> Documentation and comments, but could not find anything), or could we 
> hit similar issues where we are using the wrong call on different types 
> of allocated memory down the line?
Mike Christie June 18, 2020, 4:06 p.m. UTC | #3
On 6/18/20 10:41 AM, Bodo Stroesser wrote:
> On 2020-06-18 17:00, Mike Christie wrote:
>> On 6/18/20 8:16 AM, Bodo Stroesser wrote:
>>> This patch fixes the following crash
>>> (see 
>>> https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=208045__;!!GqivPVa7Brio!O7JgZIL3VPAzIqwJfZjL48y8M90K3HXv3pUVeoCzZ-vXovCpx5g7xMg-z5aAiVXVtkfE$ 
>>> )
>>>
>>>   Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
>>>   CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
>>>          #202004230533
>>>   Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 
>>> 2019
>>>   pstate: 80400005 (Nzcv daif +PAN -UAO)
>>>   pc : flush_dcache_page+0x18/0x40
>>>   lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
>>>   sp : ffff000015123a80
>>>   x29: ffff000015123a80 x28: 0000000000000000
>>>   x27: 0000000000001000 x26: ffff000023ea5000
>>>   x25: ffffcfa25bbe08b8 x24: 0000000000000078
>>>   x23: ffff7e0000000000 x22: ffff000023ea5001
>>>   x21: ffffcfa24b79c000 x20: 0000000000000fff
>>>   x19: ffff7e00008fa940 x18: 0000000000000000
>>>   x17: 0000000000000000 x16: ffff2d047e709138
>>>   x15: 0000000000000000 x14: 0000000000000000
>>>   x13: 0000000000000000 x12: ffff2d047fbd0a40
>>>   x11: 0000000000000000 x10: 0000000000000030
>>>   x9 : 0000000000000000 x8 : ffffc9a254820a00
>>>   x7 : 00000000000013b0 x6 : 000000000000003f
>>>   x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
>>>   x3 : 0000000000001000 x2 : 0000000000000078
>>>   x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
>>>   Call trace:
>>>    flush_dcache_page+0x18/0x40
>>>    is_ring_space_avail+0x68/0x2f8 [target_core_user]
>>>    queue_cmd_ring+0x1f8/0x680 [target_core_user]
>>>    tcmu_queue_cmd+0xe4/0x158 [target_core_user]
>>>    __target_execute_cmd+0x30/0xf0 [target_core_mod]
>>>    target_execute_cmd+0x294/0x390 [target_core_mod]
>>>    transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
>>>    transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
>>>    iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
>>>    iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
>>>    iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
>>>    iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
>>>    iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
>>>    kthread+0x130/0x138
>>>    ret_from_fork+0x10/0x18
>>>   Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
>>>   ---[ end trace 1e451c73f4266776 ]---
>>>
>>> The solution is based on patch:
>>>
>>>    "scsi: target: tcmu: Optimize use of flush_dcache_page"
>>>
>>> which restricts the use of tcmu_flush_dcache_range() to
>>> addresses from vmalloc'ed areas only.
>>>
>>> This patch now replaces the virt_to_page() call in
>>> tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
>>> by vmalloc_to_page().
>>>
>>> The patch was tested on ARM with kernel 4.19.118 and 5.7.2
>>>
>>> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
>>> Tested-by: JiangYu <lnsyyj@hotmail.com>
>>> Tested-by: Daniel Meyerholt <dxm523@gmail.com>
>>> ---
>>>   drivers/target/target_core_user.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/target/target_core_user.c 
>>> b/drivers/target/target_core_user.c
>>> index a65e9671ae7a..835d3001cb0e 100644
>>> --- a/drivers/target/target_core_user.c
>>> +++ b/drivers/target/target_core_user.c
>>> @@ -601,7 +601,7 @@ static inline void tcmu_flush_dcache_range(void 
>>> *vaddr, size_t size)
>>>       size = round_up(size+offset, PAGE_SIZE);
>>>       while (size) {
>>> -        flush_dcache_page(virt_to_page(start));
>>> +        flush_dcache_page(vmalloc_to_page(start));
>>>           start += PAGE_SIZE;
>>>           size -= PAGE_SIZE;
>>>       }
>>
>> For this bug we only need to change the flush in is_ring_space_avail 
>> right? It's what is accessing the mb which is vmalloc'd.
> No, is_ring_space_avail was just the first caller of
> tcmu_flush_dcache_range for vmalloc'ed pages, so it crashed there.
> 
> The entiere address range exposed to userspace via uio consists of
> two parts:
> 1) mb + command ring are vmalloc'ed in a single vzalloc call
>     during initialization of a tcmu device.
> 2) the data area which is allocated page by page calling
>     alloc_page()
> 
> The second part is handled by (scatter|gather)_data_area. For the
> calls from these routines I think usage of virt_to_page in
> tcmu_flush_dcache_range was fine. But patch number 1 of this
> series replaced these called with direct calls to flush_dcache_page.

That was my problem. Did not see the replacement. Looks good to me.
diff mbox series

Patch

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index a65e9671ae7a..835d3001cb0e 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -601,7 +601,7 @@  static inline void tcmu_flush_dcache_range(void *vaddr, size_t size)
 	size = round_up(size+offset, PAGE_SIZE);
 
 	while (size) {
-		flush_dcache_page(virt_to_page(start));
+		flush_dcache_page(vmalloc_to_page(start));
 		start += PAGE_SIZE;
 		size -= PAGE_SIZE;
 	}