
[v3,for,4.15] dmaengine: dmatest: move callback wait queue to thread context

Message ID 1510927894-15297-1-git-send-email-awallis@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Adam Wallis Nov. 17, 2017, 2:11 p.m. UTC
Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
introduced a bug (that is in fact documented by the patch commit text)
that leaves behind a dangling pointer. Since the done_wait structure is
allocated on the stack, future invocations to the DMATEST can produce
undesirable results (e.g., corrupted spinlocks).

Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
out") attempted to WARN the user that the stack was likely corrupted but
did not fix the actual issue.

This patch fixes the issue by pushing the wait queue and callback
structs into the thread structure. If a failure occurs due to time,
dmaengine_terminate_all will force the callback to safely call
wake_up_all() without possibility of using a freed pointer.

Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
Cc: stable@vger.kernel.org # 4.13.x
Cc: stable@vger.kernel.org # 4.14.x
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Adam Wallis <awallis@codeaurora.org>
---
changes from v2: Added "Fixes" tag
changes from v1: Added pre-req patches for stable

 drivers/dma/dmatest.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

Comments

Dan Williams Nov. 17, 2017, 3:12 p.m. UTC | #1
On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
> introduced a bug (that is in fact documented by the patch commit text)
> that leaves behind a dangling pointer. Since the done_wait structure is
> allocated on the stack, future invocations to the DMATEST can produce
> undesirable results (e.g., corrupted spinlocks).
>
> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
> out") attempted to WARN the user that the stack was likely corrupted but
> did not fix the actual issue.
>
> This patch fixes the issue by pushing the wait queue and callback
> structs into the thread structure. If a failure occurs due to time,
> dmaengine_terminate_all will force the callback to safely call
> wake_up_all() without possibility of using a freed pointer.
>
> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
> Cc: stable@vger.kernel.org # 4.13.x
> Cc: stable@vger.kernel.org # 4.14.x

You don't need 3 cc stables, you don't even need the "#
kernel-version". Since you have the "Fixes:" line the target kernel(s)
for the backport can be auto-determined. I should go update
Documentation/process/stable-kernel-rules.rst to mention this.

> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
> Signed-off-by: Adam Wallis <awallis@codeaurora.org>
> ---
> changes from v2: Added "Fixes" tag
> changes from v1: Added pre-req patches for stable
>
>  drivers/dma/dmatest.c | 37 ++++++++++++++++---------------------
>  1 file changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
> index 47edc7f..2573b6c 100644
> --- a/drivers/dma/dmatest.c
> +++ b/drivers/dma/dmatest.c
> @@ -155,6 +155,12 @@ struct dmatest_params {
>  #define PATTERN_COUNT_MASK     0x1f
>  #define PATTERN_MEMSET_IDX     0x01
>
> +/* poor man's completion - we want to use wait_event_freezable() on it */
> +struct dmatest_done {
> +       bool                    done;
> +       wait_queue_head_t       *wait;
> +};
> +
>  struct dmatest_thread {
>         struct list_head        node;
>         struct dmatest_info     *info;
> @@ -165,6 +171,8 @@ struct dmatest_thread {
>         u8                      **dsts;
>         u8                      **udsts;
>         enum dma_transaction_type type;
> +       wait_queue_head_t done_wait;

Why are we defining a wait queue head per thread vs defining one globally
for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"?
Adam Wallis Nov. 17, 2017, 3:28 p.m. UTC | #2
On 11/17/2017 10:12 AM, Dan Williams wrote:
> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>> introduced a bug (that is in fact documented by the patch commit text)
>> that leaves behind a dangling pointer. Since the done_wait structure is
>> allocated on the stack, future invocations to the DMATEST can produce
>> undesirable results (e.g., corrupted spinlocks).
>>
>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
>> out") attempted to WARN the user that the stack was likely corrupted but
>> did not fix the actual issue.
>>
>> This patch fixes the issue by pushing the wait queue and callback
>> structs into the thread structure. If a failure occurs due to time,
>> dmaengine_terminate_all will force the callback to safely call
>> wake_up_all() without possibility of using a freed pointer.
>>
>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
>> Cc: stable@vger.kernel.org # 4.13.x
>> Cc: stable@vger.kernel.org # 4.14.x
> 

Sure - do you want me to remove them? I was just following the instructions on
stable.

> You don't need 3 cc stables, you don't even need the "#
> kernel-version". Since you have the "Fixes:" line the target kernel(s)
> for the backport can be auto-determined. I should go update
> Documentation/process/stable-kernel-rules.rst to mention this.
> 
>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>> Signed-off-by: Adam Wallis <awallis@codeaurora.org>
>> ---
>> changes from v2: Added "Fixes" tag
>> changes from v1: Added pre-req patches for stable
>>
>>  drivers/dma/dmatest.c | 37 ++++++++++++++++---------------------
>>  1 file changed, 16 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
>> index 47edc7f..2573b6c 100644
>> --- a/drivers/dma/dmatest.c
>> +++ b/drivers/dma/dmatest.c
>> @@ -155,6 +155,12 @@ struct dmatest_params {
>>  #define PATTERN_COUNT_MASK     0x1f
>>  #define PATTERN_MEMSET_IDX     0x01
>>
>> +/* poor man's completion - we want to use wait_event_freezable() on it */
>> +struct dmatest_done {
>> +       bool                    done;
>> +       wait_queue_head_t       *wait;
>> +};
>> +
>>  struct dmatest_thread {
>>         struct list_head        node;
>>         struct dmatest_info     *info;
>> @@ -165,6 +171,8 @@ struct dmatest_thread {
>>         u8                      **dsts;
>>         u8                      **udsts;
>>         enum dma_transaction_type type;
>> +       wait_queue_head_t done_wait;
> 
> Why are we defining a waitquehead per thread vs defining one globally
> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"?

This is how the original dmatest functions. Each thread had a wait queue that it
created so that it could go to sleep while the DMA transfer occurred. Each
thread is dependent on its own DMA transaction for the wakeup call. Again, this
is how the test originally worked. I just moved the wait queue from the stack
(which was getting corrupted) to the thread context to allow for safe cleanup.
In other words, I haven't really changed how the test works...just fixing a bug
with the current implementation.
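
The ownership change described above can be sketched in plain userspace C. This is only a model, not the driver code: `model_waitq`, `model_done`, `model_thread` and the functions below are hypothetical stand-ins, and the "wait queue" merely counts wake-ups. The point it illustrates is that the completion record lives inside the per-thread structure, so a callback that fires after the iteration has given up still writes to valid storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for wait_queue_head_t: it only counts wake-ups. */
struct model_waitq {
	unsigned int wakeups;
};

/* Same shape as struct dmatest_done in the patch. */
struct model_done {
	bool done;
	struct model_waitq *wait;
};

/* The two members the patch adds to struct dmatest_thread. */
struct model_thread {
	struct model_waitq done_wait;
	struct model_done test_done;
};

/* Counterpart of the setup the patch adds to dmatest_add_threads(). */
void model_thread_init(struct model_thread *t)
{
	t->done_wait.wakeups = 0;
	t->test_done.done = false;
	t->test_done.wait = &t->done_wait;
}

/* Counterpart of dmatest_callback(): mark completion, wake the waiter. */
void model_callback(void *arg)
{
	struct model_done *d = arg;

	assert(d != NULL && d->wait != NULL);
	d->done = true;
	d->wait->wakeups++;
}

/*
 * One iteration that times out: the callback parameter is handed to the
 * "engine" and the function returns before any completion arrives.
 */
void *model_submit_then_time_out(struct model_thread *t)
{
	t->test_done.done = false;
	return &t->test_done;
}
```

Because the pointer returned by `model_submit_then_time_out()` refers to a member of `struct model_thread`, calling `model_callback()` on it later is safe for as long as the thread structure exists. With the pre-patch layout the same pointer would have referred to a stack frame that is gone by then.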

Dan Williams Nov. 17, 2017, 3:57 p.m. UTC | #3
On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
> On 11/17/2017 10:12 AM, Dan Williams wrote:
>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>> introduced a bug (that is in fact documented by the patch commit text)
>>> that leaves behind a dangling pointer. Since the done_wait structure is
>>> allocated on the stack, future invocations to the DMATEST can produce
>>> undesirable results (e.g., corrupted spinlocks).
>>>
>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
>>> out") attempted to WARN the user that the stack was likely corrupted but
>>> did not fix the actual issue.
>>>
>>> This patch fixes the issue by pushing the wait queue and callback
>>> structs into the thread structure. If a failure occurs due to time,
>>> dmaengine_terminate_all will force the callback to safely call
>>> wake_up_all() without possibility of using a freed pointer.
>>>
>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
>>> Cc: stable@vger.kernel.org # 4.13.x
>>> Cc: stable@vger.kernel.org # 4.14.x
>>
>
> Sure - do you want me to remove them? I was just following the instructions on
> stable.

It's not broken, just a note for next time.

>
>> You don't need 3 cc stables, you don't even need the "#
>> kernel-version". Since you have the "Fixes:" line the target kernel(s)
>> for the backport can be auto-determined. I should go update
>> Documentation/process/stable-kernel-rules.rst to mention this.
>>
>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
>>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
>>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>>> Signed-off-by: Adam Wallis <awallis@codeaurora.org>
>>> ---
>>> changes from v2: Added "Fixes" tag
>>> changes from v1: Added pre-req patches for stable
>>>
>>>  drivers/dma/dmatest.c | 37 ++++++++++++++++---------------------
>>>  1 file changed, 16 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
>>> index 47edc7f..2573b6c 100644
>>> --- a/drivers/dma/dmatest.c
>>> +++ b/drivers/dma/dmatest.c
>>> @@ -155,6 +155,12 @@ struct dmatest_params {
>>>  #define PATTERN_COUNT_MASK     0x1f
>>>  #define PATTERN_MEMSET_IDX     0x01
>>>
>>> +/* poor man's completion - we want to use wait_event_freezable() on it */
>>> +struct dmatest_done {
>>> +       bool                    done;
>>> +       wait_queue_head_t       *wait;
>>> +};
>>> +
>>>  struct dmatest_thread {
>>>         struct list_head        node;
>>>         struct dmatest_info     *info;
>>> @@ -165,6 +171,8 @@ struct dmatest_thread {
>>>         u8                      **dsts;
>>>         u8                      **udsts;
>>>         enum dma_transaction_type type;
>>> +       wait_queue_head_t done_wait;
>>
>> Why are we defining a waitquehead per thread vs defining one globally
>> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"?
>
> This is how the original dmatest functions. Each thread had a wait queue that it
> created so that it could go to sleep while the DMA transfer occurred. Each
> thread is dependent on its own DMA transaction for the wakeup call. Again, this
> is how the test originally worked. I just moved the wait queue from the stack
> (which was getting corrupted) to the thread context to allow for safe cleanup.
> In other words, I haven't really changed how the test works...just fixing a bug
> with the current implementation.

Ok, always takes me a bit to re-orient myself to this file since I
only look at it once a year.

This fix seems incomplete. The next test iteration after a timeout
will now reuse the per-thread 'done' notification. If the engine that
timed out still completes its dma it will collide with the next
operation that is using the same 'done' variable. So it seems to me
that the wait_queue_head should be global, and the 'done' variable
should be either allocated per-operation or we should call
dmaengine_terminate_all() after a timeout. Since not all engines
implement a terminate I think the potential memory leak of a few
'done' variables is a better option.
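
The collision described above, and the per-operation alternative, can be modelled in userspace C. This is a sketch under stated assumptions: `op_done`, `op_callback` and the two scenario functions are hypothetical names, the late completion is simulated by calling the callback by hand, and the per-operation records are freed here only because the model knows the late callback has already run (the real trade-off discussed in the thread is that they might have to be leaked).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal completion record: just the flag the waiter polls. */
struct op_done {
	bool done;
};

/* What the engine calls when a transfer finishes. */
void op_callback(void *arg)
{
	struct op_done *d = arg;

	assert(d != NULL);
	d->done = true;
}

/* Shared record: reports what operation 2 sees after operation 1 finishes late. */
bool shared_done_sees_stale_completion(void)
{
	struct op_done shared = { .done = false };
	void *op1_param = &shared;	/* operation 1 submitted, then timed out */

	shared.done = false;		/* operation 2 reuses the same record */
	op_callback(op1_param);		/* operation 1 completes late */
	return shared.done;		/* operation 2 looks complete, but is not */
}

/* Per-operation records: the late completion cannot touch operation 2. */
bool per_op_done_sees_stale_completion(void)
{
	struct op_done *op1 = calloc(1, sizeof(*op1));
	struct op_done *op2 = calloc(1, sizeof(*op2));
	bool op2_done;

	if (!op1 || !op2)
		abort();
	op_callback(op1);		/* operation 1 completes late */
	op2_done = op2->done;		/* still false: separate storage */
	free(op1);
	free(op2);
	return op2_done;
}
```

In the shared case the second operation observes a completion that belongs to the first one. In the per-operation case its flag stays clear until its own callback runs.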
Adam Wallis Nov. 17, 2017, 5:01 p.m. UTC | #4
On 11/17/2017 10:57 AM, Dan Williams wrote:
> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>> On 11/17/2017 10:12 AM, Dan Williams wrote:
>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>>> introduced a bug (that is in fact documented by the patch commit text)
>>>> that leaves behind a dangling pointer. Since the done_wait structure is
>>>> allocated on the stack, future invocations to the DMATEST can produce
>>>> undesirable results (e.g., corrupted spinlocks).
>>>>
>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
>>>> out") attempted to WARN the user that the stack was likely corrupted but
>>>> did not fix the actual issue.
>>>>
>>>> This patch fixes the issue by pushing the wait queue and callback
>>>> structs into the thread structure. If a failure occurs due to time,
>>>> dmaengine_terminate_all will force the callback to safely call
>>>> wake_up_all() without possibility of using a freed pointer.
>>>>
>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
>>>> Cc: stable@vger.kernel.org # 4.13.x
>>>> Cc: stable@vger.kernel.org # 4.14.x
>>>
>>
>> Sure - do you want me to remove them? I was just following the instructions on
>> stable.
> 
> It's not broken, just a note for next time.
> 
>>
>>> You don't need 3 cc stables, you don't even need the "#
>>> kernel-version". Since you have the "Fixes:" line the target kernel(s)
>>> for the backport can be auto-determined. I should go update
>>> Documentation/process/stable-kernel-rules.rst to mention this.
>>>
>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
>>>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
>>>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>>>> Signed-off-by: Adam Wallis <awallis@codeaurora.org>
>>>> ---
>>>> changes from v2: Added "Fixes" tag
>>>> changes from v1: Added pre-req patches for stable
>>>>
>>>>  drivers/dma/dmatest.c | 37 ++++++++++++++++---------------------
>>>>  1 file changed, 16 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
>>>> index 47edc7f..2573b6c 100644
>>>> --- a/drivers/dma/dmatest.c
>>>> +++ b/drivers/dma/dmatest.c
>>>> @@ -155,6 +155,12 @@ struct dmatest_params {
>>>>  #define PATTERN_COUNT_MASK     0x1f
>>>>  #define PATTERN_MEMSET_IDX     0x01
>>>>
>>>> +/* poor man's completion - we want to use wait_event_freezable() on it */
>>>> +struct dmatest_done {
>>>> +       bool                    done;
>>>> +       wait_queue_head_t       *wait;
>>>> +};
>>>> +
>>>>  struct dmatest_thread {
>>>>         struct list_head        node;
>>>>         struct dmatest_info     *info;
>>>> @@ -165,6 +171,8 @@ struct dmatest_thread {
>>>>         u8                      **dsts;
>>>>         u8                      **udsts;
>>>>         enum dma_transaction_type type;
>>>> +       wait_queue_head_t done_wait;
>>>
>>> Why are we defining a waitquehead per thread vs defining one globally
>>> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"?
>>
>> This is how the original dmatest functions. Each thread had a wait queue that it
>> created so that it could go to sleep while the DMA transfer occurred. Each
>> thread is dependent on its own DMA transaction for the wakeup call. Again, this
>> is how the test originally worked. I just moved the wait queue from the stack
>> (which was getting corrupted) to the thread context to allow for safe cleanup.
>> In other words, I haven't really changed how the test works...just fixing a bug
>> with the current implementation.
> 
> Ok, always takes me a bit to re-orient myself to this file since I
> only look at it once a year.
> 
> This fix seems incomplete. The next test iteration after a timeout
> will now reuse the per-thread 'done' notification. If the engine that
> timed out still completes its dma it will collide with the next
> operation that is using the same 'done' variable. So it seems to me
> that the wait_queue_head should be global, and the 'done' variable
> should be either allocated per-operation or we should call
> dmaengine_terminate_all() after a timeout. Since not all engines
> implement a terminate I think the potential memory leak of a few
> 'done' variables is a better option.
> 
Dan
An important part of my patch was severed in this v3 submission. My apologies.

I believe a change that was in v2 addresses your concern:

 	/* terminate all transfers on specified channels */
-	if (ret)
+	if (ret || failed_tests)
 		dmaengine_terminate_all(chan);

Will clean up again, retest, and resubmit. Thanks for your patience and instruction.

Adam
Adam Wallis Nov. 17, 2017, 5:17 p.m. UTC | #5
On 11/17/2017 12:01 PM, Adam Wallis wrote:
> On 11/17/2017 10:57 AM, Dan Williams wrote:
>> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>> On 11/17/2017 10:12 AM, Dan Williams wrote:
>>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>>>> introduced a bug (that is in fact documented by the patch commit text)
>>>>> that leaves behind a dangling pointer. Since the done_wait structure is
>>>>> allocated on the stack, future invocations to the DMATEST can produce
>>>>> undesirable results (e.g., corrupted spinlocks).
>>>>>
>>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
>>>>> out") attempted to WARN the user that the stack was likely corrupted but
>>>>> did not fix the actual issue.
>>>>>
>>>>> This patch fixes the issue by pushing the wait queue and callback
>>>>> structs into the thread structure. If a failure occurs due to time,
>>>>> dmaengine_terminate_all will force the callback to safely call
>>>>> wake_up_all() without possibility of using a freed pointer.
>>>>>
>>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
>>>>> Cc: stable@vger.kernel.org # 4.13.x
>>>>> Cc: stable@vger.kernel.org # 4.14.x
>>>>
>>>
>>> Sure - do you want me to remove them? I was just following the instructions on
>>> stable.
>>
>> It's not broken, just a note for next time.
>>
One last question - if all we need is Fixes and CC stable...how do we indicate a
dependency (for a cherry pick)?

Thanks
Adam Wallis Nov. 17, 2017, 5:28 p.m. UTC | #6
On 11/17/2017 12:01 PM, Adam Wallis wrote:
> On 11/17/2017 10:57 AM, Dan Williams wrote:
>> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>> On 11/17/2017 10:12 AM, Dan Williams wrote:
>>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>>>> introduced a bug (that is in fact documented by the patch commit text)
>>>>> that leaves behind a dangling pointer. Since the done_wait structure is
>>>>> allocated on the stack, future invocations to the DMATEST can produce
>>>>> undesirable results (e.g., corrupted spinlocks).
>>>>>
>>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
>>>>> out") attempted to WARN the user that the stack was likely corrupted but
>>>>> did not fix the actual issue.
>>>>>
>>>>> This patch fixes the issue by pushing the wait queue and callback
>>>>> structs into the thread structure. If a failure occurs due to time,
>>>>> dmaengine_terminate_all will force the callback to safely call
>>>>> wake_up_all() without possibility of using a freed pointer.
>>>>>
>>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
>>>>> Cc: stable@vger.kernel.org # 4.13.x
>>>>> Cc: stable@vger.kernel.org # 4.14.x
>>>>
>>>
>>> Sure - do you want me to remove them? I was just following the instructions on
>>> stable.
>>
>> It's not broken, just a note for next time.
>>
>>>
>>>> You don't need 3 cc stables, you don't even need the "#
>>>> kernel-version". Since you have the "Fixes:" line the target kernel(s)
>>>> for the backport can be auto-determined. I should go update
>>>> Documentation/process/stable-kernel-rules.rst to mention this.
>>>>
>>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
>>>>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>>>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
>>>>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>>>>> Signed-off-by: Adam Wallis <awallis@codeaurora.org>
>>>>> ---
>>>>> changes from v2: Added "Fixes" tag
>>>>> changes from v1: Added pre-req patches for stable
>>>>>
>>>>>  drivers/dma/dmatest.c | 37 ++++++++++++++++---------------------
>>>>>  1 file changed, 16 insertions(+), 21 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
>>>>> index 47edc7f..2573b6c 100644
>>>>> --- a/drivers/dma/dmatest.c
>>>>> +++ b/drivers/dma/dmatest.c
>>>>> @@ -155,6 +155,12 @@ struct dmatest_params {
>>>>>  #define PATTERN_COUNT_MASK     0x1f
>>>>>  #define PATTERN_MEMSET_IDX     0x01
>>>>>
>>>>> +/* poor man's completion - we want to use wait_event_freezable() on it */
>>>>> +struct dmatest_done {
>>>>> +       bool                    done;
>>>>> +       wait_queue_head_t       *wait;
>>>>> +};
>>>>> +
>>>>>  struct dmatest_thread {
>>>>>         struct list_head        node;
>>>>>         struct dmatest_info     *info;
>>>>> @@ -165,6 +171,8 @@ struct dmatest_thread {
>>>>>         u8                      **dsts;
>>>>>         u8                      **udsts;
>>>>>         enum dma_transaction_type type;
>>>>> +       wait_queue_head_t done_wait;
>>>>
>>>> Why are we defining a waitquehead per thread vs defining one globally
>>>> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"?
>>>
>>> This is how the original dmatest functions. Each thread had a wait queue that it
>>> created so that it could go to sleep while the DMA transfer occurred. Each
>>> thread is dependent on its own DMA transaction for the wakeup call. Again, this
>>> is how the test originally worked. I just moved the wait queue from the stack
>>> (which was getting corrupted) to the thread context to allow for safe cleanup.
>>> In other words, I haven't really changed how the test works...just fixing a bug
>>> with the current implementation.
>>
>> Ok, always takes me a bit to re-orient myself to this file since I
>> only look at it once a year.
>>
>> This fix seems incomplete. The next test iteration after a timeout
>> will now reuse the per-thread 'done' notification. If the engine that
>> timed out still completes its dma it will collide with the next
>> operation that is using the same 'done' variable. So it seems to me
>> that the wait_queue_head should be global, and the 'done' variable
>> should be either allocated per-operation or we should call
>> dmaengine_terminate_all() after a timeout. Since not all engines
>> implement a terminate I think the potential memory leak of a few
>> 'done' variables is a better option.
>>
> Dan
> An important part of my patch was severed in this v3 submission. My apologies.
> 
> There is a change that addresses, I believe, your concern that was in v2
> 
>  	/* terminate all transfers on specified channels */
> -	if (ret)
> +	if (ret || failed_tests)
>  		dmaengine_terminate_all(chan);
> 
> Will clean up again, retest, and resubmit. Thanks for your patience and instruction.

Dan, I thought the patch was truncated, but it's all there in v3. I should have
finished my coffee before responding. You are absolutely right that
dmaengine_terminate_all(chan) should be called in the timed-out case, and that
change is in fact already included in this patch set:

@@ -789,7 +782,7 @@ static int dmatest_func(void *data)
 		dmatest_KBs(runtime, total_len), ret);

 	/* terminate all transfers on specified channels */
-	if (ret)
+	if (ret || failed_tests)
 		dmaengine_terminate_all(chan);

Would you prefer that I expand the commit text to mention that this change
was added?

> 
> Adam
>
Dan Williams Nov. 17, 2017, 5:33 p.m. UTC | #7
On Fri, Nov 17, 2017 at 9:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
> On 11/17/2017 12:01 PM, Adam Wallis wrote:
[..]
>> Dan
>> An important part of my patch was severed in this v3 submission. My apologies.
>>
>> There is a change that addresses, I believe, your concern that was in v2
>>
>>       /* terminate all transfers on specified channels */
>> -     if (ret)
>> +     if (ret || failed_tests)
>>               dmaengine_terminate_all(chan);
>>
>> Will clean up again, retest, and resubmit. Thanks for your patience and instruction.
>
> Dan, I thought the patch was truncated, but it's all there in V3. I should have
> finished my coffee before responding. You are absolutely right that in the timed
> out case that dmaengine_terminate_all(chan) should be called, and that change is
> in fact already included in this patch set
>
> @@ -789,7 +782,7 @@ static int dmatest_func(void *data)
>                 dmatest_KBs(runtime, total_len), ret);
>
>         /* terminate all transfers on specified channels */
> -       if (ret)
> +       if (ret || failed_tests)
>                 dmaengine_terminate_all(chan);
>
> Would you prefer that I add a better description in the commit text to address
> the fact this was in fact added?

Ah, sorry I overlooked that.

What about the case where the dmaengine does not support the
terminate_all operation? I think we should WARN in that case, but
that's also where the per-operation allocation of a done variable can
prevent done notifications leaking between operations.
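
The "warn when terminate is unsupported" idea can be sketched as a userspace model. Everything here is an assumption made for illustration: `model_chan`, its optional `terminate_all` hook, the `warnings` counter and the `-ENOSYS` return for a missing hook are stand-ins chosen for the sketch, not a statement of how any particular dmaengine driver or kernel version behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical channel: the terminate hook is optional (NULL if unsupported). */
struct model_chan {
	int (*terminate_all)(struct model_chan *chan);
	unsigned int warnings;
};

/* Call the hook if the "driver" provides one, otherwise report an error. */
int model_terminate_all(struct model_chan *chan)
{
	assert(chan != NULL);
	if (chan->terminate_all)
		return chan->terminate_all(chan);
	return -ENOSYS;
}

/* After a timed-out test: try to stop the engine, warn if that fails. */
int model_cleanup_after_timeout(struct model_chan *chan)
{
	int ret = model_terminate_all(chan);

	if (ret) {
		chan->warnings++;
		fprintf(stderr,
			"model: terminate_all failed (%d); a late completion may still fire\n",
			ret);
	}
	return ret;
}

/* Example hook for a channel that does support termination. */
int model_terminate_ok(struct model_chan *chan)
{
	(void)chan;
	return 0;
}
```

A caller that gets a non-zero result knows it cannot rule out a late callback, which is the situation where per-operation completion records would still be needed.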
Dan Williams Nov. 17, 2017, 5:36 p.m. UTC | #8
On Fri, Nov 17, 2017 at 9:17 AM, Adam Wallis <awallis@codeaurora.org> wrote:
> On 11/17/2017 12:01 PM, Adam Wallis wrote:
>> On 11/17/2017 10:57 AM, Dan Williams wrote:
>>> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>>> On 11/17/2017 10:12 AM, Dan Williams wrote:
>>>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>>>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
>>>>>> introduced a bug (that is in fact documented by the patch commit text)
>>>>>> that leaves behind a dangling pointer. Since the done_wait structure is
>>>>>> allocated on the stack, future invocations to the DMATEST can produce
>>>>>> undesirable results (e.g., corrupted spinlocks).
>>>>>>
>>>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
>>>>>> out") attempted to WARN the user that the stack was likely corrupted but
>>>>>> did not fix the actual issue.
>>>>>>
>>>>>> This patch fixes the issue by pushing the wait queue and callback
>>>>>> structs into the thread structure. If a failure occurs due to time,
>>>>>> dmaengine_terminate_all will force the callback to safely call
>>>>>> wake_up_all() without possibility of using a freed pointer.
>>>>>>
>>>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User
>>>>>> Cc: stable@vger.kernel.org # 4.13.x
>>>>>> Cc: stable@vger.kernel.org # 4.14.x
>>>>>
>>>>
>>>> Sure - do you want me to remove them? I was just following the instructions on
>>>> stable.
>>>
>>> It's not broken, just a note for next time.
>>>
> One last question - if all we need is Fixes and CC stable...how do we indicate a
> dependency (for a cherry pick)?

For cherry picks the document does say to include multiple "Cc:
stable" lines. So ignore what I said; you're following the proper
process.

The way I've handled this is to just wait for the -stable team to
notify that the patch does not apply and send a custom backport at
that point.
Adam Wallis Nov. 20, 2017, 3:27 p.m. UTC | #9
On 11/17/2017 12:33 PM, Dan Williams wrote:
> On Fri, Nov 17, 2017 at 9:28 AM, Adam Wallis <awallis@codeaurora.org> wrote:
>> On 11/17/2017 12:01 PM, Adam Wallis wrote:
> [..]
>>> Dan
>>> An important part of my patch was severed in this v3 submission. My apologies.
>>>
>>> There is a change that addresses, I believe, your concern that was in v2
>>>
>>>       /* terminate all transfers on specified channels */
>>> -     if (ret)
>>> +     if (ret || failed_tests)
>>>               dmaengine_terminate_all(chan);
>>>
>>> Will clean up again, retest, and resubmit. Thanks for your patience and instruction.
>>
>> Dan, I thought the patch was truncated, but it's all there in V3. I should have
>> finished my coffee before responding. You are absolutely right that in the timed
>> out case that dmaengine_terminate_all(chan) should be called, and that change is
>> in fact already included in this patch set
>>
>> @@ -789,7 +782,7 @@ static int dmatest_func(void *data)
>>                 dmatest_KBs(runtime, total_len), ret);
>>
>>         /* terminate all transfers on specified channels */
>> -       if (ret)
>> +       if (ret || failed_tests)
>>                 dmaengine_terminate_all(chan);
>>
>> Would you prefer that I add a better description in the commit text to address
>> the fact this was in fact added?
> 
> Ah, sorry I overlooked that.
> 
> What about the case where the dmaengine does not support the
> terminate_all operation? I think we should WARN in that case, but
> that's also where the per-operation allocation of a done variable can
> prevent done notifications leaking between operations.
> 

Dan, thanks for your comments. I just released v4 of this patch. Let me know if
I misunderstood what you said above. I do acknowledge that container_of might be
less desirable than just changing the argument that we pass to the callback.
Open to your suggestions!
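
The v4 patch itself is not shown in this thread, so the following is only an illustration of the general container_of technique mentioned above, written for userspace. `model_container_of`, `cof_done`, `cof_thread` and `cof_thread_from_done()` are hypothetical names: given a pointer to an embedded completion record, the macro recovers the structure that contains it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace equivalent of the container_of idea, built on offsetof(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct cof_done {
	bool done;
};

/* Hypothetical per-thread context with an embedded completion record. */
struct cof_thread {
	int id;
	struct cof_done test_done;
};

/* Recover the owning thread from a pointer to its embedded record. */
struct cof_thread *cof_thread_from_done(struct cof_done *d)
{
	assert(d != NULL);
	return model_container_of(d, struct cof_thread, test_done);
}
```

The alternative Adam mentions, passing a different argument to the callback, avoids the pointer arithmetic at the cost of changing what the callback receives.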

Patch

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 47edc7f..2573b6c 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -155,6 +155,12 @@  struct dmatest_params {
 #define PATTERN_COUNT_MASK	0x1f
 #define PATTERN_MEMSET_IDX	0x01
 
+/* poor man's completion - we want to use wait_event_freezable() on it */
+struct dmatest_done {
+	bool			done;
+	wait_queue_head_t	*wait;
+};
+
 struct dmatest_thread {
 	struct list_head	node;
 	struct dmatest_info	*info;
@@ -165,6 +171,8 @@  struct dmatest_thread {
 	u8			**dsts;
 	u8			**udsts;
 	enum dma_transaction_type type;
+	wait_queue_head_t done_wait;
+	struct dmatest_done test_done;
 	bool			done;
 };
 
@@ -342,11 +350,6 @@  static unsigned int dmatest_verify(u8 **bufs, unsigned int start,
 	return error_count;
 }
 
-/* poor man's completion - we want to use wait_event_freezable() on it */
-struct dmatest_done {
-	bool			done;
-	wait_queue_head_t	*wait;
-};
 
 static void dmatest_callback(void *arg)
 {
@@ -424,9 +427,8 @@  static unsigned long long dmatest_KBs(s64 runtime, unsigned long long len)
  */
 static int dmatest_func(void *data)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(done_wait);
 	struct dmatest_thread	*thread = data;
-	struct dmatest_done	done = { .wait = &done_wait };
+	struct dmatest_done	*done = &thread->test_done;
 	struct dmatest_info	*info;
 	struct dmatest_params	*params;
 	struct dma_chan		*chan;
@@ -673,9 +675,9 @@  static int dmatest_func(void *data)
 			continue;
 		}
 
-		done.done = false;
+		done->done = false;
 		tx->callback = dmatest_callback;
-		tx->callback_param = &done;
+		tx->callback_param = done;
 		cookie = tx->tx_submit(tx);
 
 		if (dma_submit_error(cookie)) {
@@ -688,21 +690,12 @@  static int dmatest_func(void *data)
 		}
 		dma_async_issue_pending(chan);
 
-		wait_event_freezable_timeout(done_wait, done.done,
+		wait_event_freezable_timeout(thread->done_wait, done->done,
 					     msecs_to_jiffies(params->timeout));
 
 		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
 
-		if (!done.done) {
-			/*
-			 * We're leaving the timed out dma operation with
-			 * dangling pointer to done_wait.  To make this
-			 * correct, we'll need to allocate wait_done for
-			 * each test iteration and perform "who's gonna
-			 * free it this time?" dancing.  For now, just
-			 * leave it dangling.
-			 */
-			WARN(1, "dmatest: Kernel stack may be corrupted!!\n");
+		if (!done->done) {
 			dmaengine_unmap_put(um);
 			result("test timed out", total_tests, src_off, dst_off,
 			       len, 0);
@@ -789,7 +782,7 @@  static int dmatest_func(void *data)
 		dmatest_KBs(runtime, total_len), ret);
 
 	/* terminate all transfers on specified channels */
-	if (ret)
+	if (ret || failed_tests)
 		dmaengine_terminate_all(chan);
 
 	thread->done = true;
@@ -849,6 +842,8 @@  static int dmatest_add_threads(struct dmatest_info *info,
 		thread->info = info;
 		thread->chan = dtc->chan;
 		thread->type = type;
+		thread->test_done.wait = &thread->done_wait;
+		init_waitqueue_head(&thread->done_wait);
 		smp_wmb();
 		thread->task = kthread_create(dmatest_func, thread, "%s-%s%u",
 				dma_chan_name(chan), op, i);