Message ID | 1510927894-15297-1-git-send-email-awallis@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: > Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") > introduced a bug (that is in fact documented by the patch commit text) > that leaves behind a dangling pointer. Since the done_wait structure is > allocated on the stack, future invocations to the DMATEST can produce > undesirable results (e.g., corrupted spinlocks). > > Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times > out") attempted to WARN the user that the stack was likely corrupted but > did not fix the actual issue. > > This patch fixes the issue by pushing the wait queue and callback > structs into the the thread structure. If a failure occurs due to time, > dmaengine_terminate_all will force the callback to safely call > wake_up_all() without possibility of using a freed pointer. > > Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User > Cc: stable@vger.kernel.org # 4.13.x > Cc: stable@vger.kernel.org # 4.14.x You don't need 3 cc stables, you don't even need the "# kernel-version". Since you have the "Fixes:" line the target kernel(s) for the backport can be auto-determined. I should go update Documentation/process/stable-kernel-rules.rst to mention this. > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 > Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") > Reviewed-by: Sinan Kaya <okaya@codeaurora.org> > Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> > Signed-off-by: Adam Wallis <awallis@codeaurora.org> > --- > changes from v2: Added "Fixes" tag > changes from v1: Added pre-req patches for stable > > drivers/dma/dmatest.c | 37 ++++++++++++++++--------------------- > 1 file changed, 16 insertions(+), 21 deletions(-) > > diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c > index 47edc7f..2573b6c 100644 > --- a/drivers/dma/dmatest.c > +++ b/drivers/dma/dmatest.c > @@ -155,6 +155,12 @@ struct dmatest_params { > #define PATTERN_COUNT_MASK 0x1f > #define PATTERN_MEMSET_IDX 0x01 > > +/* poor man's completion - we want to use wait_event_freezable() on it */ > +struct dmatest_done { > + bool done; > + wait_queue_head_t *wait; > +}; > + > struct dmatest_thread { > struct list_head node; > struct dmatest_info *info; > @@ -165,6 +171,8 @@ struct dmatest_thread { > u8 **dsts; > u8 **udsts; > enum dma_transaction_type type; > + wait_queue_head_t done_wait; Why are we defining a waitquehead per thread vs defining one globally for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"?
On 11/17/2017 10:12 AM, Dan Williams wrote: > On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: >> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >> introduced a bug (that is in fact documented by the patch commit text) >> that leaves behind a dangling pointer. Since the done_wait structure is >> allocated on the stack, future invocations to the DMATEST can produce >> undesirable results (e.g., corrupted spinlocks). >> >> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >> out") attempted to WARN the user that the stack was likely corrupted but >> did not fix the actual issue. >> >> This patch fixes the issue by pushing the wait queue and callback >> structs into the the thread structure. If a failure occurs due to time, >> dmaengine_terminate_all will force the callback to safely call >> wake_up_all() without possibility of using a freed pointer. >> >> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User >> Cc: stable@vger.kernel.org # 4.13.x >> Cc: stable@vger.kernel.org # 4.14.x > Sure - do you want me to remove them? I was just following the instructions on stable. > You don't need 3 cc stables, you don't even need the "# > kernel-version". Since you have the "Fixes:" line the target kernel(s) > for the backport can be auto-determined. I should go update > Documentation/process/stable-kernel-rules.rst to mention this. > >> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 >> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >> Reviewed-by: Sinan Kaya <okaya@codeaurora.org> >> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >> Signed-off-by: Adam Wallis <awallis@codeaurora.org> >> --- >> changes from v2: Added "Fixes" tag >> changes from v1: Added pre-req patches for stable >> >> drivers/dma/dmatest.c | 37 ++++++++++++++++--------------------- >> 1 file changed, 16 insertions(+), 21 deletions(-) >> >> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c >> index 47edc7f..2573b6c 100644 >> --- a/drivers/dma/dmatest.c >> +++ b/drivers/dma/dmatest.c >> @@ -155,6 +155,12 @@ struct dmatest_params { >> #define PATTERN_COUNT_MASK 0x1f >> #define PATTERN_MEMSET_IDX 0x01 >> >> +/* poor man's completion - we want to use wait_event_freezable() on it */ >> +struct dmatest_done { >> + bool done; >> + wait_queue_head_t *wait; >> +}; >> + >> struct dmatest_thread { >> struct list_head node; >> struct dmatest_info *info; >> @@ -165,6 +171,8 @@ struct dmatest_thread { >> u8 **dsts; >> u8 **udsts; >> enum dma_transaction_type type; >> + wait_queue_head_t done_wait; > > Why are we defining a waitquehead per thread vs defining one globally > for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"? This is how the original dmatest functions. Each thread had a wait queue that it created so that it could go to sleep while the DMA transfer occurred. Each thread is dependent on its own DMA transaction for the wakeup call. Again, this is how the test originally worked. I just moved the wait queue from the stack (which was getting corrupted) to the thread context to allow for safe cleanup. In other words, I haven't really changed how the test works...just fixing a bug with the current implementation. > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: > On 11/17/2017 10:12 AM, Dan Williams wrote: >> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>> introduced a bug (that is in fact documented by the patch commit text) >>> that leaves behind a dangling pointer. Since the done_wait structure is >>> allocated on the stack, future invocations to the DMATEST can produce >>> undesirable results (e.g., corrupted spinlocks). >>> >>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >>> out") attempted to WARN the user that the stack was likely corrupted but >>> did not fix the actual issue. >>> >>> This patch fixes the issue by pushing the wait queue and callback >>> structs into the the thread structure. If a failure occurs due to time, >>> dmaengine_terminate_all will force the callback to safely call >>> wake_up_all() without possibility of using a freed pointer. >>> >>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User >>> Cc: stable@vger.kernel.org # 4.13.x >>> Cc: stable@vger.kernel.org # 4.14.x >> > > Sure - do you want me to remove them? I was just following the instructions on > stable. It's not broken, just a note for next time. > >> You don't need 3 cc stables, you don't even need the "# >> kernel-version". Since you have the "Fixes:" line the target kernel(s) >> for the backport can be auto-determined. I should go update >> Documentation/process/stable-kernel-rules.rst to mention this. >> >>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 >>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org> >>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>> Signed-off-by: Adam Wallis <awallis@codeaurora.org> >>> --- >>> changes from v2: Added "Fixes" tag >>> changes from v1: Added pre-req patches for stable >>> >>> drivers/dma/dmatest.c | 37 ++++++++++++++++--------------------- >>> 1 file changed, 16 insertions(+), 21 deletions(-) >>> >>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c >>> index 47edc7f..2573b6c 100644 >>> --- a/drivers/dma/dmatest.c >>> +++ b/drivers/dma/dmatest.c >>> @@ -155,6 +155,12 @@ struct dmatest_params { >>> #define PATTERN_COUNT_MASK 0x1f >>> #define PATTERN_MEMSET_IDX 0x01 >>> >>> +/* poor man's completion - we want to use wait_event_freezable() on it */ >>> +struct dmatest_done { >>> + bool done; >>> + wait_queue_head_t *wait; >>> +}; >>> + >>> struct dmatest_thread { >>> struct list_head node; >>> struct dmatest_info *info; >>> @@ -165,6 +171,8 @@ struct dmatest_thread { >>> u8 **dsts; >>> u8 **udsts; >>> enum dma_transaction_type type; >>> + wait_queue_head_t done_wait; >> >> Why are we defining a waitquehead per thread vs defining one globally >> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"? > > This is how the original dmatest functions. Each thread had a wait queue that it > created so that it could go to sleep while the DMA transfer occurred. Each > thread is dependent on its own DMA transaction for the wakeup call. Again, this > is how the test originally worked. I just moved the wait queue from the stack > (which was getting corrupted) to the thread context to allow for safe cleanup. > In other words, I haven't really changed how the test works...just fixing a bug > with the current implementation. Ok, always takes me a bit to re-orient myself to this file since I only look at it once a year. This fix seems incomplete. The next test iteration after a timeout will now reuse the per-thread 'done' notification. If the engine that timed out still completes its dma it will collide with the next operation that is using the same 'done' variable. So it seems to me that the wait_queue_head should be global, and the 'done' variable should be either allocated per-operation or we should call dmaengine_terminate_all() after a timeout. Since not all engines implement a terminate I think the potential memory leak of a few 'done' variables is a better option.
On 11/17/2017 10:57 AM, Dan Williams wrote: > On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: >> On 11/17/2017 10:12 AM, Dan Williams wrote: >>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>> introduced a bug (that is in fact documented by the patch commit text) >>>> that leaves behind a dangling pointer. Since the done_wait structure is >>>> allocated on the stack, future invocations to the DMATEST can produce >>>> undesirable results (e.g., corrupted spinlocks). >>>> >>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >>>> out") attempted to WARN the user that the stack was likely corrupted but >>>> did not fix the actual issue. >>>> >>>> This patch fixes the issue by pushing the wait queue and callback >>>> structs into the the thread structure. If a failure occurs due to time, >>>> dmaengine_terminate_all will force the callback to safely call >>>> wake_up_all() without possibility of using a freed pointer. >>>> >>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User >>>> Cc: stable@vger.kernel.org # 4.13.x >>>> Cc: stable@vger.kernel.org # 4.14.x >>> >> >> Sure - do you want me to remove them? I was just following the instructions on >> stable. > > It's not broken, just a note for next time. > >> >>> You don't need 3 cc stables, you don't even need the "# >>> kernel-version". Since you have the "Fixes:" line the target kernel(s) >>> for the backport can be auto-determined. I should go update >>> Documentation/process/stable-kernel-rules.rst to mention this. >>> >>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 >>>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org> >>>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>>> Signed-off-by: Adam Wallis <awallis@codeaurora.org> >>>> --- >>>> changes from v2: Added "Fixes" tag >>>> changes from v1: Added pre-req patches for stable >>>> >>>> drivers/dma/dmatest.c | 37 ++++++++++++++++--------------------- >>>> 1 file changed, 16 insertions(+), 21 deletions(-) >>>> >>>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c >>>> index 47edc7f..2573b6c 100644 >>>> --- a/drivers/dma/dmatest.c >>>> +++ b/drivers/dma/dmatest.c >>>> @@ -155,6 +155,12 @@ struct dmatest_params { >>>> #define PATTERN_COUNT_MASK 0x1f >>>> #define PATTERN_MEMSET_IDX 0x01 >>>> >>>> +/* poor man's completion - we want to use wait_event_freezable() on it */ >>>> +struct dmatest_done { >>>> + bool done; >>>> + wait_queue_head_t *wait; >>>> +}; >>>> + >>>> struct dmatest_thread { >>>> struct list_head node; >>>> struct dmatest_info *info; >>>> @@ -165,6 +171,8 @@ struct dmatest_thread { >>>> u8 **dsts; >>>> u8 **udsts; >>>> enum dma_transaction_type type; >>>> + wait_queue_head_t done_wait; >>> >>> Why are we defining a waitquehead per thread vs defining one globally >>> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"? >> >> This is how the original dmatest functions. Each thread had a wait queue that it >> created so that it could go to sleep while the DMA transfer occurred. Each >> thread is dependent on its own DMA transaction for the wakeup call. Again, this >> is how the test originally worked. I just moved the wait queue from the stack >> (which was getting corrupted) to the thread context to allow for safe cleanup. >> In other words, I haven't really changed how the test works...just fixing a bug >> with the current implementation. > > Ok, always takes me a bit to re-orient myself to this file since I > only look at it once a year. > > This fix seems incomplete. The next test iteration after a timeout > will now reuse the per-thread 'done' notification. If the engine that > timed out still completes its dma it will collide with the next > operation that is using the same 'done' variable. So it seems to me > that the wait_queue_head should be global, and the 'done' variable > should be either allocated per-operation or we should call > dmaengine_terminate_all() after a timeout. Since not all engines > implement a terminate I think the potential memory leak of a few > 'done' variables is a better option. > Dan An important part of my patch was severed in this v3 submission. My apologies. There is a change that addresses, I believe, your concern that was in v2 /* terminate all transfers on specified channels */ - if (ret) + if (ret || failed_tests) dmaengine_terminate_all(chan); Will clean up again, retest, and resubmit. Thanks for your patience and instruction. Adam
On 11/17/2017 12:01 PM, Adam Wallis wrote: > On 11/17/2017 10:57 AM, Dan Williams wrote: >> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>> On 11/17/2017 10:12 AM, Dan Williams wrote: >>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>>> introduced a bug (that is in fact documented by the patch commit text) >>>>> that leaves behind a dangling pointer. Since the done_wait structure is >>>>> allocated on the stack, future invocations to the DMATEST can produce >>>>> undesirable results (e.g., corrupted spinlocks). >>>>> >>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >>>>> out") attempted to WARN the user that the stack was likely corrupted but >>>>> did not fix the actual issue. >>>>> >>>>> This patch fixes the issue by pushing the wait queue and callback >>>>> structs into the the thread structure. If a failure occurs due to time, >>>>> dmaengine_terminate_all will force the callback to safely call >>>>> wake_up_all() without possibility of using a freed pointer. >>>>> >>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User >>>>> Cc: stable@vger.kernel.org # 4.13.x >>>>> Cc: stable@vger.kernel.org # 4.14.x >>>> >>> >>> Sure - do you want me to remove them? I was just following the instructions on >>> stable. >> >> It's not broken, just a note for next time. >> One last question - if all we need is Fixes and CC stable...how do we indicate a dependency (for a cherry pick)? Thanks
On 11/17/2017 12:01 PM, Adam Wallis wrote: > On 11/17/2017 10:57 AM, Dan Williams wrote: >> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>> On 11/17/2017 10:12 AM, Dan Williams wrote: >>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>>> introduced a bug (that is in fact documented by the patch commit text) >>>>> that leaves behind a dangling pointer. Since the done_wait structure is >>>>> allocated on the stack, future invocations to the DMATEST can produce >>>>> undesirable results (e.g., corrupted spinlocks). >>>>> >>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >>>>> out") attempted to WARN the user that the stack was likely corrupted but >>>>> did not fix the actual issue. >>>>> >>>>> This patch fixes the issue by pushing the wait queue and callback >>>>> structs into the the thread structure. If a failure occurs due to time, >>>>> dmaengine_terminate_all will force the callback to safely call >>>>> wake_up_all() without possibility of using a freed pointer. >>>>> >>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User >>>>> Cc: stable@vger.kernel.org # 4.13.x >>>>> Cc: stable@vger.kernel.org # 4.14.x >>>> >>> >>> Sure - do you want me to remove them? I was just following the instructions on >>> stable. >> >> It's not broken, just a note for next time. >> >>> >>>> You don't need 3 cc stables, you don't even need the "# >>>> kernel-version". Since you have the "Fixes:" line the target kernel(s) >>>> for the backport can be auto-determined. I should go update >>>> Documentation/process/stable-kernel-rules.rst to mention this. >>>> >>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 >>>>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>>> Reviewed-by: Sinan Kaya <okaya@codeaurora.org> >>>>> Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>>>> Signed-off-by: Adam Wallis <awallis@codeaurora.org> >>>>> --- >>>>> changes from v2: Added "Fixes" tag >>>>> changes from v1: Added pre-req patches for stable >>>>> >>>>> drivers/dma/dmatest.c | 37 ++++++++++++++++--------------------- >>>>> 1 file changed, 16 insertions(+), 21 deletions(-) >>>>> >>>>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c >>>>> index 47edc7f..2573b6c 100644 >>>>> --- a/drivers/dma/dmatest.c >>>>> +++ b/drivers/dma/dmatest.c >>>>> @@ -155,6 +155,12 @@ struct dmatest_params { >>>>> #define PATTERN_COUNT_MASK 0x1f >>>>> #define PATTERN_MEMSET_IDX 0x01 >>>>> >>>>> +/* poor man's completion - we want to use wait_event_freezable() on it */ >>>>> +struct dmatest_done { >>>>> + bool done; >>>>> + wait_queue_head_t *wait; >>>>> +}; >>>>> + >>>>> struct dmatest_thread { >>>>> struct list_head node; >>>>> struct dmatest_info *info; >>>>> @@ -165,6 +171,8 @@ struct dmatest_thread { >>>>> u8 **dsts; >>>>> u8 **udsts; >>>>> enum dma_transaction_type type; >>>>> + wait_queue_head_t done_wait; >>>> >>>> Why are we defining a waitquehead per thread vs defining one globally >>>> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"? >>> >>> This is how the original dmatest functions. Each thread had a wait queue that it >>> created so that it could go to sleep while the DMA transfer occurred. Each >>> thread is dependent on its own DMA transaction for the wakeup call. Again, this >>> is how the test originally worked. I just moved the wait queue from the stack >>> (which was getting corrupted) to the thread context to allow for safe cleanup. >>> In other words, I haven't really changed how the test works...just fixing a bug >>> with the current implementation. >> >> Ok, always takes me a bit to re-orient myself to this file since I >> only look at it once a year. >> >> This fix seems incomplete. The next test iteration after a timeout >> will now reuse the per-thread 'done' notification. If the engine that >> timed out still completes its dma it will collide with the next >> operation that is using the same 'done' variable. So it seems to me >> that the wait_queue_head should be global, and the 'done' variable >> should be either allocated per-operation or we should call >> dmaengine_terminate_all() after a timeout. Since not all engines >> implement a terminate I think the potential memory leak of a few >> 'done' variables is a better option. >> > Dan > An important part of my patch was severed in this v3 submission. My apologies. > > There is a change that addresses, I believe, your concern that was in v2 > > /* terminate all transfers on specified channels */ > - if (ret) > + if (ret || failed_tests) > dmaengine_terminate_all(chan); > > Will clean up again, retest, and resubmit. Thanks for your patience and instruction. Dan, I thought the patch was truncated, but it's all there in V3. I should have finished my coffee before responding. You are absolutely right that in the timed out case that dmaengine_terminate_all(chan) should be called, and that change is in fact already included in this patch set @@ -789,7 +782,7 @@ static int dmatest_func(void *data) dmatest_KBs(runtime, total_len), ret); /* terminate all transfers on specified channels */ - if (ret) + if (ret || failed_tests) dmaengine_terminate_all(chan); Would you prefer that I add a better description in the commit text to address the fact this was in fact added? > > Adam >
On Fri, Nov 17, 2017 at 9:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: > On 11/17/2017 12:01 PM, Adam Wallis wrote: [..] >> Dan >> An important part of my patch was severed in this v3 submission. My apologies. >> >> There is a change that addresses, I believe, your concern that was in v2 >> >> /* terminate all transfers on specified channels */ >> - if (ret) >> + if (ret || failed_tests) >> dmaengine_terminate_all(chan); >> >> Will clean up again, retest, and resubmit. Thanks for your patience and instruction. > > Dan, I thought the patch was truncated, but it's all there in V3. I should have > finished my coffee before responding. You are absolutely right that in the timed > out case that dmaengine_terminate_all(chan) should be called, and that change is > in fact already included in this patch set > > @@ -789,7 +782,7 @@ static int dmatest_func(void *data) > dmatest_KBs(runtime, total_len), ret); > > /* terminate all transfers on specified channels */ > - if (ret) > + if (ret || failed_tests) > dmaengine_terminate_all(chan); > > Would you prefer that I add a better description in the commit text to address > the fact this was in fact added? Ah, sorry I overlooked that. What about the case where the dmaengine does not support the terminate_all operation? I think we should WARN in that case, but that's also where the per-operation allocation of a done variable can prevent done notifications leaking between operations.
On Fri, Nov 17, 2017 at 9:17 AM, Adam Wallis <awallis@codeaurora.org> wrote: > On 11/17/2017 12:01 PM, Adam Wallis wrote: >> On 11/17/2017 10:57 AM, Dan Williams wrote: >>> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>>> On 11/17/2017 10:12 AM, Dan Williams wrote: >>>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@codeaurora.org> wrote: >>>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>>>> introduced a bug (that is in fact documented by the patch commit text) >>>>>> that leaves behind a dangling pointer. Since the done_wait structure is >>>>>> allocated on the stack, future invocations to the DMATEST can produce >>>>>> undesirable results (e.g., corrupted spinlocks). >>>>>> >>>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >>>>>> out") attempted to WARN the user that the stack was likely corrupted but >>>>>> did not fix the actual issue. >>>>>> >>>>>> This patch fixes the issue by pushing the wait queue and callback >>>>>> structs into the the thread structure. If a failure occurs due to time, >>>>>> dmaengine_terminate_all will force the callback to safely call >>>>>> wake_up_all() without possibility of using a freed pointer. >>>>>> >>>>>> Cc: stable@vger.kernel.org # 4.13.x: a9df21e: dmatest: Warn User >>>>>> Cc: stable@vger.kernel.org # 4.13.x >>>>>> Cc: stable@vger.kernel.org # 4.14.x >>>>> >>>> >>>> Sure - do you want me to remove them? I was just following the instructions on >>>> stable. >>> >>> It's not broken, just a note for next time. >>> > One last question - if all we need is Fixes and CC stable...how do we indicate a > dependency (for a cherry pick)? For cherry picks the document does say to include multiple "Cc: stable" lines. So ignore what I said you're following the proper process. The way I've handled this is to just wait for the -stable team to notify that the patch does not apply and send a custom backport at that point.
On 11/17/2017 12:33 PM, Dan Williams wrote: > On Fri, Nov 17, 2017 at 9:28 AM, Adam Wallis <awallis@codeaurora.org> wrote: >> On 11/17/2017 12:01 PM, Adam Wallis wrote: > [..] >>> Dan >>> An important part of my patch was severed in this v3 submission. My apologies. >>> >>> There is a change that addresses, I believe, your concern that was in v2 >>> >>> /* terminate all transfers on specified channels */ >>> - if (ret) >>> + if (ret || failed_tests) >>> dmaengine_terminate_all(chan); >>> >>> Will clean up again, retest, and resubmit. Thanks for your patience and instruction. >> >> Dan, I thought the patch was truncated, but it's all there in V3. I should have >> finished my coffee before responding. You are absolutely right that in the timed >> out case that dmaengine_terminate_all(chan) should be called, and that change is >> in fact already included in this patch set >> >> @@ -789,7 +782,7 @@ static int dmatest_func(void *data) >> dmatest_KBs(runtime, total_len), ret); >> >> /* terminate all transfers on specified channels */ >> - if (ret) >> + if (ret || failed_tests) >> dmaengine_terminate_all(chan); >> >> Would you prefer that I add a better description in the commit text to address >> the fact this was in fact added? > > Ah, sorry I overlooked that. > > What about the case where the dmaengine does not support the > terminate_all operation? I think we should WARN in that case, but > that's also where the per-operation allocation of a done variable can > prevent done notifications leaking between operations. > Dan, thanks for your comments. I just released v4 of this patch. Let me know if I misunderstood what you said above. I do acknowledge that container_of might be less desirable than just changing the argument that we pass to the callback. Open to your suggestions! > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c index 47edc7f..2573b6c 100644 --- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -155,6 +155,12 @@ struct dmatest_params { #define PATTERN_COUNT_MASK 0x1f #define PATTERN_MEMSET_IDX 0x01 +/* poor man's completion - we want to use wait_event_freezable() on it */ +struct dmatest_done { + bool done; + wait_queue_head_t *wait; +}; + struct dmatest_thread { struct list_head node; struct dmatest_info *info; @@ -165,6 +171,8 @@ struct dmatest_thread { u8 **dsts; u8 **udsts; enum dma_transaction_type type; + wait_queue_head_t done_wait; + struct dmatest_done test_done; bool done; }; @@ -342,11 +350,6 @@ static unsigned int dmatest_verify(u8 **bufs, unsigned int start, return error_count; } -/* poor man's completion - we want to use wait_event_freezable() on it */ -struct dmatest_done { - bool done; - wait_queue_head_t *wait; -}; static void dmatest_callback(void *arg) { @@ -424,9 +427,8 @@ static unsigned long long dmatest_KBs(s64 runtime, unsigned long long len) */ static int dmatest_func(void *data) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(done_wait); struct dmatest_thread *thread = data; - struct dmatest_done done = { .wait = &done_wait }; + struct dmatest_done *done = &thread->test_done; struct dmatest_info *info; struct dmatest_params *params; struct dma_chan *chan; @@ -673,9 +675,9 @@ static int dmatest_func(void *data) continue; } - done.done = false; + done->done = false; tx->callback = dmatest_callback; - tx->callback_param = &done; + tx->callback_param = done; cookie = tx->tx_submit(tx); if (dma_submit_error(cookie)) { @@ -688,21 +690,12 @@ static int dmatest_func(void *data) } dma_async_issue_pending(chan); - wait_event_freezable_timeout(done_wait, done.done, + wait_event_freezable_timeout(thread->done_wait, done->done, msecs_to_jiffies(params->timeout)); status = dma_async_is_tx_complete(chan, cookie, NULL, NULL); - if (!done.done) { - /* - * We're leaving the timed out dma operation with - * dangling pointer to done_wait. To make this - * correct, we'll need to allocate wait_done for - * each test iteration and perform "who's gonna - * free it this time?" dancing. For now, just - * leave it dangling. - */ - WARN(1, "dmatest: Kernel stack may be corrupted!!\n"); + if (!done->done) { dmaengine_unmap_put(um); result("test timed out", total_tests, src_off, dst_off, len, 0); @@ -789,7 +782,7 @@ static int dmatest_func(void *data) dmatest_KBs(runtime, total_len), ret); /* terminate all transfers on specified channels */ - if (ret) + if (ret || failed_tests) dmaengine_terminate_all(chan); thread->done = true; @@ -849,6 +842,8 @@ static int dmatest_add_threads(struct dmatest_info *info, thread->info = info; thread->chan = dtc->chan; thread->type = type; + thread->test_done.wait = &thread->done_wait; + init_waitqueue_head(&thread->done_wait); smp_wmb(); thread->task = kthread_create(dmatest_func, thread, "%s-%s%u", dma_chan_name(chan), op, i);