Message ID | 1509565797-5219-1-git-send-email-awallis@codeaurora.org (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Wed, Nov 01, 2017 at 03:49:57PM -0400, Adam Wallis wrote: > Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") > introduced a bug (that is in fact documented by the patch commit text) > that leaves behind a dangling pointer. Since the done_wait structure is > allocated on the stack, future invocations to the DMATEST can produce > undesirable results (e.g., corrupted spinlocks). Ideally, this would be > cleaned up in the the thread handler, but at the very least, the kernel > is left in a very precarious scenario that can lead to some long debug > sessions when the crash comes later. > > This bug has also been captured at > https://bugzilla.kernel.org/show_bug.cgi?id=197605 > > Signed-off-by: Adam Wallis <awallis@codeaurora.org> > --- > drivers/dma/dmatest.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c > index 34ff532..95c662f 100644 > --- a/drivers/dma/dmatest.c > +++ b/drivers/dma/dmatest.c > @@ -706,7 +706,7 @@ static int dmatest_func(void *data) > result("test timed out", total_tests, src_off, dst_off, > len, 0); > failed_tests++; > - continue; > + BUG(); This will essentially kill the system. People have been trying to remove usage of BUG(), we should not do that. Complaining violently makes sense.. > } else if (status != DMA_COMPLETE) { > dmaengine_unmap_put(um); > result(status == DMA_ERROR ? > -- > Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the > Code Aurora Forum, a Linux Foundation Collaborative Project. >
On 11/2/2017 4:17 AM, Vinod Koul wrote: > On Wed, Nov 01, 2017 at 03:49:57PM -0400, Adam Wallis wrote: >> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >> introduced a bug (that is in fact documented by the patch commit text) >> that leaves behind a dangling pointer. Since the done_wait structure is >> allocated on the stack, future invocations to the DMATEST can produce >> undesirable results (e.g., corrupted spinlocks). Ideally, this would be >> cleaned up in the the thread handler, but at the very least, the kernel >> is left in a very precarious scenario that can lead to some long debug >> sessions when the crash comes later. >> >> This bug has also been captured at >> https://bugzilla.kernel.org/show_bug.cgi?id=197605 >> >> Signed-off-by: Adam Wallis <awallis@codeaurora.org> >> --- >> drivers/dma/dmatest.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c >> index 34ff532..95c662f 100644 >> --- a/drivers/dma/dmatest.c >> +++ b/drivers/dma/dmatest.c >> @@ -706,7 +706,7 @@ static int dmatest_func(void *data) >> result("test timed out", total_tests, src_off, dst_off, >> len, 0); >> failed_tests++; >> - continue; >> + BUG(); > > This will essentially kill the system. People have been trying to remove > usage of BUG(), we should not do that. Complaining violently makes sense.. > Understood. In my testing, the system is effectively shot if you run the dmatest in a tight enough loop, however, at least using a WARN would alert the user that the kernel might be in an unstable state. I will submit another patch with a WARN, even though, I hope a proper fix can be introduced at some point. I am not familiar enough with this code to propose such a fix at the moment though. > >> } else if (status != DMA_COMPLETE) { >> dmaengine_unmap_put(um); >> result(status == DMA_ERROR ? >> -- >> Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the >> Code Aurora Forum, a Linux Foundation Collaborative Project. >> >
On 11/02/2017 07:11 AM, Adam Wallis wrote: >> This will essentially kill the system. People have been trying to remove >> usage of BUG(), we should not do that. Complaining violently makes sense.. >> > Understood. In my testing, the system is effectively shot if you run the dmatest > in a tight enough loop, however, at least using a WARN would alert the user that > the kernel might be in an unstable state. I will submit another patch with a > WARN, even though, I hope a proper fix can be introduced at some point. I am not > familiar enough with this code to propose such a fix at the moment though. I think Adam's original patch is correct, because as he said, the code as it is today corrupts the kernel. The BUG() is a stop-cap measure to prevent the user from thinking that the kernel is stable when it isn't. So I think that either dmatest gets fixed properly today (which Adam said he cannot do), or you should apply his patch until the code can get fixed.
On Thu, Nov 02, 2017 at 07:50:02AM -0500, Timur Tabi wrote: > On 11/02/2017 07:11 AM, Adam Wallis wrote: > >>This will essentially kill the system. People have been trying to remove > >>usage of BUG(), we should not do that. Complaining violently makes sense.. > >> > >Understood. In my testing, the system is effectively shot if you run the dmatest > >in a tight enough loop, however, at least using a WARN would alert the user that > >the kernel might be in an unstable state. I will submit another patch with a > >WARN, even though, I hope a proper fix can be introduced at some point. I am not > >familiar enough with this code to propose such a fix at the moment though. > > I think Adam's original patch is correct, because as he said, the code as it > is today corrupts the kernel. The BUG() is a stop-cap measure to prevent > the user from thinking that the kernel is stable when it isn't. It also helps people to get more information to debug the systems > So I think that either dmatest gets fixed properly today (which Adam said he > cannot do), or you should apply his patch until the code can get fixed. Ofcourse fixing is better.
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c index 34ff532..95c662f 100644 --- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -706,7 +706,7 @@ static int dmatest_func(void *data) result("test timed out", total_tests, src_off, dst_off, len, 0); failed_tests++; - continue; + BUG(); } else if (status != DMA_COMPLETE) { dmaengine_unmap_put(um); result(status == DMA_ERROR ?
Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") introduced a bug (that is in fact documented by the patch commit text) that leaves behind a dangling pointer. Since the done_wait structure is allocated on the stack, future invocations to the DMATEST can produce undesirable results (e.g., corrupted spinlocks). Ideally, this would be cleaned up in the the thread handler, but at the very least, the kernel is left in a very precarious scenario that can lead to some long debug sessions when the crash comes later. This bug has also been captured at https://bugzilla.kernel.org/show_bug.cgi?id=197605 Signed-off-by: Adam Wallis <awallis@codeaurora.org> --- drivers/dma/dmatest.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)