diff mbox series

[v2] refscale: Fix use of uninitalized wait_queue_head_t

Message ID 20230707175355.2442933-1-longman@redhat.com (mailing list archive)
State Accepted
Commit 933d3bf8f96d7cedf78081030e004d23aee2b56c
Headers show
Series [v2] refscale: Fix use of uninitalized wait_queue_head_t | expand

Commit Message

Waiman Long July 7, 2023, 5:53 p.m. UTC
It was found that running the refscale test might crash the kernel once
in a while with the following error:

[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 8569.952900] #PF: supervisor read access in kernel mode
[ 8569.952902] #PF: error_code(0x0000) - not-present page
[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
  :
[ 8569.952940] Call Trace:
[ 8569.952941]  <TASK>
[ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
[ 8569.952959]  kthread+0x10e/0x130
[ 8569.952966]  ret_from_fork+0x1f/0x30
[ 8569.952973]  </TASK>

This is likely caused by the fact that init_waitqueue_head() is
called after the ref_scale_reader kthread is created. The kthread
can potentially try to use the waitqueue head before it is properly
initialized. The crash happened at

	static inline void __add_wait_queue(...)
	{
		:
		if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here

The offset of flags from list_head entry in wait_queue_entry is -0x18. If
reader_tasks[i].wq.head.next is NULL as allocated reader_task structure
is zero initialized, the instruction will try to access address
0xffffffffffffffe8 which is the fault address listed above.

Fix this by initializing the waitqueue head first before kthread
creation.

Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
---
 kernel/rcu/refscale.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Paul E. McKenney July 7, 2023, 7:54 p.m. UTC | #1
On Fri, Jul 07, 2023 at 01:53:55PM -0400, Waiman Long wrote:
> It was found that running the refscale test might crash the kernel once
> in a while with the following error:
> 
> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
> [ 8569.952900] #PF: supervisor read access in kernel mode
> [ 8569.952902] #PF: error_code(0x0000) - not-present page
> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
>   :
> [ 8569.952940] Call Trace:
> [ 8569.952941]  <TASK>
> [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
> [ 8569.952959]  kthread+0x10e/0x130
> [ 8569.952966]  ret_from_fork+0x1f/0x30
> [ 8569.952973]  </TASK>
> 
> This is likely caused by the fact that init_waitqueue_head() is
> called after the ref_scale_reader kthread is created. The kthread
> can potentially try to use the waitqueue head before it is properly
> initialized. The crash happened at
> 
> 	static inline void __add_wait_queue(...)
> 	{
> 		:
> 		if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
> 
> The offset of flags from list_head entry in wait_queue_entry is -0x18. If
> reader_tasks[i].wq.head.next is NULL as allocated reader_task structure
> is zero initialized, the instruction will try to access address
> 0xffffffffffffffe8 which is the fault address listed above.
> 
> Fix this by initializing the waitqueue head first before kthread
> creation.
> 
> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
> Signed-off-by: Waiman Long <longman@redhat.com>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

Queued and pushed, thank you all!

As always, I could not resist wordsmithing the commit log, please see
below.

							Thanx, Paul

------------------------------------------------------------------------

commit 933d3bf8f96d7cedf78081030e004d23aee2b56c
Author: Waiman Long <longman@redhat.com>
Date:   Fri Jul 7 13:53:55 2023 -0400

    refscale: Fix uninitalized use of wait_queue_head_t
    
    Running the refscale test occasionally crashes the kernel with the
    following error:
    
    [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
    [ 8569.952900] #PF: supervisor read access in kernel mode
    [ 8569.952902] #PF: error_code(0x0000) - not-present page
    [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
    [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
    [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
    [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
      :
    [ 8569.952940] Call Trace:
    [ 8569.952941]  <TASK>
    [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
    [ 8569.952959]  kthread+0x10e/0x130
    [ 8569.952966]  ret_from_fork+0x1f/0x30
    [ 8569.952973]  </TASK>
    
    The likely cause is that init_waitqueue_head() is called after the call to
    the torture_create_kthread() function that creates the ref_scale_reader
    kthread.  Although this init_waitqueue_head() call will very likely
    complete before this kthread is created and starts running, it is
    possible that the calling kthread will be delayed between the calls to
    torture_create_kthread() and init_waitqueue_head().  In this case, the
    new kthread will use the waitqueue head before it is properly initialized,
    which is not good for the kernel's health and well-being.
    
    The above crash happened here:
    
            static inline void __add_wait_queue(...)
            {
                    :
                    if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
    
    The offset of flags from list_head entry in wait_queue_entry is
    -0x18. If reader_tasks[i].wq.head.next is NULL as allocated reader_task
    structure is zero initialized, the instruction will try to access address
    0xffffffffffffffe8, which is exactly the fault address listed above.
    
    This commit therefore invokes init_waitqueue_head() before creating
    the kthread.
    
    Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
    Signed-off-by: Waiman Long <longman@redhat.com>
    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
index 1970ce5f22d4..71d138573856 100644
--- a/kernel/rcu/refscale.c
+++ b/kernel/rcu/refscale.c
@@ -1107,12 +1107,11 @@ ref_scale_init(void)
 	VERBOSE_SCALEOUT("Starting %d reader threads", nreaders);
 
 	for (i = 0; i < nreaders; i++) {
+		init_waitqueue_head(&reader_tasks[i].wq);
 		firsterr = torture_create_kthread(ref_scale_reader, (void *)i,
 						  reader_tasks[i].task);
 		if (torture_init_error(firsterr))
 			goto unwind;
-
-		init_waitqueue_head(&(reader_tasks[i].wq));
 	}
 
 	// Main Task
Waiman Long July 7, 2023, 8:17 p.m. UTC | #2
On 7/7/23 15:54, Paul E. McKenney wrote:
> On Fri, Jul 07, 2023 at 01:53:55PM -0400, Waiman Long wrote:
>> It was found that running the refscale test might crash the kernel once
>> in a while with the following error:
>>
>> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
>> [ 8569.952900] #PF: supervisor read access in kernel mode
>> [ 8569.952902] #PF: error_code(0x0000) - not-present page
>> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
>> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
>> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
>> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
>>    :
>> [ 8569.952940] Call Trace:
>> [ 8569.952941]  <TASK>
>> [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
>> [ 8569.952959]  kthread+0x10e/0x130
>> [ 8569.952966]  ret_from_fork+0x1f/0x30
>> [ 8569.952973]  </TASK>
>>
>> This is likely caused by the fact that init_waitqueue_head() is
>> called after the ref_scale_reader kthread is created. The kthread
>> can potentially try to use the waitqueue head before it is properly
>> initialized. The crash happened at
>>
>> 	static inline void __add_wait_queue(...)
>> 	{
>> 		:
>> 		if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
>>
>> The offset of flags from list_head entry in wait_queue_entry is -0x18. If
>> reader_tasks[i].wq.head.next is NULL as allocated reader_task structure
>> is zero initialized, the instruction will try to access address
>> 0xffffffffffffffe8 which is the fault address listed above.
>>
>> Fix this by initializing the waitqueue head first before kthread
>> creation.
>>
>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Queued and pushed, thank you all!
>
> As always, I could not resist wordsmithing the commit log, please see
> below.

Thanks for the wordsmithing. I don't mind at all.

Cheers,
Longman

> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 933d3bf8f96d7cedf78081030e004d23aee2b56c
> Author: Waiman Long <longman@redhat.com>
> Date:   Fri Jul 7 13:53:55 2023 -0400
>
>      refscale: Fix uninitalized use of wait_queue_head_t
>      
>      Running the refscale test occasionally crashes the kernel with the
>      following error:
>      
>      [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
>      [ 8569.952900] #PF: supervisor read access in kernel mode
>      [ 8569.952902] #PF: error_code(0x0000) - not-present page
>      [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
>      [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
>      [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
>      [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
>        :
>      [ 8569.952940] Call Trace:
>      [ 8569.952941]  <TASK>
>      [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
>      [ 8569.952959]  kthread+0x10e/0x130
>      [ 8569.952966]  ret_from_fork+0x1f/0x30
>      [ 8569.952973]  </TASK>
>      
>      The likely cause is that init_waitqueue_head() is called after the call to
>      the torture_create_kthread() function that creates the ref_scale_reader
>      kthread.  Although this init_waitqueue_head() call will very likely
>      complete before this kthread is created and starts running, it is
>      possible that the calling kthread will be delayed between the calls to
>      torture_create_kthread() and init_waitqueue_head().  In this case, the
>      new kthread will use the waitqueue head before it is properly initialized,
>      which is not good for the kernel's health and well-being.
>      
>      The above crash happened here:
>      
>              static inline void __add_wait_queue(...)
>              {
>                      :
>                      if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
>      
>      The offset of flags from list_head entry in wait_queue_entry is
>      -0x18. If reader_tasks[i].wq.head.next is NULL as allocated reader_task
>      structure is zero initialized, the instruction will try to access address
>      0xffffffffffffffe8, which is exactly the fault address listed above.
>      
>      This commit therefore invokes init_waitqueue_head() before creating
>      the kthread.
>      
>      Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
>      Signed-off-by: Waiman Long <longman@redhat.com>
>      Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>      Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>      Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
> index 1970ce5f22d4..71d138573856 100644
> --- a/kernel/rcu/refscale.c
> +++ b/kernel/rcu/refscale.c
> @@ -1107,12 +1107,11 @@ ref_scale_init(void)
>   	VERBOSE_SCALEOUT("Starting %d reader threads", nreaders);
>   
>   	for (i = 0; i < nreaders; i++) {
> +		init_waitqueue_head(&reader_tasks[i].wq);
>   		firsterr = torture_create_kthread(ref_scale_reader, (void *)i,
>   						  reader_tasks[i].task);
>   		if (torture_init_error(firsterr))
>   			goto unwind;
> -
> -		init_waitqueue_head(&(reader_tasks[i].wq));
>   	}
>   
>   	// Main Task
>
diff mbox series

Patch

diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
index 1970ce5f22d4..71d138573856 100644
--- a/kernel/rcu/refscale.c
+++ b/kernel/rcu/refscale.c
@@ -1107,12 +1107,11 @@  ref_scale_init(void)
 	VERBOSE_SCALEOUT("Starting %d reader threads", nreaders);
 
 	for (i = 0; i < nreaders; i++) {
+		init_waitqueue_head(&reader_tasks[i].wq);
 		firsterr = torture_create_kthread(ref_scale_reader, (void *)i,
 						  reader_tasks[i].task);
 		if (torture_init_error(firsterr))
 			goto unwind;
-
-		init_waitqueue_head(&(reader_tasks[i].wq));
 	}
 
 	// Main Task