diff mbox series

[v2,2/6] x86/sgx: Do not consider unsanitized pages an error

Message ID 20220831173829.126661-3-jarkko@kernel.org (mailing list archive)
State New, archived
Headers show
Series x86/sgx: A collection of tests and fixes | expand

Commit Message

Jarkko Sakkinen Aug. 31, 2022, 5:38 p.m. UTC
In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.

This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:

[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().

Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v6:
- Address Reinette's feedback:
  https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/

v5:
- Add the klog dump and sysctl option to the commit message.

v4:
- Explain expectations for dirty_page_list in the function header, instead
  of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
  the 2nd call, if there are any.

v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.

v2:
- Replaced WARN_ON() with optional pr_info() inside
  __sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
 arch/x86/kernel/cpu/sgx/main.c | 42 ++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 7 deletions(-)

Comments

Reinette Chatre Aug. 31, 2022, 8:39 p.m. UTC | #1
Hi Jarkko,

On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> In sgx_init(), if misc_register() fails or misc_register() succeeds but
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped. This may leave some unsanitized pages, which does
> not matter, because SGX will be disabled for the whole power cycle.
> 
> This triggers WARN_ON() because sgx_dirty_page_list ends up being
> non-empty, and dumps the call stack:
> 
> [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> [    0.268591] ------------[ cut here ]------------
> [    0.268592] WARNING: CPU: 6 PID: 83 at
> arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> [    0.268598] Modules linked in:
> [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> 07/06/2022
> [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> 0000000000000000
> [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> 00000000ffffffff
> [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> ffff8dcd820a4180
> [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> ffffb6c74006bce0
> [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> 0000000000000000
> [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> knlGS:0000000000000000
> [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> 00000000003706e0
> [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    0.268622] Call Trace:
> [    0.268624]  <TASK>
> [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    0.268634]  ? __kthread_parkme+0x36/0x90
> [    0.268637]  kthread+0xe5/0x110
> [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> [    0.268642]  ret_from_fork+0x1f/0x30
> [    0.268647]  </TASK>
> [    0.268648] ---[ end trace 0000000000000000 ]---
> 

Are you still planning to trim this?

> Ultimately this can crash the kernel, if the following is set:
> 
> 	/proc/sys/kernel/panic_on_warn
> 
> In premature stop, print nothing, as the number is by practical means a
> random number. Otherwise, it is an indicator of a bug in the driver, and
> therefore print the number of unsanitized pages with pr_err().

I think that "print the number of unsanitized pages with pr_err()" 
contradicts the patch subject of "Do not consider unsanitized pages
an error".

...

> @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	long ret;
> +
>  	set_freezable();
>  
>  	/*
>  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (ret == -ECANCELED)
> +		/* kthread stopped */
> +		return 0;
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	switch (ret) {
> +	case 0:
> +		/* success, no unsanitized pages */
> +		break;
> +
> +	case -ECANCELED:
> +		/* kthread stopped */
> +		return 0;
> +
> +	default:
> +		/*
> +		 * Never expected to happen in a working driver. If it happens
> +		 * the bug is expected to be in the sanitization process, but
> +		 * successfully sanitized pages are still valid and driver can
> +		 * be used and most importantly debugged without issues. To put
> +		 * short, the global state of kernel is not corrupted so no
> +		 * reason to do any more complicated rollback.
> +		 */
> +		pr_err("%ld unsanitized pages\n", ret);
> +	}
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())


I think I am missing something here. A lot of logic is added here but I
do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
the reclaimer was canceled. I am thus wondering, could the above not be
simplified to something similar to V1:

@@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	unsigned long left_dirty;
+
 	set_freezable();
 
 	/*
@@ -395,10 +402,10 @@ static int ksgxd(void *p)
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (left_dirty && !kthread_should_stop())
+		pr_err("%lu unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())


Reinette
Kai Huang Sept. 1, 2022, 10:50 a.m. UTC | #2
On Wed, 2022-08-31 at 13:39 -0700, Reinette Chatre wrote:
> >   static int ksgxd(void *p)
> >   {
> > +	long ret;
> > +
> >   	set_freezable();
> >   
> >   	/*
> >   	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >   	 * required for SECS pages, whose child pages blocked EREMOVE.
> >   	 */
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (ret == -ECANCELED)
> > +		/* kthread stopped */
> > +		return 0;
> >   
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	switch (ret) {
> > +	case 0:
> > +		/* success, no unsanitized pages */
> > +		break;
> > +
> > +	case -ECANCELED:
> > +		/* kthread stopped */
> > +		return 0;
> > +
> > +	default:
> > +		/*
> > +		 * Never expected to happen in a working driver. If it
> > happens
> > +		 * the bug is expected to be in the sanitization process,
> > but
> > +		 * successfully sanitized pages are still valid and driver
> > can
> > +		 * be used and most importantly debugged without issues. To
> > put
> > +		 * short, the global state of kernel is not corrupted so no
> > +		 * reason to do any more complicated rollback.
> > +		 */
> > +		pr_err("%ld unsanitized pages\n", ret);
> > +	}
> >   
> >   	while (!kthread_should_stop()) {
> >   		if (try_to_freeze())
> 
> 
> I think I am missing something here. A lot of logic is added here but I
> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> the reclaimer was canceled. I am thus wondering, could the above not be
> simplified to something similar to V1:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);
>  

This basically means driver bug if I understand correctly.  To be consistent
with the behaviour of existing code, how about just WARN()?
	
	...
	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
	WARN_ON(left_dirty && !kthread_should_stop());

It seems there's little value to print out the unsanitized pages here.  The
existing code doesn't print it anyway.
Jarkko Sakkinen Sept. 1, 2022, 9:47 p.m. UTC | #3
On Thu, Sep 01, 2022 at 10:50:07AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 13:39 -0700, Reinette Chatre wrote:
> > >   static int ksgxd(void *p)
> > >   {
> > > +	long ret;
> > > +
> > >   	set_freezable();
> > >   
> > >   	/*
> > >   	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > >   	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >   	 */
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (ret == -ECANCELED)
> > > +		/* kthread stopped */
> > > +		return 0;
> > >   
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	switch (ret) {
> > > +	case 0:
> > > +		/* success, no unsanitized pages */
> > > +		break;
> > > +
> > > +	case -ECANCELED:
> > > +		/* kthread stopped */
> > > +		return 0;
> > > +
> > > +	default:
> > > +		/*
> > > +		 * Never expected to happen in a working driver. If it
> > > happens
> > > +		 * the bug is expected to be in the sanitization process,
> > > but
> > > +		 * successfully sanitized pages are still valid and driver
> > > can
> > > +		 * be used and most importantly debugged without issues. To
> > > put
> > > +		 * short, the global state of kernel is not corrupted so no
> > > +		 * reason to do any more complicated rollback.
> > > +		 */
> > > +		pr_err("%ld unsanitized pages\n", ret);
> > > +	}
> > >   
> > >   	while (!kthread_should_stop()) {
> > >   		if (try_to_freeze())
> > 
> > 
> > I think I am missing something here. A lot of logic is added here but I
> > do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> > the reclaimer was canceled. I am thus wondering, could the above not be
> > simplified to something similar to V1:
> > 
> > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	unsigned long left_dirty;
> > +
> >  	set_freezable();
> >  
> >  	/*
> > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (left_dirty && !kthread_should_stop())
> > +		pr_err("%lu unsanitized pages\n", left_dirty);
> >  
> 
> This basically means driver bug if I understand correctly.  To be consistent
> with the behaviour of existing code, how about just WARN()?
> 	
> 	...
> 	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> 	WARN_ON(left_dirty && !kthread_should_stop());
> 
> It seems there's little value to print out the unsanitized pages here.  The
> existing code doesn't print it anyway.

Using WARN IMHO here is too strong measure, given that
it tear down the whole kernel, if panic_on_warn is enabled.

For debugging, any information is useful information, so
would not make sense not print the number of pages, if 
that is available. That could very well point out the
issue why all pages are not sanitized if there was a bug.

BR, Jarkko
Jarkko Sakkinen Sept. 1, 2022, 9:53 p.m. UTC | #4
On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped. This may leave some unsanitized pages, which does
> > not matter, because SGX will be disabled for the whole power cycle.
> > 
> > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > non-empty, and dumps the call stack:
> > 
> > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > [    0.268591] ------------[ cut here ]------------
> > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > [    0.268598] Modules linked in:
> > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > 07/06/2022
> > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > 0000000000000000
> > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > 00000000ffffffff
> > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > ffff8dcd820a4180
> > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > ffffb6c74006bce0
> > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > 0000000000000000
> > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > knlGS:0000000000000000
> > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > 00000000003706e0
> > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [    0.268622] Call Trace:
> > [    0.268624]  <TASK>
> > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > [    0.268637]  kthread+0xe5/0x110
> > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > [    0.268642]  ret_from_fork+0x1f/0x30
> > [    0.268647]  </TASK>
> > [    0.268648] ---[ end trace 0000000000000000 ]---
> > 
> 
> Are you still planning to trim this?
> 
> > Ultimately this can crash the kernel, if the following is set:
> > 
> > 	/proc/sys/kernel/panic_on_warn
> > 
> > In premature stop, print nothing, as the number is by practical means a
> > random number. Otherwise, it is an indicator of a bug in the driver, and
> > therefore print the number of unsanitized pages with pr_err().
> 
> I think that "print the number of unsanitized pages with pr_err()" 
> contradicts the patch subject of "Do not consider unsanitized pages
> an error".
> 
> ...
> 
> > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	long ret;
> > +
> >  	set_freezable();
> >  
> >  	/*
> >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (ret == -ECANCELED)
> > +		/* kthread stopped */
> > +		return 0;
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	switch (ret) {
> > +	case 0:
> > +		/* success, no unsanitized pages */
> > +		break;
> > +
> > +	case -ECANCELED:
> > +		/* kthread stopped */
> > +		return 0;
> > +
> > +	default:
> > +		/*
> > +		 * Never expected to happen in a working driver. If it happens
> > +		 * the bug is expected to be in the sanitization process, but
> > +		 * successfully sanitized pages are still valid and driver can
> > +		 * be used and most importantly debugged without issues. To put
> > +		 * short, the global state of kernel is not corrupted so no
> > +		 * reason to do any more complicated rollback.
> > +		 */
> > +		pr_err("%ld unsanitized pages\n", ret);
> > +	}
> >  
> >  	while (!kthread_should_stop()) {
> >  		if (try_to_freeze())
> 
> 
> I think I am missing something here. A lot of logic is added here but I
> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> the reclaimer was canceled. I am thus wondering, could the above not be
> simplified to something similar to V1:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);

IMHO, would make sense also to have here:

        if (!kthread_should_stop())
                return 0;

> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);

That would be incorrect, if the function returned
because of kthread stopped.

If you do the check here you already have a window
where kthread could have been stopped anyhow.

So even this would be less correct:

        if (kthreas_should_stop()) {
                return 0;
        }  else if (left_dirty) {
                pr_err("%lu unsanitized pages\n", left_dirty);
        }

So in the end you end as complicated and less correct
fix. This all is explained in the commit message.

If you unconditionally print error, you don't have
a meaning for the number of unsanitized pags.

BR, Jarkko
Jarkko Sakkinen Sept. 1, 2022, 9:56 p.m. UTC | #5
On Fri, Sep 02, 2022 at 12:53:55AM +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > prematurely stopped. This may leave some unsanitized pages, which does
> > > not matter, because SGX will be disabled for the whole power cycle.
> > > 
> > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > non-empty, and dumps the call stack:
> > > 
> > > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > > [    0.268591] ------------[ cut here ]------------
> > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > > [    0.268598] Modules linked in:
> > > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > > 07/06/2022
> > > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > > 0000000000000000
> > > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > > 00000000ffffffff
> > > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > > ffff8dcd820a4180
> > > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > > ffffb6c74006bce0
> > > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > > 0000000000000000
> > > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > > knlGS:0000000000000000
> > > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > > 00000000003706e0
> > > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [    0.268622] Call Trace:
> > > [    0.268624]  <TASK>
> > > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > > [    0.268637]  kthread+0xe5/0x110
> > > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > > [    0.268642]  ret_from_fork+0x1f/0x30
> > > [    0.268647]  </TASK>
> > > [    0.268648] ---[ end trace 0000000000000000 ]---
> > > 
> > 
> > Are you still planning to trim this?
> > 
> > > Ultimately this can crash the kernel, if the following is set:
> > > 
> > > 	/proc/sys/kernel/panic_on_warn
> > > 
> > > In premature stop, print nothing, as the number is by practical means a
> > > random number. Otherwise, it is an indicator of a bug in the driver, and
> > > therefore print the number of unsanitized pages with pr_err().
> > 
> > I think that "print the number of unsanitized pages with pr_err()" 
> > contradicts the patch subject of "Do not consider unsanitized pages
> > an error".
> > 
> > ...
> > 
> > > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> > >  
> > >  static int ksgxd(void *p)
> > >  {
> > > +	long ret;
> > > +
> > >  	set_freezable();
> > >  
> > >  	/*
> > >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >  	 */
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (ret == -ECANCELED)
> > > +		/* kthread stopped */
> > > +		return 0;
> > >  
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	switch (ret) {
> > > +	case 0:
> > > +		/* success, no unsanitized pages */
> > > +		break;
> > > +
> > > +	case -ECANCELED:
> > > +		/* kthread stopped */
> > > +		return 0;
> > > +
> > > +	default:
> > > +		/*
> > > +		 * Never expected to happen in a working driver. If it happens
> > > +		 * the bug is expected to be in the sanitization process, but
> > > +		 * successfully sanitized pages are still valid and driver can
> > > +		 * be used and most importantly debugged without issues. To put
> > > +		 * short, the global state of kernel is not corrupted so no
> > > +		 * reason to do any more complicated rollback.
> > > +		 */
> > > +		pr_err("%ld unsanitized pages\n", ret);
> > > +	}
> > >  
> > >  	while (!kthread_should_stop()) {
> > >  		if (try_to_freeze())
> > 
> > 
> > I think I am missing something here. A lot of logic is added here but I
> > do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> > the reclaimer was canceled. I am thus wondering, could the above not be
> > simplified to something similar to V1:
> > 
> > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	unsigned long left_dirty;
> > +
> >  	set_freezable();
> >  
> >  	/*
> > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> 
> IMHO, would make sense also to have here:
> 
>         if (!kthread_should_stop())
>                 return 0;
> 
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (left_dirty && !kthread_should_stop())
> > +		pr_err("%lu unsanitized pages\n", left_dirty);
> 
> That would be incorrect, if the function returned
> because of kthread stopped.
> 
> If you do the check here you already have a window
> where kthread could have been stopped anyhow.
> 
> So even this would be less correct:
> 
>         if (kthreas_should_stop()) {
>                 return 0;
>         }  else if (left_dirty) {
>                 pr_err("%lu unsanitized pages\n", left_dirty);
>         }
> 
> So in the end you end as complicated and less correct
> fix. This all is explained in the commit message.
> 
> If you unconditionally print error, you don't have
> a meaning for the number of unsanitized pags.

If you add my long comment explaining the error case, the SLOC
size is almost the same. That takes most space in my patch.

BR, Jarkko
Jarkko Sakkinen Sept. 1, 2022, 10:01 p.m. UTC | #6
On Fri, Sep 02, 2022 at 12:56:34AM +0300, Jarkko Sakkinen wrote:
> On Fri, Sep 02, 2022 at 12:53:55AM +0300, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > > > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > prematurely stopped. This may leave some unsanitized pages, which does
> > > > not matter, because SGX will be disabled for the whole power cycle.
> > > > 
> > > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > > non-empty, and dumps the call stack:
> > > > 
> > > > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > > > [    0.268591] ------------[ cut here ]------------
> > > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > > > [    0.268598] Modules linked in:
> > > > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > > > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > > > 07/06/2022
> > > > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > > > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > > > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > > > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > > > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > > > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > > > 0000000000000000
> > > > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > > > 00000000ffffffff
> > > > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > > > ffff8dcd820a4180
> > > > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > > > ffffb6c74006bce0
> > > > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > > > 0000000000000000
> > > > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > > > knlGS:0000000000000000
> > > > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > > > 00000000003706e0
> > > > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > 0000000000000400
> > > > [    0.268622] Call Trace:
> > > > [    0.268624]  <TASK>
> > > > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > > > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > > > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > > > [    0.268637]  kthread+0xe5/0x110
> > > > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > > > [    0.268642]  ret_from_fork+0x1f/0x30
> > > > [    0.268647]  </TASK>
> > > > [    0.268648] ---[ end trace 0000000000000000 ]---
> > > > 
> > > 
> > > Are you still planning to trim this?
> > > 
> > > > Ultimately this can crash the kernel, if the following is set:
> > > > 
> > > > 	/proc/sys/kernel/panic_on_warn
> > > > 
> > > > In premature stop, print nothing, as the number is by practical means a
> > > > random number. Otherwise, it is an indicator of a bug in the driver, and
> > > > therefore print the number of unsanitized pages with pr_err().
> > > 
> > > I think that "print the number of unsanitized pages with pr_err()" 
> > > contradicts the patch subject of "Do not consider unsanitized pages
> > > an error".
> > > 
> > > ...
> > > 
> > > > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> > > >  
> > > >  static int ksgxd(void *p)
> > > >  {
> > > > +	long ret;
> > > > +
> > > >  	set_freezable();
> > > >  
> > > >  	/*
> > > >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > > >  	 */
> > > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	if (ret == -ECANCELED)
> > > > +		/* kthread stopped */
> > > > +		return 0;
> > > >  
> > > > -	/* sanity check: */
> > > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	switch (ret) {
> > > > +	case 0:
> > > > +		/* success, no unsanitized pages */
> > > > +		break;
> > > > +
> > > > +	case -ECANCELED:
> > > > +		/* kthread stopped */
> > > > +		return 0;
> > > > +
> > > > +	default:
> > > > +		/*
> > > > +		 * Never expected to happen in a working driver. If it happens
> > > > +		 * the bug is expected to be in the sanitization process, but
> > > > +		 * successfully sanitized pages are still valid and driver can
> > > > +		 * be used and most importantly debugged without issues. To put
> > > > +		 * short, the global state of kernel is not corrupted so no
> > > > +		 * reason to do any more complicated rollback.
> > > > +		 */
> > > > +		pr_err("%ld unsanitized pages\n", ret);
> > > > +	}
> > > >  
> > > >  	while (!kthread_should_stop()) {
> > > >  		if (try_to_freeze())
> > > 
> > > 
> > > I think I am missing something here. A lot of logic is added here but I
> > > do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> > > the reclaimer was canceled. I am thus wondering, could the above not be
> > > simplified to something similar to V1:
> > > 
> > > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> > >  
> > >  static int ksgxd(void *p)
> > >  {
> > > +	unsigned long left_dirty;
> > > +
> > >  	set_freezable();
> > >  
> > >  	/*
> > > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >  	 */
> > >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > 
> > IMHO, would make sense also to have here:
> > 
> >         if (!kthread_should_stop())
> >                 return 0;
> > 
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > >  
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (left_dirty && !kthread_should_stop())
> > > +		pr_err("%lu unsanitized pages\n", left_dirty);
> > 
> > That would be incorrect, if the function returned
> > because of kthread stopped.
> > 
> > If you do the check here you already have a window
> > where kthread could have been stopped anyhow.
> > 
> > So even this would be less correct:
> > 
> >         if (kthreas_should_stop()) {
> >                 return 0;
> >         }  else if (left_dirty) {
> >                 pr_err("%lu unsanitized pages\n", left_dirty);
> >         }
> > 
> > So in the end you end as complicated and less correct
> > fix. This all is explained in the commit message.
> > 
> > If you unconditionally print error, you don't have
> > a meaning for the number of unsanitized pags.
> 
> If you add my long comment explaining the error case, the SLOC
> size is almost the same. That takes most space in my patch.

Printing pages if the thread was stopped is just making
the bug look different and have less output, it does not
fix anything, except prevent kernel panic.

BR, Jarkko
Reinette Chatre Sept. 1, 2022, 10:34 p.m. UTC | #7
Hi Jarkko,

On 9/1/2022 2:53 PM, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
>> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:

>> I think I am missing something here. A lot of logic is added here but I
>> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
>> the reclaimer was canceled. I am thus wondering, could the above not be
>> simplified to something similar to V1:
>>
>> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>>  
>>  static int ksgxd(void *p)
>>  {
>> +	unsigned long left_dirty;
>> +
>>  	set_freezable();
>>  
>>  	/*
>> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>>  	 */
>>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> 
> IMHO, would make sense also to have here:
> 
>         if (!kthread_should_stop())
>                 return 0;
> 

Would this not prematurely stop the thread when it should not be?

>> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>>  
>> -	/* sanity check: */
>> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
>> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
>> +	if (left_dirty && !kthread_should_stop())
>> +		pr_err("%lu unsanitized pages\n", left_dirty);
> 
> That would be incorrect, if the function returned
> because of kthread stopped.


I should have highlighted this but in my example I changed
left_dirty to be "unsigned long" with the intention that the
"return -ECANCELED" is replaced with "return 0".

__sgx_sanitize_pages() returns 0 when it exits because of
kthread stopped.

To elaborate I was thinking about:

+static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
+	unsigned long left_dirty = 0;
 	struct sgx_epc_page *page;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			return 0;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }


and then with what I had in previous email the checks should work:

@@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	unsigned long left_dirty;
+
 	set_freezable();
 
 	/*
@@ -395,10 +402,10 @@ static int ksgxd(void *p)
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (left_dirty && !kthread_should_stop())
+		pr_err("%lu unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())


> 
> If you do the check here you already have a window
> where kthread could have been stopped anyhow.
> 
> So even this would be less correct:
> 
>         if (kthreas_should_stop()) {
>                 return 0;
>         }  else if (left_dirty) {
>                 pr_err("%lu unsanitized pages\n", left_dirty);
>         }
> 
> So in the end you end as complicated and less correct
> fix. This all is explained in the commit message.
> 
> If you unconditionally print error, you don't have
> a meaning for the number of unsanitized pags.

Understood that the goal is to only print the
number of unsanitized pages if ksgxd has not been
stopped prematurely.

Reinette
Jarkko Sakkinen Sept. 1, 2022, 11:56 p.m. UTC | #8
On Thu, Sep 01, 2022 at 03:34:10PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/1/2022 2:53 PM, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> >> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> 
> >> I think I am missing something here. A lot of logic is added here but I
> >> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> >> the reclaimer was canceled. I am thus wondering, could the above not be
> >> simplified to something similar to V1:
> >>
> >> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >>  
> >>  static int ksgxd(void *p)
> >>  {
> >> +	unsigned long left_dirty;
> >> +
> >>  	set_freezable();
> >>  
> >>  	/*
> >> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >>  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >>  	 */
> >>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > 
> > IMHO, would make sense also to have here:
> > 
> >         if (!kthread_should_stop())
> >                 return 0;
> > 
> 
> Would this not prematurely stop the thread when it should not be?
> 
> >> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >>  
> >> -	/* sanity check: */
> >> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> >> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> >> +	if (left_dirty && !kthread_should_stop())
> >> +		pr_err("%lu unsanitized pages\n", left_dirty);
> > 
> > That would be incorrect, if the function returned
> > because of kthread stopped.
> 
> 
> I should have highlighted this but in my example I changed
> left_dirty to be "unsigned long" with the intention that the
> "return -ECANCELED" is replaced with "return 0".

It wasn't supposed to be, it's an error. Thanks for spotting
that.

> 
> __sgx_sanitize_pages() returns 0 when it exits because of
> kthread stopped.
> 
> To elaborate I was thinking about:
> 
> +static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  {
> +	unsigned long left_dirty = 0;
>  	struct sgx_epc_page *page;
>  	LIST_HEAD(dirty);
>  	int ret;
>  
> -	/* dirty_page_list is thread-local, no need for a lock: */
>  	while (!list_empty(dirty_page_list)) {
>  		if (kthread_should_stop())
> -			return;
> +			return 0;
>  
>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>  
> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  		} else {
>  			/* The page is not yet clean - move to the dirty list. */
>  			list_move_tail(&page->list, &dirty);
> +			left_dirty++;
>  		}
>  
>  		cond_resched();
>  	}
>  
>  	list_splice(&dirty, dirty_page_list);
> +	return left_dirty;
>  }
> 
> 
> and then with what I had in previous email the checks should work:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())
> 
> 
> > 
> > If you do the check here you already have a window
> > where kthread could have been stopped anyhow.
> > 
> > So even this would be less correct:
> > 
> >         if (kthreas_should_stop()) {
> >                 return 0;
> >         }  else if (left_dirty) {
> >                 pr_err("%lu unsanitized pages\n", left_dirty);
> >         }
> > 
> > So in the end you end as complicated and less correct
> > fix. This all is explained in the commit message.
> > 
> > If you unconditionally print error, you don't have
> > a meaning for the number of unsanitized pags.
> 
> Understood that the goal is to only print the
> number of unsanitized pages if ksgxd has not been
> stopped prematurely.

Yeah, and thus give as useful information for sysadmin/developer
as we can.

BR, Jarkko
Jarkko Sakkinen Sept. 2, 2022, 1:26 p.m. UTC | #9
On Fri, Sep 02, 2022 at 02:57:01AM +0300, Jarkko Sakkinen wrote:
> > Understood that the goal is to only print the
> > number of unsanitized pages if ksgxd has not been
> > stopped prematurely.
> 
> Yeah, and thus give as useful information for sysadmin/developer
> as we can.

I figured out how to get best of both worlds. See the
attached patch. I'll send formal series later on.

Keeps the accurancy, just postpones the exit.

BR, Jarkko
From 31c43c3276667cef0a7f0687d489552f26da877b Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko@kernel.org>
Date: Thu, 25 Aug 2022 08:12:30 +0300
Subject: [PATCH] x86/sgx: Do not consider unsanitized pages an error

In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.

This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:

[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().

Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

v6:
- Address Reinette's feedback:
  https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/

v5:
- Add the klog dump and sysctl option to the commit message.

v4:
- Explain expectations for dirty_page_list in the function header, instead
  of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
  the 2nd call, if there are any.

v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.

v2:
- Replaced WARN_ON() with optional pr_info() inside
  __sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
 arch/x86/kernel/cpu/sgx/main.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..f29dcaddd140 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,23 @@ static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
+ *
+ * Return 0 when sanitization was successful or kthread was stopped, and the
+ * number of unsanitized pages otherwise.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	struct sgx_epc_page *page;
+	long left_dirty = 0;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			return 0;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +98,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,17 +396,28 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	unsigned long left_dirty;
+
 	set_freezable();
 
 	/*
 	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	pr_debug("%ld unsanitized pages\n", left_dirty);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	/*
+	 * Never expected to happen in a working driver. If it happens the bug
+	 * is expected to be in the sanitization process, but successfully
+	 * sanitized pages are still valid and driver can be used and most
+	 * importantly debugged without issues. To put short, the global state
+	 * of kernel is not corrupted so no reason to do any more complicated
+	 * rollback.
+	 */
+	if (ret)
+		pr_err("%ld unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
Jarkko Sakkinen Sept. 2, 2022, 3:53 p.m. UTC | #10
On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
> +	if (ret)
> +		pr_err("%ld unsanitized pages\n", left_dirty);

Yeah, I know, should be 'left_dirty'. I just quickly drafted
the patch for the email.

BR, Jarkko
Reinette Chatre Sept. 2, 2022, 4:08 p.m. UTC | #11
Hi Jarkko,

On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
> On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
>> +	if (ret)
>> +		pr_err("%ld unsanitized pages\n", left_dirty);
> 
> Yeah, I know, should be 'left_dirty'. I just quickly drafted
> the patch for the email.
> 

No problem - you did mention that it was an informal patch.

(btw ... also watch out for the long local parameter returned
as an unsigned long and the signed vs unsigned printing
format string.) I also continue to recommend that you trim
that backtrace ... this patch is heading to x86 area where
this is required.

Reinette
Jarkko Sakkinen Sept. 2, 2022, 4:30 p.m. UTC | #12
On Fri, Sep 02, 2022 at 09:08:23AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
> > On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
> >> +	if (ret)
> >> +		pr_err("%ld unsanitized pages\n", left_dirty);
> > 
> > Yeah, I know, should be 'left_dirty'. I just quickly drafted
> > the patch for the email.
> > 
> 
> No problem - you did mention that it was an informal patch.
> 
> (btw ... also watch out for the long local parameter returned
> as an unsigned long and the signed vs unsigned printing
> format string.) I also continue to recommend that you trim

Point taken.

> that backtrace ... this patch is heading to x86 area where
> this is required.

Should I just cut the whole stack trace, and leave the
part before it?

BR, Jarkko
Reinette Chatre Sept. 2, 2022, 5:38 p.m. UTC | #13
Hi Jarkko,

On 9/2/2022 9:30 AM, Jarkko Sakkinen wrote:
> On Fri, Sep 02, 2022 at 09:08:23AM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
>>> On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
>>>> +	if (ret)
>>>> +		pr_err("%ld unsanitized pages\n", left_dirty);
>>>
>>> Yeah, I know, should be 'left_dirty'. I just quickly drafted
>>> the patch for the email.
>>>
>>
>> No problem - you did mention that it was an informal patch.
>>
>> (btw ... also watch out for the long local parameter returned
>> as an unsigned long and the signed vs unsigned printing
>> format string.) I also continue to recommend that you trim
> 
> Point taken.
> 
>> that backtrace ... this patch is heading to x86 area where
>> this is required.
> 
> Should I just cut the whole stack trace, and leave the
> part before it?

The trace is printed because of a WARN_ON() in the code.
I do not think there is anything very helpful in that trace.
I think the only helpful parts are the WARN itself that includes
the line number and then information on which kernel it was
encountered on.

How about something like (please note the FIXME within):

"
Paul reported the following WARN while running kernel vFIXME:
  WARNING: CPU: 6 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0

This is the WARN_ON() within ksgxd() that is triggered when
sgx_dirty_page_list (the list containing unsanitized pages) is non-empty.

In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().
"

Reinette
Jarkko Sakkinen Sept. 2, 2022, 7:20 p.m. UTC | #14
On Fri, Sep 02, 2022 at 10:38:34AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/2/2022 9:30 AM, Jarkko Sakkinen wrote:
> > On Fri, Sep 02, 2022 at 09:08:23AM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
> >>> On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
> >>>> +	if (ret)
> >>>> +		pr_err("%ld unsanitized pages\n", left_dirty);
> >>>
> >>> Yeah, I know, should be 'left_dirty'. I just quickly drafted
> >>> the patch for the email.
> >>>
> >>
> >> No problem - you did mention that it was an informal patch.
> >>
> >> (btw ... also watch out for the long local parameter returned
> >> as an unsigned long and the signed vs unsigned printing
> >> format string.) I also continue to recommend that you trim
> > 
> > Point taken.
> > 
> >> that backtrace ... this patch is heading to x86 area where
> >> this is required.
> > 
> > Should I just cut the whole stack trace, and leave the
> > part before it?
> 
> The trace is printed because of a WARN_ON() in the code.
> I do not think there is anything very helpful in that trace.
> I think the only helpful parts are the WARN itself that includes
> the line number and then information on which kernel it was
> encountered on.
> 
> How about something like (please note the FIXME within):
> 
> "
> Paul reported the following WARN while running kernel vFIXME:
>   WARNING: CPU: 6 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0

Yeah, this is a great idea, the use of WARN() is the whole point.
Thank you.

BR, Jarkko
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..bcd6b64961bd 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,20 @@  static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	struct sgx_epc_page *page;
+	long left_dirty = 0;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			return -ECANCELED;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +95,14 @@  static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,17 +393,40 @@  void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	long ret;
+
 	set_freezable();
 
 	/*
 	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
+	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (ret == -ECANCELED)
+		/* kthread stopped */
+		return 0;
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	switch (ret) {
+	case 0:
+		/* success, no unsanitized pages */
+		break;
+
+	case -ECANCELED:
+		/* kthread stopped */
+		return 0;
+
+	default:
+		/*
+		 * Never expected to happen in a working driver. If it happens
+		 * the bug is expected to be in the sanitization process, but
+		 * successfully sanitized pages are still valid and driver can
+		 * be used and most importantly debugged without issues. To put
+		 * short, the global state of kernel is not corrupted so no
+		 * reason to do any more complicated rollback.
+		 */
+		pr_err("%ld unsanitized pages\n", ret);
+	}
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())