diff mbox series

[1/6] x86/sgx: Do not consider unsanitized pages an error

Message ID 20220830031206.13449-2-jarkko@kernel.org (mailing list archive)
State New, archived
Headers show
Series x86/sgx: Test and fixes | expand

Commit Message

Jarkko Sakkinen Aug. 30, 2022, 3:12 a.m. UTC
In sgx_init(), if misc_register() for the provision device fails, and
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped.

This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:

[    0.000000] Linux version 6.0.0-rc2 (root@4beb429beb4a) (gcc (Debian
11.3.0-3) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #382 SMP
PREEMPT_DYNAMIC Fri Aug 26 12:52:15 UTC 2022
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.0.0-rc2
root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
module_blacklist=psmouse initcall_debug log_buf_len=4M cryptomgr.notests
[…]
[    0.268089] calling  sgx_init+0x0/0x409 @ 1
[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---
[    0.268694] initcall sgx_init+0x0/0x409 returned -19 after 603 usecs

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

Print a simple warning instead, and improve the output by printing the
number of unsanitized pages, in order to provide debug informnation for
future needs.

Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
Cc: Haitao Huang <haitao.huang@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>

v5:
- Add the klog dump and sysctl option to the commit message.

v4:
- Explain expectations for dirty_page_list in the function header, instead
  of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
  the 2nd call, if there are any.

v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.

v2:
- Replaced WARN_ON() with optional pr_info() inside
  __sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
 arch/x86/kernel/cpu/sgx/main.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

Comments

Reinette Chatre Aug. 30, 2022, 10:54 p.m. UTC | #1
Hi Jarkko,

On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> In sgx_init(), if misc_register() for the provision device fails, and
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped.

I do not think misc_register() is required to fail for the scenario to
be triggered (rather use "or" than "and"?). Perhaps just
"In sgx_init(), if a failure is encountered after ksgxd is started
(via sgx_page_reclaimer_init()) ...".

To help the reader understand the subject of this patch it may help
to explain that prematurely stopping ksgxd may leave some
unsanitized pages, but that is not a problem since SGX cannot
be used on the platform anyway. 

> This triggers WARN_ON() because sgx_dirty_page_list ends up being
> non-empty, and dumps the call stack:
> 

Traces like below can be frowned upon. I recommend that you follow the
guidance in "Backtraces in commit mesages"(sic) in 
Documentation/process/submitting-patches.rst.

> [    0.000000] Linux version 6.0.0-rc2 (root@4beb429beb4a) (gcc (Debian
> 11.3.0-3) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #382 SMP
> PREEMPT_DYNAMIC Fri Aug 26 12:52:15 UTC 2022
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.0.0-rc2
> root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> module_blacklist=psmouse initcall_debug log_buf_len=4M cryptomgr.notests
> […]
> [    0.268089] calling  sgx_init+0x0/0x409 @ 1
> [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> [    0.268591] ------------[ cut here ]------------
> [    0.268592] WARNING: CPU: 6 PID: 83 at
> arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> [    0.268598] Modules linked in:
> [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> 07/06/2022
> [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> 0000000000000000
> [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> 00000000ffffffff
> [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> ffff8dcd820a4180
> [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> ffffb6c74006bce0
> [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> 0000000000000000
> [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> knlGS:0000000000000000
> [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> 00000000003706e0
> [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    0.268622] Call Trace:
> [    0.268624]  <TASK>
> [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    0.268634]  ? __kthread_parkme+0x36/0x90
> [    0.268637]  kthread+0xe5/0x110
> [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> [    0.268642]  ret_from_fork+0x1f/0x30
> [    0.268647]  </TASK>
> [    0.268648] ---[ end trace 0000000000000000 ]---
> [    0.268694] initcall sgx_init+0x0/0x409 returned -19 after 603 usecs
> 
> Ultimately this can crash the kernel, if the following is set:
> 
> 	/proc/sys/kernel/panic_on_warn
> 
> Print a simple warning instead, and improve the output by printing the
> number of unsanitized pages, in order to provide debug informnation for
> future needs.

informnation -> information

 
...

> Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

Should this go to stable?

> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 515e2a5f25bb..903100fcfce3 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
>   * from the input list, and made available for the page allocator. SECS pages
>   * prepending their children in the input list are left intact.
> + *
> + * Contents of the @dirty_page_list must be thread-local, i.e.
> + * not shared by multiple threads.

Did you intend to mention something about the needed locking here? It looks
like some information is lost during the move to the function description.

>   */
> -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  {
>  	struct sgx_epc_page *page;
> +	int left_dirty = 0;

I do not know how many pages this code should be ready for but at least
this could handle more by being an unsigned int considering that it is
always positive ... maybe even unsigned long?

>  	LIST_HEAD(dirty);
>  	int ret;
>  
> -	/* dirty_page_list is thread-local, no need for a lock: */
>  	while (!list_empty(dirty_page_list)) {
>  		if (kthread_should_stop())
> -			return;
> +			break;
>  
>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>  
> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  		} else {
>  			/* The page is not yet clean - move to the dirty list. */
>  			list_move_tail(&page->list, &dirty);
> +			left_dirty++;
>  		}
>  
>  		cond_resched();
>  	}
>  
>  	list_splice(&dirty, dirty_page_list);
> +	return left_dirty;
>  }
>  
>  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	int left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty)
> +		pr_warn("%d unsanitized pages\n", left_dirty);
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())


Reinette
Huang, Kai Aug. 31, 2022, 1:27 a.m. UTC | #2
On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() for the provision device fails, and
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped.
> 
> I do not think misc_register() is required to fail for the scenario to
> be triggered (rather use "or" than "and"?). Perhaps just
> "In sgx_init(), if a failure is encountered after ksgxd is started
> (via sgx_page_reclaimer_init()) ...".

IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
be initialized successfully, sgx_init() still returns 0.

Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
initialized successfully.  Then the code change in this patch won't be necessary
if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
early stage before we are sure either the driver or KVM SGX will work.

Btw currently EPC pages assigned to KVM guest cannot be reclaimed, so
theoretically ksgxd can be moved to sgx_drv_init(), but who knows someday we
will decide to make KVM guest EPC pages to be able to be reclaimed. :)
Jarkko Sakkinen Aug. 31, 2022, 1:55 a.m. UTC | #3
On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() for the provision device fails, and
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped.
> 
> I do not think misc_register() is required to fail for the scenario to
> be triggered (rather use "or" than "and"?). Perhaps just
> "In sgx_init(), if a failure is encountered after ksgxd is started
> (via sgx_page_reclaimer_init()) ...".

This would be the fixed version of the sentence:

"
In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.
"

I want to keep the end states listed and not make it more abstract.

The second sentence addresses the remark below.

> To help the reader understand the subject of this patch it may help
> to explain that prematurely stopping ksgxd may leave some
> unsanitized pages, but that is not a problem since SGX cannot
> be used on the platform anyway. 
> 
> > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > non-empty, and dumps the call stack:
> > 
> 
> Traces like below can be frowned upon. I recommend that you follow the
> guidance in "Backtraces in commit mesages"(sic) in 
> Documentation/process/submitting-patches.rst.
> 
> > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0

Is this good enough? I had not actually spotted this section before but
nice that it exists. Apparently has been added in 5.12.

>> > 
> > Ultimately this can crash the kernel, if the following is set:
> > 
> > 	/proc/sys/kernel/panic_on_warn
> > 
> > Print a simple warning instead, and improve the output by printing the
> > number of unsanitized pages, in order to provide debug informnation for
> > future needs.
> 
> informnation -> information

+1

> 
>  
> ...
> 
> > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> Should this go to stable?

I guess it should. The hard reason for this that it can panic
the kernel.

> 
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 515e2a5f25bb..903100fcfce3 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
> >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> >   * from the input list, and made available for the page allocator. SECS pages
> >   * prepending their children in the input list are left intact.
> > + *
> > + * Contents of the @dirty_page_list must be thread-local, i.e.
> > + * not shared by multiple threads.
> 
> Did you intend to mention something about the needed locking here? It looks
> like some information is lost during the move to the function description.

Nothing about the locking that concerns the parameter, as the
sentence defines clear constraints for the caller.

> 
> >   */
> > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
> >  {
> >  	struct sgx_epc_page *page;
> > +	int left_dirty = 0;
> 
> I do not know how many pages this code should be ready for but at least
> this could handle more by being an unsigned int considering that it is
> always positive ... maybe even unsigned long?

I would go for 'long'. More information below.

> 
> >  	LIST_HEAD(dirty);
> >  	int ret;
> >  
> > -	/* dirty_page_list is thread-local, no need for a lock: */
> >  	while (!list_empty(dirty_page_list)) {
> >  		if (kthread_should_stop())
> > -			return;
> > +			break;
> >  
> >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> >  
> > @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> >  		} else {
> >  			/* The page is not yet clean - move to the dirty list. */
> >  			list_move_tail(&page->list, &dirty);
> > +			left_dirty++;
> >  		}
> >  
> >  		cond_resched();
> >  	}
> >  
> >  	list_splice(&dirty, dirty_page_list);
> > +	return left_dirty;
> >  }
> >  
> >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	int left_dirty;
> > +
> >  	set_freezable();
> >  
> >  	/*
> > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (left_dirty)
> > +		pr_warn("%d unsanitized pages\n", left_dirty);
> >  
> >  	while (!kthread_should_stop()) {
> >  		if (try_to_freeze())
> 
> 
> Reinette

We need to return -ECANCELED on premature stop, and number of
pages otherwise.

In premature stop, nothing should be printed, as the number
is by practical means a random number. Otherwise, it is an
indicator of a bug in the driver, and therefore a non-zero
number should be printed pr_err(), if that happens after the
second call.

Thanks for feedback.

BR, Jarkko
Jarkko Sakkinen Aug. 31, 2022, 1:58 a.m. UTC | #4
On Wed, Aug 31, 2022 at 04:55:24AM +0300, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > In sgx_init(), if misc_register() for the provision device fails, and
> > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > prematurely stopped.
> > 
> > I do not think misc_register() is required to fail for the scenario to
> > be triggered (rather use "or" than "and"?). Perhaps just
> > "In sgx_init(), if a failure is encountered after ksgxd is started
> > (via sgx_page_reclaimer_init()) ...".
> 
> This would be the fixed version of the sentence:
> 
> "
> In sgx_init(), if misc_register() fails or misc_register() succeeds but
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped. This may leave some unsanitized pages, which does
> not matter, because SGX will be disabled for the whole power cycle.
> "
> 
> I want to keep the end states listed and not make it more abstract.
> 
> The second sentence addresses the remark below.
> 
> > To help the reader understand the subject of this patch it may help
> > to explain that prematurely stopping ksgxd may leave some
> > unsanitized pages, but that is not a problem since SGX cannot
> > be used on the platform anyway. 
> > 
> > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > non-empty, and dumps the call stack:
> > > 
> > 
> > Traces like below can be frowned upon. I recommend that you follow the
> > guidance in "Backtraces in commit mesages"(sic) in 
> > Documentation/process/submitting-patches.rst.
> > 
> > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> 
> Is this good enough? I had not actually spotted this section before but
> nice that it exists. Apparently has been added in 5.12.
> 
> >> > 
> > > Ultimately this can crash the kernel, if the following is set:
> > > 
> > > 	/proc/sys/kernel/panic_on_warn
> > > 
> > > Print a simple warning instead, and improve the output by printing the
> > > number of unsanitized pages, in order to provide debug informnation for
> > > future needs.
> > 
> > informnation -> information
> 
> +1
> 
> > 
> >  
> > ...
> > 
> > > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> > > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > 
> > Should this go to stable?
> 
> I guess it should. The hard reason for this that it can panic
> the kernel.
> 
> > 
> > > 
> > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > index 515e2a5f25bb..903100fcfce3 100644
> > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
> > >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> > >   * from the input list, and made available for the page allocator. SECS pages
> > >   * prepending their children in the input list are left intact.
> > > + *
> > > + * Contents of the @dirty_page_list must be thread-local, i.e.
> > > + * not shared by multiple threads.
> > 
> > Did you intend to mention something about the needed locking here? It looks
> > like some information is lost during the move to the function description.
> 
> Nothing about the locking that concerns the parameter, as the
> sentence defines clear constraints for the caller.
> 
> > 
> > >   */
> > > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > >  {
> > >  	struct sgx_epc_page *page;
> > > +	int left_dirty = 0;
> > 
> > I do not know how many pages this code should be ready for but at least
> > this could handle more by being an unsigned int considering that it is
> > always positive ... maybe even unsigned long?
> 
> I would go for 'long'. More information below.
> 
> > 
> > >  	LIST_HEAD(dirty);
> > >  	int ret;
> > >  
> > > -	/* dirty_page_list is thread-local, no need for a lock: */
> > >  	while (!list_empty(dirty_page_list)) {
> > >  		if (kthread_should_stop())
> > > -			return;
> > > +			break;
> > >  
> > >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> > >  
> > > @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > >  		} else {
> > >  			/* The page is not yet clean - move to the dirty list. */
> > >  			list_move_tail(&page->list, &dirty);
> > > +			left_dirty++;
> > >  		}
> > >  
> > >  		cond_resched();
> > >  	}
> > >  
> > >  	list_splice(&dirty, dirty_page_list);
> > > +	return left_dirty;
> > >  }
> > >  
> > >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> > >  
> > >  static int ksgxd(void *p)
> > >  {
> > > +	int left_dirty;
> > > +
> > >  	set_freezable();
> > >  
> > >  	/*
> > > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >  	 */
> > >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > >  
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (left_dirty)
> > > +		pr_warn("%d unsanitized pages\n", left_dirty);
> > >  
> > >  	while (!kthread_should_stop()) {
> > >  		if (try_to_freeze())
> > 
> > 
> > Reinette
> 
> We need to return -ECANCELED on premature stop, and number of
> pages otherwise.
> 
> In premature stop, nothing should be printed, as the number
> is by practical means a random number. Otherwise, it is an
> indicator of a bug in the driver, and therefore a non-zero
> number should be printed pr_err(), if that happens after the
> second call.

I.e. even though we print less we get more *information* what
is going inside the kernel. Warning is not correct for either
path IMHO.

BR, Jarkko
Jarkko Sakkinen Aug. 31, 2022, 2:01 a.m. UTC | #5
On Wed, Aug 31, 2022 at 04:58:58AM +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 04:55:24AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > prematurely stopped.
> > > 
> > > I do not think misc_register() is required to fail for the scenario to
> > > be triggered (rather use "or" than "and"?). Perhaps just
> > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > (via sgx_page_reclaimer_init()) ...".
> > 
> > This would be the fixed version of the sentence:
> > 
> > "
> > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped. This may leave some unsanitized pages, which does
> > not matter, because SGX will be disabled for the whole power cycle.
> > "
> > 
> > I want to keep the end states listed and not make it more abstract.
> > 
> > The second sentence addresses the remark below.
> > 
> > > To help the reader understand the subject of this patch it may help
> > > to explain that prematurely stopping ksgxd may leave some
> > > unsanitized pages, but that is not a problem since SGX cannot
> > > be used on the platform anyway. 
> > > 
> > > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > > non-empty, and dumps the call stack:
> > > > 
> > > 
> > > Traces like below can be frowned upon. I recommend that you follow the
> > > guidance in "Backtraces in commit mesages"(sic) in 
> > > Documentation/process/submitting-patches.rst.
> > > 
> > > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > 
> > Is this good enough? I had not actually spotted this section before but
> > nice that it exists. Apparently has been added in 5.12.
> > 
> > >> > 
> > > > Ultimately this can crash the kernel, if the following is set:
> > > > 
> > > > 	/proc/sys/kernel/panic_on_warn
> > > > 
> > > > Print a simple warning instead, and improve the output by printing the
> > > > number of unsanitized pages, in order to provide debug informnation for
> > > > future needs.
> > > 
> > > informnation -> information
> > 
> > +1
> > 
> > > 
> > >  
> > > ...
> > > 
> > > > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> > > > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > 
> > > Should this go to stable?
> > 
> > I guess it should. The hard reason for this that it can panic
> > the kernel.
> > 
> > > 
> > > > 
> > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > index 515e2a5f25bb..903100fcfce3 100644
> > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
> > > >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> > > >   * from the input list, and made available for the page allocator. SECS pages
> > > >   * prepending their children in the input list are left intact.
> > > > + *
> > > > + * Contents of the @dirty_page_list must be thread-local, i.e.
> > > > + * not shared by multiple threads.
> > > 
> > > Did you intend to mention something about the needed locking here? It looks
> > > like some information is lost during the move to the function description.
> > 
> > Nothing about the locking that concerns the parameter, as the
> > sentence defines clear constraints for the caller.
> > 
> > > 
> > > >   */
> > > > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > > +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > >  {
> > > >  	struct sgx_epc_page *page;
> > > > +	int left_dirty = 0;
> > > 
> > > I do not know how many pages this code should be ready for but at least
> > > this could handle more by being an unsigned int considering that it is
> > > always positive ... maybe even unsigned long?
> > 
> > I would go for 'long'. More information below.
> > 
> > > 
> > > >  	LIST_HEAD(dirty);
> > > >  	int ret;
> > > >  
> > > > -	/* dirty_page_list is thread-local, no need for a lock: */
> > > >  	while (!list_empty(dirty_page_list)) {
> > > >  		if (kthread_should_stop())
> > > > -			return;
> > > > +			break;
> > > >  
> > > >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> > > >  
> > > > @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > >  		} else {
> > > >  			/* The page is not yet clean - move to the dirty list. */
> > > >  			list_move_tail(&page->list, &dirty);
> > > > +			left_dirty++;
> > > >  		}
> > > >  
> > > >  		cond_resched();
> > > >  	}
> > > >  
> > > >  	list_splice(&dirty, dirty_page_list);
> > > > +	return left_dirty;
> > > >  }
> > > >  
> > > >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > > > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> > > >  
> > > >  static int ksgxd(void *p)
> > > >  {
> > > > +	int left_dirty;
> > > > +
> > > >  	set_freezable();
> > > >  
> > > >  	/*
> > > > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> > > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > > >  	 */
> > > >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > >  
> > > > -	/* sanity check: */
> > > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	if (left_dirty)
> > > > +		pr_warn("%d unsanitized pages\n", left_dirty);
> > > >  
> > > >  	while (!kthread_should_stop()) {
> > > >  		if (try_to_freeze())
> > > 
> > > 
> > > Reinette
> > 
> > We need to return -ECANCELED on premature stop, and number of
> > pages otherwise.
> > 
> > In premature stop, nothing should be printed, as the number
> > is by practical means a random number. Otherwise, it is an
> > indicator of a bug in the driver, and therefore a non-zero
> > number should be printed pr_err(), if that happens after the
> > second call.
> 
> I.e. even though we print less we get more *information* what
> is going inside the kernel. Warning is not correct for either
> path IMHO.

Oh, sorry, I forgot one thing. The devices should be actually
deinitialized in the error case. We do not want to leave a
broken driver running.

BR, Jarkko
Jarkko Sakkinen Aug. 31, 2022, 2:15 a.m. UTC | #6
On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > In sgx_init(), if misc_register() for the provision device fails, and
> > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > prematurely stopped.
> > 
> > I do not think misc_register() is required to fail for the scenario to
> > be triggered (rather use "or" than "and"?). Perhaps just
> > "In sgx_init(), if a failure is encountered after ksgxd is started
> > (via sgx_page_reclaimer_init()) ...".
> 
> IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> be initialized successfully, sgx_init() still returns 0.
> 
> Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> initialized successfully.  Then the code change in this patch won't be necessary
> if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> early stage before we are sure either the driver or KVM SGX will work.

I would focus fixing the existing flow rather than reinventing the flow.

It can be made to work, and therefore it is IMHO correct action to take.

> Btw currently EPC pages assigned to KVM guest cannot be reclaimed, so
> theoretically ksgxd can be moved to sgx_drv_init(), but who knows someday we
> will decide to make KVM guest EPC pages to be able to be reclaimed. :)

I'm open to changes but it is in my opinion out of context for this.

> 
> 
> 
> -- 
> Thanks,
> -Kai
> 
> 

BR, Jarkko
Huang, Kai Aug. 31, 2022, 2:35 a.m. UTC | #7
On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > prematurely stopped.
> > > 
> > > I do not think misc_register() is required to fail for the scenario to
> > > be triggered (rather use "or" than "and"?). Perhaps just
> > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > (via sgx_page_reclaimer_init()) ...".
> > 
> > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > be initialized successfully, sgx_init() still returns 0.
> > 
> > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > initialized successfully.  Then the code change in this patch won't be necessary
> > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > early stage before we are sure either the driver or KVM SGX will work.
> 
> I would focus fixing the existing flow rather than reinventing the flow.
> 
> It can be made to work, and therefore it is IMHO correct action to take.

From another perspective, the *existing flow* is the reason which causes this
bug.  A real fix is to fix the flow itself.

> 
> > Btw currently EPC pages assigned to KVM guest cannot be reclaimed, so
> > theoretically ksgxd can be moved to sgx_drv_init(), but who knows someday we
> > will decide to make KVM guest EPC pages to be able to be reclaimed. :)
> 
> I'm open to changes but it is in my opinion out of context for this.
> 
> 

Yeah.  I was expressing the reason I suggested to move sgx_page_reclaimer_init()
to the end of sgx_init(), but not to sgx_drv_init().

But moving to sgx_drv_init() also makes sense to me given KVM guest EPC pages
are not reclaimable now.  For now there's no reason to run ksgxd() if only
virtual EPC driver is enabled.  We can move sgx_page_reclaimer_init() out of
sgx_drv_init() when we add KVM guest EPC page reclaiming support (if it happens
in the future).
Jarkko Sakkinen Aug. 31, 2022, 2:44 a.m. UTC | #8
On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > Hi Jarkko,
> > > > 
> > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > prematurely stopped.
> > > > 
> > > > I do not think misc_register() is required to fail for the scenario to
> > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > (via sgx_page_reclaimer_init()) ...".
> > > 
> > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > be initialized successfully, sgx_init() still returns 0.
> > > 
> > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > initialized successfully.  Then the code change in this patch won't be necessary
> > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > early stage before we are sure either the driver or KVM SGX will work.
> > 
> > I would focus fixing the existing flow rather than reinventing the flow.
> > 
> > It can be made to work, and therefore it is IMHO correct action to take.
> 
> From another perspective, the *existing flow* is the reason which causes this
> bug.  A real fix is to fix the flow itself.

Any existing flow in part of the kernel can have a bug. That
does not mean that switching flow would be proper way to fix
a bug.

BR, Jarkko
Huang, Kai Aug. 31, 2022, 2:55 a.m. UTC | #9
On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > Hi Jarkko,
> > > > > 
> > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > prematurely stopped.
> > > > > 
> > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > (via sgx_page_reclaimer_init()) ...".
> > > > 
> > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > be initialized successfully, sgx_init() still returns 0.
> > > > 
> > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > early stage before we are sure either the driver or KVM SGX will work.
> > > 
> > > I would focus fixing the existing flow rather than reinventing the flow.
> > > 
> > > It can be made to work, and therefore it is IMHO correct action to take.
> > 
> > From another perspective, the *existing flow* is the reason which causes this
> > bug.  A real fix is to fix the flow itself.
> 
> Any existing flow in part of the kernel can have a bug. That
> does not mean that switching flow would be proper way to fix
> a bug.
> 
> BR, Jarkko

Yes but I think this is only true when the flow is reasonable.  If the flow
itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).

Anyway, let us also hear from others.
Jarkko Sakkinen Aug. 31, 2022, 2:57 a.m. UTC | #10
On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > Hi Jarkko,
> > > > > > 
> > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > prematurely stopped.
> > > > > > 
> > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > 
> > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > 
> > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > 
> > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > 
> > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > 
> > > From another perspective, the *existing flow* is the reason which causes this
> > > bug.  A real fix is to fix the flow itself.
> > 
> > Any existing flow in part of the kernel can have a bug. That
> > does not mean that switching flow would be proper way to fix
> > a bug.
> > 
> > BR, Jarkko
> 
> Yes but I think this is only true when the flow is reasonable.  If the flow
> itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> 
> Anyway, let us also hear from others.

The flow can be made to work without issues, which in the
context of a bug fix is exactly what a bug fix should do.
Not more or less.

You don't gain any measurable value for the user with this
switch idea.

BR, Jarkko
Jarkko Sakkinen Aug. 31, 2022, 3:10 a.m. UTC | #11
On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > Hi Jarkko,
> > > > > > > 
> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > prematurely stopped.
> > > > > > > 
> > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > 
> > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > 
> > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > 
> > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > 
> > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > 
> > > > From another perspective, the *existing flow* is the reason which causes this
> > > > bug.  A real fix is to fix the flow itself.
> > > 
> > > Any existing flow in part of the kernel can have a bug. That
> > > does not mean that switching flow would be proper way to fix
> > > a bug.
> > > 
> > > BR, Jarkko
> > 
> > Yes but I think this is only true when the flow is reasonable.  If the flow
> > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > 
> > Anyway, let us also hear from others.
> 
> The flow can be made to work without issues, which in the
> context of a bug fix is exactly what a bug fix should do.
> Not more or less.
> 
> You don't gain any measurable value for the user with this
> switch idea.

And besides this not proper way to review patch anyway because you did
not review the code. I'll focus on fix what is broken e.g. so that it
is easy to backport to stable and distro kernels, and call it a day.
It certainly does not have to make code "perfect", as long as known
bugs are sorted out.

You are welcome to review the next version of the patch, once I've
resolved the issues that were pointed out by Reinette, if you still
see some issue but this type of speculative discussion is frankly just
wasting everyones time.

(need to check my mutt config, do not know why it is not always
putting real name to from field)

BR, Jarkko
Huang, Kai Aug. 31, 2022, 3:17 a.m. UTC | #12
On Wed, 2022-08-31 at 05:57 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > Hi Jarkko,
> > > > > > > 
> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > prematurely stopped.
> > > > > > > 
> > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > 
> > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > 
> > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > 
> > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > 
> > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > 
> > > > From another perspective, the *existing flow* is the reason which causes this
> > > > bug.  A real fix is to fix the flow itself.
> > > 
> > > Any existing flow in part of the kernel can have a bug. That
> > > does not mean that switching flow would be proper way to fix
> > > a bug.
> > > 
> > > BR, Jarkko
> > 
> > Yes but I think this is only true when the flow is reasonable.  If the flow
> > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > 
> > Anyway, let us also hear from others.
> 
> The flow can be made to work without issues, which in the
> context of a bug fix is exactly what a bug fix should do.
> Not more or less.

No. To me the flow itself is buggy.  There's no reason to start ksgxd() before
at least SGX driver is initialized to work.

Patching the buggy flow is more like a workaround, but isn't a real fix.

> 
> You don't gain any measurable value for the user with this
> switch idea.

There is actual gain by moving sgx_page_reclaimer_init() to sgx_drv_init(), or
only calling sgx_page_reclaimer_init() when sgx_drv_init() returns success:

If somehow sgx_drv_init() fails to initialize, ksgxd() won't run.

Currently, if SGX driver fails to initialize but virtual EPC initializes
successfully, ksgxd() still runs. However it achieves nothing but only wastes
CPU cycles.
Huang, Kai Aug. 31, 2022, 3:28 a.m. UTC | #13
On Wed, 2022-08-31 at 06:10 +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > Hi Jarkko,
> > > > > > > > 
> > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > > prematurely stopped.
> > > > > > > > 
> > > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > 
> > > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > 
> > > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > > 
> > > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > > 
> > > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > > 
> > > > > From another perspective, the *existing flow* is the reason which causes this
> > > > > bug.  A real fix is to fix the flow itself.
> > > > 
> > > > Any existing flow in part of the kernel can have a bug. That
> > > > does not mean that switching flow would be proper way to fix
> > > > a bug.
> > > > 
> > > > BR, Jarkko
> > > 
> > > Yes but I think this is only true when the flow is reasonable.  If the flow
> > > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > > 
> > > Anyway, let us also hear from others.
> > 
> > The flow can be made to work without issues, which in the
> > context of a bug fix is exactly what a bug fix should do.
> > Not more or less.
> > 
> > You don't gain any measurable value for the user with this
> > switch idea.
> 
> And besides this not proper way to review patch anyway because you did
> not review the code. 
> 

I did review the code, but I couldn't agree on the fix.  That's why I expressed
my view here.


> I'll focus on fix what is broken e.g. so that it
> is easy to backport to stable and distro kernels, and call it a day.
> It certainly does not have to make code "perfect", as long as known
> bugs are sorted out.

Why cannot the fix which fixes the flow go to stable?

> 
> You are welcome to review the next version of the patch, once I've
> resolved the issues that were pointed out by Reinette, if you still
> see some issue but this type of speculative discussion is frankly just
> wasting everyones time.

Hmm.. Why pointing out a better fix (my perspective of course) is wasting
everyone's time?
Jarkko Sakkinen Aug. 31, 2022, 3:40 a.m. UTC | #14
On Wed, Aug 31, 2022 at 03:28:20AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 06:10 +0300, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > > Hi Jarkko,
> > > > > > > > > 
> > > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > > > prematurely stopped.
> > > > > > > > > 
> > > > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > > 
> > > > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > > 
> > > > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > > > 
> > > > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > > > 
> > > > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > > > 
> > > > > > From another perspective, the *existing flow* is the reason which causes this
> > > > > > bug.  A real fix is to fix the flow itself.
> > > > > 
> > > > > Any existing flow in part of the kernel can have a bug. That
> > > > > does not mean that switching flow would be proper way to fix
> > > > > a bug.
> > > > > 
> > > > > BR, Jarkko
> > > > 
> > > > Yes but I think this is only true when the flow is reasonable.  If the flow
> > > > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > > > 
> > > > Anyway, let us also hear from others.
> > > 
> > > The flow can be made to work without issues, which in the
> > > context of a bug fix is exactly what a bug fix should do.
> > > Not more or less.
> > > 
> > > You don't gain any measurable value for the user with this
> > > switch idea.
> > 
> > And besides this not proper way to review patch anyway because you did
> > not review the code. 
> > 
> 
> I did review the code, but I couldn't agree on the fix.  That's why I expressed
> my view here.
> 
> 
> > I'll focus on fix what is broken e.g. so that it
> > is easy to backport to stable and distro kernels, and call it a day.
> > It certainly does not have to make code "perfect", as long as known
> > bugs are sorted out.
> 
> Why cannot the fix which fixes the flow go to stable?
> 
> > 
> > You are welcome to review the next version of the patch, once I've
> > resolved the issues that were pointed out by Reinette, if you still
> > see some issue but this type of speculative discussion is frankly just
> > wasting everyones time.
> 
> Hmm.. Why pointing out a better fix (my perspective of course) is wasting
> everyone's time?

There was not a single inline comment.

BR, Jarkko
Haitao Huang Aug. 31, 2022, 3:18 p.m. UTC | #15
Hi Kai
On Tue, 30 Aug 2022 22:17:08 -0500, Huang, Kai <kai.huang@intel.com> wrote:

> On Wed, 2022-08-31 at 05:57 +0300, jarkko@kernel.org wrote:
>> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
>> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
>> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
>> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
>> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
>> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
>> > > > > > > Hi Jarkko,
>> > > > > > >
>> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>> > > > > > > > In sgx_init(), if misc_register() for the provision  
>> device fails, and
>> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then  
>> ksgxd will be
>> > > > > > > > prematurely stopped.
>> > > > > > >
>> > > > > > > I do not think misc_register() is required to fail for the  
>> scenario to
>> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
>> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is  
>> started
>> > > > > > > (via sgx_page_reclaimer_init()) ...".
>> > > > > >
>> > > > > > IMHO "a failure" might be too vague.  For instance, failure  
>> to sgx_drv_init()
>> > > > > > won't immediately result in ksgxd to stop prematurally.  As  
>> long as KVM SGX can
>> > > > > > be initialized successfully, sgx_init() still returns 0.
>> > > > > >
>> > > > > > Btw I was thinking whether we should move  
>> sgx_page_reclaimer_init() to the end
>> > > > > > of sgx_init(), after we make sure at least one of the driver  
>> and the KVM SGX is
>> > > > > > initialized successfully.  Then the code change in this patch  
>> won't be necessary
>> > > > > > if I understand correctly.  AFAICT there's no good reason to  
>> start the ksgxd at
>> > > > > > early stage before we are sure either the driver or KVM SGX  
>> will work.
>> > > > >
>> > > > > I would focus fixing the existing flow rather than reinventing  
>> the flow.
>> > > > >
>> > > > > It can be made to work, and therefore it is IMHO correct action  
>> to take.
>> > > >
>> > > > From another perspective, the *existing flow* is the reason which  
>> causes this
>> > > > bug.  A real fix is to fix the flow itself.
>> > >
>> > > Any existing flow in part of the kernel can have a bug. That
>> > > does not mean that switching flow would be proper way to fix
>> > > a bug.
>> > >
>> > > BR, Jarkko
>> >
>> > Yes but I think this is only true when the flow is reasonable.  If  
>> the flow
>> > itself isn't reasonable, we should fix the flow (given it's easy to  
>> fix AFAICT).
>> >
>> > Anyway, let us also hear from others.
>>
>> The flow can be made to work without issues, which in the
>> context of a bug fix is exactly what a bug fix should do.
>> Not more or less.
>
> No. To me the flow itself is buggy.  There's no reason to start ksgxd()  
> before
> at least SGX driver is initialized to work.
>

Will it cause racing if we expose dev nodes to user space before
ksgxd is started and sensitization done?

> Patching the buggy flow is more like a workaround, but isn't a real fix.
>
>>
>> You don't gain any measurable value for the user with this
>> switch idea.
>
> There is actual gain by moving sgx_page_reclaimer_init() to  
> sgx_drv_init(), or
> only calling sgx_page_reclaimer_init() when sgx_drv_init() returns  
> success:
>
> If somehow sgx_drv_init() fails to initialize, ksgxd() won't run.
>
> Currently, if SGX driver fails to initialize but virtual EPC initializes
> successfully, ksgxd() still runs. However it achieves nothing but only  
> wastes
> CPU cycles.
>
>

You still need ksgxd for sanitizing (at least) and swapping (potentially)
even if only virtual EPC initializes.

Thanks
Haitao
Reinette Chatre Aug. 31, 2022, 6:08 p.m. UTC | #16
Hi Jarkko,

On 8/30/2022 6:55 PM, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>>> In sgx_init(), if misc_register() for the provision device fails, and
>>> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
>>> prematurely stopped.
>>
>> I do not think misc_register() is required to fail for the scenario to
>> be triggered (rather use "or" than "and"?). Perhaps just
>> "In sgx_init(), if a failure is encountered after ksgxd is started
>> (via sgx_page_reclaimer_init()) ...".
> 
> This would be the fixed version of the sentence:
> 
> "
> In sgx_init(), if misc_register() fails or misc_register() succeeds but
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped. This may leave some unsanitized pages, which does
> not matter, because SGX will be disabled for the whole power cycle.
> "
> 
> I want to keep the end states listed and not make it more abstract.
> 
> The second sentence addresses the remark below.

Thank you for capturing this. It sounds good.

> 
>> To help the reader understand the subject of this patch it may help
>> to explain that prematurely stopping ksgxd may leave some
>> unsanitized pages, but that is not a problem since SGX cannot
>> be used on the platform anyway. 
>>
>>> This triggers WARN_ON() because sgx_dirty_page_list ends up being
>>> non-empty, and dumps the call stack:
>>>
>>
>> Traces like below can be frowned upon. I recommend that you follow the
>> guidance in "Backtraces in commit mesages"(sic) in 
>> Documentation/process/submitting-patches.rst.
>>
>>> [    0.268592] WARNING: CPU: 6 PID: 83 at
>>> arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> 
> Is this good enough? I had not actually spotted this section before but
> nice that it exists. Apparently has been added in 5.12.

The timestamp should be removed, it is among the things listed as
"distracting information". In this case the backtrace and registers within
the trace are not adding value but I think it is important to mention
the kernel version somewhere for folks to be able to interpret the
line number provided.

Yet I see you kept the whole trace in V2 ?

>>> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
>>> index 515e2a5f25bb..903100fcfce3 100644
>>> --- a/arch/x86/kernel/cpu/sgx/main.c
>>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>>> @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
>>>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
>>>   * from the input list, and made available for the page allocator. SECS pages
>>>   * prepending their children in the input list are left intact.
>>> + *
>>> + * Contents of the @dirty_page_list must be thread-local, i.e.
>>> + * not shared by multiple threads.
>>
>> Did you intend to mention something about the needed locking here? It looks
>> like some information is lost during the move to the function description.
> 
> Nothing about the locking that concerns the parameter, as the
> sentence defines clear constraints for the caller.

ok

> 
>>
>>>   */
>>> -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>> +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>>  {
>>>  	struct sgx_epc_page *page;
>>> +	int left_dirty = 0;
>>
>> I do not know how many pages this code should be ready for but at least
>> this could handle more by being an unsigned int considering that it is
>> always positive ... maybe even unsigned long?
> 
> I would go for 'long'. More information below.
> 
>>
>>>  	LIST_HEAD(dirty);
>>>  	int ret;
>>>  
>>> -	/* dirty_page_list is thread-local, no need for a lock: */
>>>  	while (!list_empty(dirty_page_list)) {
>>>  		if (kthread_should_stop())
>>> -			return;
>>> +			break;
>>>  
>>>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>>>  
>>> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>>  		} else {
>>>  			/* The page is not yet clean - move to the dirty list. */
>>>  			list_move_tail(&page->list, &dirty);
>>> +			left_dirty++;
>>>  		}
>>>  
>>>  		cond_resched();
>>>  	}
>>>  
>>>  	list_splice(&dirty, dirty_page_list);
>>> +	return left_dirty;
>>>  }
>>>  
>>>  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
>>> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>>>  
>>>  static int ksgxd(void *p)
>>>  {
>>> +	int left_dirty;
>>> +
>>>  	set_freezable();
>>>  
>>>  	/*
>>> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>>>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>>>  	 */
>>>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
>>> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>>>  
>>> -	/* sanity check: */
>>> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
>>> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
>>> +	if (left_dirty)
>>> +		pr_warn("%d unsanitized pages\n", left_dirty);
>>>  
>>>  	while (!kthread_should_stop()) {
>>>  		if (try_to_freeze())
>>
>>
>> Reinette
> 
> We need to return -ECANCELED on premature stop, and number of
> pages otherwise.
> 
> In premature stop, nothing should be printed, as the number
> is by practical means a random number. Otherwise, it is an
> indicator of a bug in the driver, and therefore a non-zero
> number should be printed pr_err(), if that happens after the
> second call.
> 

Good point.

> Oh, sorry, I forgot one thing. The devices should be actually
> deinitialized in the error case. We do not want to leave a
> broken driver running.

Is this not already done on the error path of sgx_init()?

Reinette
Jarkko Sakkinen Aug. 31, 2022, 6:28 p.m. UTC | #17
On Wed, Aug 31, 2022 at 10:18:00AM -0500, Haitao Huang wrote:
> Hi Kai
> On Tue, 30 Aug 2022 22:17:08 -0500, Huang, Kai <kai.huang@intel.com> wrote:
> 
> > On Wed, 2022-08-31 at 05:57 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > > Hi Jarkko,
> > > > > > > > >
> > > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > > In sgx_init(), if misc_register() for the provision
> > > device fails, and
> > > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds,
> > > then ksgxd will be
> > > > > > > > > > prematurely stopped.
> > > > > > > > >
> > > > > > > > > I do not think misc_register() is required to fail for
> > > the scenario to
> > > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd
> > > is started
> > > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > >
> > > > > > > > IMHO "a failure" might be too vague.  For instance,
> > > failure to sgx_drv_init()
> > > > > > > > won't immediately result in ksgxd to stop prematurally.
> > > As long as KVM SGX can
> > > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > >
> > > > > > > > Btw I was thinking whether we should move
> > > sgx_page_reclaimer_init() to the end
> > > > > > > > of sgx_init(), after we make sure at least one of the
> > > driver and the KVM SGX is
> > > > > > > > initialized successfully.  Then the code change in this
> > > patch won't be necessary
> > > > > > > > if I understand correctly.  AFAICT there's no good reason
> > > to start the ksgxd at
> > > > > > > > early stage before we are sure either the driver or KVM
> > > SGX will work.
> > > > > > >
> > > > > > > I would focus fixing the existing flow rather than
> > > reinventing the flow.
> > > > > > >
> > > > > > > It can be made to work, and therefore it is IMHO correct
> > > action to take.
> > > > > >
> > > > > > From another perspective, the *existing flow* is the reason
> > > which causes this
> > > > > > bug.  A real fix is to fix the flow itself.
> > > > >
> > > > > Any existing flow in part of the kernel can have a bug. That
> > > > > does not mean that switching flow would be proper way to fix
> > > > > a bug.
> > > > >
> > > > > BR, Jarkko
> > > >
> > > > Yes but I think this is only true when the flow is reasonable.  If
> > > the flow
> > > > itself isn't reasonable, we should fix the flow (given it's easy
> > > to fix AFAICT).
> > > >
> > > > Anyway, let us also hear from others.
> > > 
> > > The flow can be made to work without issues, which in the
> > > context of a bug fix is exactly what a bug fix should do.
> > > Not more or less.
> > 
> > No. To me the flow itself is buggy.  There's no reason to start ksgxd()
> > before
> > at least SGX driver is initialized to work.
> > 
> 
> Will it cause racing if we expose dev nodes to user space before
> ksgxd is started and sensitization done?

I'll to explain this.

So the point is to fix the issue at hand, and fix it locally.

Changing initialization order is simply out of context. It's
not really an argument for or against changing it

We are fixing sanitization here, and only that with zero
side-effects to any other semantics.

It's dictated by the development process [*] but more
importantly it's also just plain common sense.

[*] https://www.kernel.org/doc/html/v5.19/process/submitting-patches.html#separate-your-changes

BR, Jarkko
Dave Hansen Aug. 31, 2022, 6:35 p.m. UTC | #18
Jarkko, Kai and Haitao,

Can you three please start trimming your replies?  You don't need to and
should not quote the entirety of your messages every time you reply.

On 8/31/22 11:28, jarkko@kernel.org wrote:
>> Will it cause racing if we expose dev nodes to user space before
>> ksgxd is started and sensitization done?
> I'll to explain this.
> 
> So the point is to fix the issue at hand, and fix it locally.
> 
> Changing initialization order is simply out of context. It's
> not really an argument for or against changing it
> 
> We are fixing sanitization here, and only that with zero
> side-effects to any other semantics.
> 
> It's dictated by the development process [*] but more
> importantly it's also just plain common sense.

Kai, I think your suggestion is reasonable.  You make a good point about
not needing ksgxd for vepc.

*But*, I think it's a bit too much for a bugfix that's headed to
-stable.  I'm concerned that it will have unintended side effects,
*especially* when there's a working, tested alternative.
Jarkko Sakkinen Aug. 31, 2022, 6:44 p.m. UTC | #19
On Wed, Aug 31, 2022 at 11:35:10AM -0700, Dave Hansen wrote:
> Jarkko, Kai and Haitao,
> 
> Can you three please start trimming your replies?  You don't need to and
> should not quote the entirety of your messages every time you reply.
> 
> On 8/31/22 11:28, jarkko@kernel.org wrote:
> >> Will it cause racing if we expose dev nodes to user space before
> >> ksgxd is started and sensitization done?
> > I'll to explain this.
> > 
> > So the point is to fix the issue at hand, and fix it locally.
> > 
> > Changing initialization order is simply out of context. It's
> > not really an argument for or against changing it
> > 
> > We are fixing sanitization here, and only that with zero
> > side-effects to any other semantics.
> > 
> > It's dictated by the development process [*] but more
> > importantly it's also just plain common sense.
> 
> Kai, I think your suggestion is reasonable.  You make a good point about
> not needing ksgxd for vepc.
> 
> *But*, I think it's a bit too much for a bugfix that's headed to
> -stable.  I'm concerned that it will have unintended side effects,
> *especially* when there's a working, tested alternative.

Yeah, I also actually *do* agree that the suggestions could
be reasonable.

BR, Jarkko
Jarkko Sakkinen Aug. 31, 2022, 6:45 p.m. UTC | #20
On Wed, Aug 31, 2022 at 11:35:10AM -0700, Dave Hansen wrote:
> Jarkko, Kai and Haitao,
> 
> Can you three please start trimming your replies?  You don't need to and
> should not quote the entirety of your messages every time you reply.

Sure, sorry about that.

BR, Jarkko
Huang, Kai Aug. 31, 2022, 8:42 p.m. UTC | #21
On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> Jarkko, Kai and Haitao,
> 
> Can you three please start trimming your replies?  You don't need to and
> should not quote the entirety of your messages every time you reply.
> 
> On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > Will it cause racing if we expose dev nodes to user space before
> > > ksgxd is started and sensitization done?
> > I'll to explain this.
> > 
> > So the point is to fix the issue at hand, and fix it locally.
> > 
> > Changing initialization order is simply out of context. It's
> > not really an argument for or against changing it
> > 
> > We are fixing sanitization here, and only that with zero
> > side-effects to any other semantics.
> > 
> > It's dictated by the development process [*] but more
> > importantly it's also just plain common sense.
> 
> Kai, I think your suggestion is reasonable.  You make a good point about
> not needing ksgxd for vepc.
> 
> *But*, I think it's a bit too much for a bugfix that's headed to
> -stable.  I'm concerned that it will have unintended side effects,
> *especially* when there's a working, tested alternative.

Agreed. Thanks Dave/Jarkko.
Jarkko Sakkinen Sept. 1, 2022, 10:27 p.m. UTC | #22
On Wed, Aug 31, 2022 at 08:42:59PM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> > Jarkko, Kai and Haitao,
> > 
> > Can you three please start trimming your replies?  You don't need to and
> > should not quote the entirety of your messages every time you reply.
> > 
> > On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > > Will it cause racing if we expose dev nodes to user space before
> > > > ksgxd is started and sensitization done?
> > > I'll to explain this.
> > > 
> > > So the point is to fix the issue at hand, and fix it locally.
> > > 
> > > Changing initialization order is simply out of context. It's
> > > not really an argument for or against changing it
> > > 
> > > We are fixing sanitization here, and only that with zero
> > > side-effects to any other semantics.
> > > 
> > > It's dictated by the development process [*] but more
> > > importantly it's also just plain common sense.
> > 
> > Kai, I think your suggestion is reasonable.  You make a good point about
> > not needing ksgxd for vepc.
> > 
> > *But*, I think it's a bit too much for a bugfix that's headed to
> > -stable.  I'm concerned that it will have unintended side effects,
> > *especially* when there's a working, tested alternative.
> 
> Agreed. Thanks Dave/Jarkko.

Please do a patch. It's a very reasonable suggestion when
considered out of context of this bug.

If you go really rigid with this, the compilation process
should not compile in sanitization process in the case when
only vepc is enabled. It's useless functionality in that
case.

BR, Jarkko
Huang, Kai Sept. 1, 2022, 10:41 p.m. UTC | #23
On Fri, 2022-09-02 at 01:27 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 08:42:59PM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> > > Jarkko, Kai and Haitao,
> > > 
> > > Can you three please start trimming your replies?  You don't need to and
> > > should not quote the entirety of your messages every time you reply.
> > > 
> > > On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > > > Will it cause racing if we expose dev nodes to user space before
> > > > > ksgxd is started and sensitization done?
> > > > I'll to explain this.
> > > > 
> > > > So the point is to fix the issue at hand, and fix it locally.
> > > > 
> > > > Changing initialization order is simply out of context. It's
> > > > not really an argument for or against changing it
> > > > 
> > > > We are fixing sanitization here, and only that with zero
> > > > side-effects to any other semantics.
> > > > 
> > > > It's dictated by the development process [*] but more
> > > > importantly it's also just plain common sense.
> > > 
> > > Kai, I think your suggestion is reasonable.  You make a good point about
> > > not needing ksgxd for vepc.
> > > 
> > > *But*, I think it's a bit too much for a bugfix that's headed to
> > > -stable.  I'm concerned that it will have unintended side effects,
> > > *especially* when there's a working, tested alternative.
> > 
> > Agreed. Thanks Dave/Jarkko.
> 
> Please do a patch. It's a very reasonable suggestion when
> considered out of context of this bug.
> 
> If you go really rigid with this, the compilation process
> should not compile in sanitization process in the case when
> only vepc is enabled. It's useless functionality in that
> case.
> 
> BR, Jarkko

Yeah I am planning to work out one to see how it goes.
Jarkko Sakkinen Sept. 1, 2022, 11:58 p.m. UTC | #24
On Thu, Sep 01, 2022 at 10:41:54PM +0000, Huang, Kai wrote:
> On Fri, 2022-09-02 at 01:27 +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 08:42:59PM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> > > > Jarkko, Kai and Haitao,
> > > > 
> > > > Can you three please start trimming your replies?  You don't need to and
> > > > should not quote the entirety of your messages every time you reply.
> > > > 
> > > > On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > > > > Will it cause racing if we expose dev nodes to user space before
> > > > > > ksgxd is started and sensitization done?
> > > > > I'll to explain this.
> > > > > 
> > > > > So the point is to fix the issue at hand, and fix it locally.
> > > > > 
> > > > > Changing initialization order is simply out of context. It's
> > > > > not really an argument for or against changing it
> > > > > 
> > > > > We are fixing sanitization here, and only that with zero
> > > > > side-effects to any other semantics.
> > > > > 
> > > > > It's dictated by the development process [*] but more
> > > > > importantly it's also just plain common sense.
> > > > 
> > > > Kai, I think your suggestion is reasonable.  You make a good point about
> > > > not needing ksgxd for vepc.
> > > > 
> > > > *But*, I think it's a bit too much for a bugfix that's headed to
> > > > -stable.  I'm concerned that it will have unintended side effects,
> > > > *especially* when there's a working, tested alternative.
> > > 
> > > Agreed. Thanks Dave/Jarkko.
> > 
> > Please do a patch. It's a very reasonable suggestion when
> > considered out of context of this bug.
> > 
> > If you go really rigid with this, the compilation process
> > should not compile in sanitization process in the case when
> > only vepc is enabled. It's useless functionality in that
> > case.
> > 
> > BR, Jarkko
> 
> Yeah I am planning to work out one to see how it goes.

Looking forward to try it out :-)

BR, Jarkko
Huang, Kai Sept. 2, 2022, 12:26 a.m. UTC | #25
> >
> > Yeah I am planning to work out one to see how it goes.
> 
> Looking forward to try it out :-)

Just let you know I'll work on it perhaps late next week or the week after, since I have other higher priority tasks to do. :)

Thanks,
-Kai
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..903100fcfce3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,20 @@  static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	struct sgx_epc_page *page;
+	int left_dirty = 0;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			break;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +95,14 @@  static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,6 +393,8 @@  void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	int left_dirty;
+
 	set_freezable();
 
 	/*
@@ -395,10 +402,10 @@  static int ksgxd(void *p)
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (left_dirty)
+		pr_warn("%d unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())