Message ID | 20220903060108.1709739-2-jarkko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/sgx: Fixes for v6.0 | expand |
On Sat, Sep 03, 2022 at 09:01:07AM +0300, Jarkko Sakkinen wrote: > Unsanitized pages trigger WARN_ON() unconditionally, which can panic the > whole computer, if /proc/sys/kernel/panic_on_warn is set. > > In sgx_init(), if misc_register() fails or misc_register() succeeds but > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be > prematurely stopped. This may leave unsanitized pages, which will result a > false warning. > > Refine __sgx_sanitize_pages() to return: > > 1. Zero when the sanitization process is complete or ksgxd has been > requested to stop. > 2. The number of unsanitized pages otherwise. > > Use the return value as the criteria for triggering output, and tone down > the output to pr_err() to prevent the whole system to be taken down if for > some reason sanitization process does not complete. > > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list") > Cc: stable@vger.kernel.org # v5.13+ > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > --- > v7: > - Rewrote commit message. > - Do not return -ECANCELED on premature stop. Instead use zero both > premature stop and complete sanitization. > > v6: > - Address Reinette's feedback: > https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/ > > v5: > - Add the klog dump and sysctl option to the commit message. > > v4: > - Explain expectations for dirty_page_list in the function header, instead > of an inline comment. > - Improve commit message to explain the conditions better. > - Return the number of pages left dirty to ksgxd() and print warning after > the 2nd call, if there are any. > > v3: > - Remove WARN_ON(). > - Tuned comments and the commit message a bit. > > v2: > - Replaced WARN_ON() with optional pr_info() inside > __sgx_sanitize_pages(). > - Rewrote the commit message. > - Added the fixes tag. > --- > arch/x86/kernel/cpu/sgx/main.c | 33 ++++++++++++++++++++++++++------- > 1 file changed, 26 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 515e2a5f25bb..c0a5ce19c608 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -49,17 +49,23 @@ static LIST_HEAD(sgx_dirty_page_list); > * Reset post-kexec EPC pages to the uninitialized state. The pages are removed > * from the input list, and made available for the page allocator. SECS pages > * prepending their children in the input list are left intact. > + * > + * Contents of the @dirty_page_list must be thread-local, i.e. > + * not shared by multiple threads. > + * > + * Return 0 when sanitization was successful or kthread was stopped, and the > + * number of unsanitized pages otherwise. > */ > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list) > +static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list) > { > + unsigned long left_dirty = 0; > struct sgx_epc_page *page; > LIST_HEAD(dirty); > int ret; > > - /* dirty_page_list is thread-local, no need for a lock: */ > while (!list_empty(dirty_page_list)) { > if (kthread_should_stop()) > - return; > + return 0; > > page = list_first_entry(dirty_page_list, struct sgx_epc_page, list); > > @@ -92,12 +98,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list) > } else { > /* The page is not yet clean - move to the dirty list. */ > list_move_tail(&page->list, &dirty); > + left_dirty++; > } > > cond_resched(); > } > > list_splice(&dirty, dirty_page_list); > + return left_dirty; > } > > static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page) > @@ -388,17 +396,28 @@ void sgx_reclaim_direct(void) > > static int ksgxd(void *p) > { > + unsigned long left_dirty; > + > set_freezable(); > > /* > * Sanitize pages in order to recover from kexec(). The 2nd pass is > * required for SECS pages, whose child pages blocked EREMOVE. > */ > - __sgx_sanitize_pages(&sgx_dirty_page_list); > - __sgx_sanitize_pages(&sgx_dirty_page_list); > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > + pr_debug("%ld unsanitized pages\n", left_dirty); %lu > > - /* sanity check: */ > - WARN_ON(!list_empty(&sgx_dirty_page_list)); > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > + /* > + * Never expected to happen in a working driver. If it happens the bug > + * is expected to be in the sanitization process, but successfully > + * sanitized pages are still valid and driver can be used and most > + * importantly debugged without issues. To put short, the global state > + * of kernel is not corrupted so no reason to do any more complicated > + * rollback. > + */ > + if (left_dirty) > + pr_err("%ld unsanitized pages\n", left_dirty); %lu > > while (!kthread_should_stop()) { > if (try_to_freeze()) > -- > 2.37.2 > BR, Jarkko
On Sat, 2022-09-03 at 13:26 +0300, Jarkko Sakkinen wrote: > > static int ksgxd(void *p) > > { > > + unsigned long left_dirty; > > + > > set_freezable(); > > > > /* > > * Sanitize pages in order to recover from kexec(). The 2nd pass is > > * required for SECS pages, whose child pages blocked EREMOVE. > > */ > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > > + pr_debug("%ld unsanitized pages\n", left_dirty); > %lu > I assume the intention is to print out the unsanitized SECS pages, but what is the value of printing it? To me it doesn't provide any useful information, even for debug. Besides, the first call of __sgx_sanitize_pages() can return 0, due to either kthread_should_stop() being true, or all EPC pages are EREMOVED successfully. So in this case kernel will print out "0 unsanitized pages\n", which doesn't make a lot sense? > > > > - /* sanity check: */ > > - WARN_ON(!list_empty(&sgx_dirty_page_list)); > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > > + /* > > + * Never expected to happen in a working driver. If it happens the > > bug > > + * is expected to be in the sanitization process, but successfully > > + * sanitized pages are still valid and driver can be used and most > > + * importantly debugged without issues. To put short, the global > > state > > + * of kernel is not corrupted so no reason to do any more > > complicated > > + * rollback. > > + */ > > + if (left_dirty) > > + pr_err("%ld unsanitized pages\n", left_dirty); > %lu No strong opinion, but IMHO we can still just WARN() when it is driver bug: 1) There's no guarantee the driver can continue to work if it has bug; 2) WARN() can panic() the kernel if /proc/sys/kernel/panic_on_warn is set is fine. It's expected behaviour. If I understand correctly, there are many places in the kernel that uses WARN() to catch bugs. In fact, we can even view WARN() as an advantage. For instance, if we only print out "xx unsanitized pages" in the existing code, people may even wouldn't have noticed this bug. From this perspective, if you want to print out, I think you may want to make the message more visible, that people can know it's driver bug. Perhaps something like "The driver has bug, please report to kernel community..", etc. 3) Changing WARN() to pr_err() conceptually isn't mandatory to fix this particular bug. So, it's kinda mixing things together. But again, no strong opinion here.
On Mon, Sep 05, 2022 at 07:50:33AM +0000, Huang, Kai wrote: > On Sat, 2022-09-03 at 13:26 +0300, Jarkko Sakkinen wrote: > > > static int ksgxd(void *p) > > > { > > > + unsigned long left_dirty; > > > + > > > set_freezable(); > > > > > > /* > > > * Sanitize pages in order to recover from kexec(). The 2nd pass is > > > * required for SECS pages, whose child pages blocked EREMOVE. > > > */ > > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > > > + pr_debug("%ld unsanitized pages\n", left_dirty); > > %lu > > > > I assume the intention is to print out the unsanitized SECS pages, but what is > the value of printing it? To me it doesn't provide any useful information, even > for debug. How do you measure "useful"? If for some reason there were unsanitized pages, I would at least want to know where it ended on the first value. Plus it does zero harm unless you explicitly turn it on. > Besides, the first call of __sgx_sanitize_pages() can return 0, due to either > kthread_should_stop() being true, or all EPC pages are EREMOVED successfully. > So in this case kernel will print out "0 unsanitized pages\n", which doesn't > make a lot sense? > > > > > > > - /* sanity check: */ > > > - WARN_ON(!list_empty(&sgx_dirty_page_list)); > > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > > > + /* > > > + * Never expected to happen in a working driver. If it happens the > > > bug > > > + * is expected to be in the sanitization process, but successfully > > > + * sanitized pages are still valid and driver can be used and most > > > + * importantly debugged without issues. To put short, the global > > > state > > > + * of kernel is not corrupted so no reason to do any more > > > complicated > > > + * rollback. > > > + */ > > > + if (left_dirty) > > > + pr_err("%ld unsanitized pages\n", left_dirty); > > %lu > > No strong opinion, but IMHO we can still just WARN() when it is driver bug: > > 1) There's no guarantee the driver can continue to work if it has bug; > > 2) WARN() can panic() the kernel if /proc/sys/kernel/panic_on_warn is set is > fine. It's expected behaviour. If I understand correctly, there are many > places in the kernel that uses WARN() to catch bugs. > > In fact, we can even view WARN() as an advantage. For instance, if we only print > out "xx unsanitized pages" in the existing code, people may even wouldn't have > noticed this bug. > > From this perspective, if you want to print out, I think you may want to make > the message more visible, that people can know it's driver bug. Perhaps > something like "The driver has bug, please report to kernel community..", etc. > > 3) Changing WARN() to pr_err() conceptually isn't mandatory to fix this > particular bug. So, it's kinda mixing things together. > > But again, no strong opinion here. > > -- > Thanks, > -Kai > > BR, Jarkko
On Mon, Sep 05, 2022 at 12:44:56PM +0300, jarkko@kernel.org wrote: > On Mon, Sep 05, 2022 at 07:50:33AM +0000, Huang, Kai wrote: > > On Sat, 2022-09-03 at 13:26 +0300, Jarkko Sakkinen wrote: > > > > static int ksgxd(void *p) > > > > { > > > > + unsigned long left_dirty; > > > > + > > > > set_freezable(); > > > > > > > > /* > > > > * Sanitize pages in order to recover from kexec(). The 2nd pass is > > > > * required for SECS pages, whose child pages blocked EREMOVE. > > > > */ > > > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > > > > + pr_debug("%ld unsanitized pages\n", left_dirty); > > > %lu > > > > > > > I assume the intention is to print out the unsanitized SECS pages, but what is > > the value of printing it? To me it doesn't provide any useful information, even > > for debug. > > How do you measure "useful"? > > If for some reason there were unsanitized pages, I would at least > want to know where it ended on the first value. > > Plus it does zero harm unless you explicitly turn it on. I would split it though for a separate patch because it does not need to be part of the stable fix and change it to: if (left_dirty) pr_debug("%lu unsanitized pages\n", left_dirty); BR, Jarkko
On Mon, 2022-09-05 at 12:44 +0300, jarkko@kernel.org wrote: > On Mon, Sep 05, 2022 at 07:50:33AM +0000, Huang, Kai wrote: > > On Sat, 2022-09-03 at 13:26 +0300, Jarkko Sakkinen wrote: > > > > static int ksgxd(void *p) > > > > { > > > > + unsigned long left_dirty; > > > > + > > > > set_freezable(); > > > > > > > > /* > > > > * Sanitize pages in order to recover from kexec(). The 2nd pass is > > > > * required for SECS pages, whose child pages blocked EREMOVE. > > > > */ > > > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > > > - __sgx_sanitize_pages(&sgx_dirty_page_list); > > > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); > > > > + pr_debug("%ld unsanitized pages\n", left_dirty); > > > %lu > > > > > > > I assume the intention is to print out the unsanitized SECS pages, but what is > > the value of printing it? To me it doesn't provide any useful information, even > > for debug. > > How do you measure "useful"? > > If for some reason there were unsanitized pages, I would at least > want to know where it ended on the first value. Using pr_debug() means it's for debugging the driver, but to me it doesn't help on debugging the driver, so it is not useful. Anyway, I will stop argue here.
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 515e2a5f25bb..c0a5ce19c608 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -49,17 +49,23 @@ static LIST_HEAD(sgx_dirty_page_list); * Reset post-kexec EPC pages to the uninitialized state. The pages are removed * from the input list, and made available for the page allocator. SECS pages * prepending their children in the input list are left intact. + * + * Contents of the @dirty_page_list must be thread-local, i.e. + * not shared by multiple threads. + * + * Return 0 when sanitization was successful or kthread was stopped, and the + * number of unsanitized pages otherwise. */ -static void __sgx_sanitize_pages(struct list_head *dirty_page_list) +static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list) { + unsigned long left_dirty = 0; struct sgx_epc_page *page; LIST_HEAD(dirty); int ret; - /* dirty_page_list is thread-local, no need for a lock: */ while (!list_empty(dirty_page_list)) { if (kthread_should_stop()) - return; + return 0; page = list_first_entry(dirty_page_list, struct sgx_epc_page, list); @@ -92,12 +98,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list) } else { /* The page is not yet clean - move to the dirty list. */ list_move_tail(&page->list, &dirty); + left_dirty++; } cond_resched(); } list_splice(&dirty, dirty_page_list); + return left_dirty; } static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page) @@ -388,17 +396,28 @@ void sgx_reclaim_direct(void) static int ksgxd(void *p) { + unsigned long left_dirty; + set_freezable(); /* * Sanitize pages in order to recover from kexec(). The 2nd pass is * required for SECS pages, whose child pages blocked EREMOVE. */ - __sgx_sanitize_pages(&sgx_dirty_page_list); - __sgx_sanitize_pages(&sgx_dirty_page_list); + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); + pr_debug("%ld unsanitized pages\n", left_dirty); - /* sanity check: */ - WARN_ON(!list_empty(&sgx_dirty_page_list)); + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list); + /* + * Never expected to happen in a working driver. If it happens the bug + * is expected to be in the sanitization process, but successfully + * sanitized pages are still valid and driver can be used and most + * importantly debugged without issues. To put short, the global state + * of kernel is not corrupted so no reason to do any more complicated + * rollback. + */ + if (left_dirty) + pr_err("%ld unsanitized pages\n", left_dirty); while (!kthread_should_stop()) { if (try_to_freeze())
Unsanitized pages trigger WARN_ON() unconditionally, which can panic the whole computer, if /proc/sys/kernel/panic_on_warn is set. In sgx_init(), if misc_register() fails or misc_register() succeeds but neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be prematurely stopped. This may leave unsanitized pages, which will result a false warning. Refine __sgx_sanitize_pages() to return: 1. Zero when the sanitization process is complete or ksgxd has been requested to stop. 2. The number of unsanitized pages otherwise. Use the return value as the criteria for triggering output, and tone down the output to pr_err() to prevent the whole system to be taken down if for some reason sanitization process does not complete. Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list") Cc: stable@vger.kernel.org # v5.13+ Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> --- v7: - Rewrote commit message. - Do not return -ECANCELED on premature stop. Instead use zero both premature stop and complete sanitization. v6: - Address Reinette's feedback: https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/ v5: - Add the klog dump and sysctl option to the commit message. v4: - Explain expectations for dirty_page_list in the function header, instead of an inline comment. - Improve commit message to explain the conditions better. - Return the number of pages left dirty to ksgxd() and print warning after the 2nd call, if there are any. v3: - Remove WARN_ON(). - Tuned comments and the commit message a bit. v2: - Replaced WARN_ON() with optional pr_info() inside __sgx_sanitize_pages(). - Rewrote the commit message. - Added the fixes tag. --- arch/x86/kernel/cpu/sgx/main.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-)