diff mbox series

x86/sgx: Avoid softlockup from sgx_vepc_release

Message ID 20230814043714.107971-1-jinpu.wang@ionos.com (mailing list archive)
State New, archived
Headers show
Series x86/sgx: Avoid softlockup from sgx_vepc_release | expand

Commit Message

Jinpu Wang Aug. 14, 2023, 4:37 a.m. UTC
We hit softlocup with following call trace:
 kernel: [ 2041.288210] Call Trace:
kernel: [ 2041.288213]  <IRQ>
kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
kernel: [ 2041.288242]  </IRQ>
kernel: [ 2041.288242]  <TASK>
kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
kernel: [ 2041.288250]  xa_erase+0x21/0xb0
kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
kernel: [ 2041.288263]  __fput+0x89/0x250
kernel: [ 2041.288269]  task_work_run+0x59/0x90
kernel: [ 2041.288275]  do_exit+0x337/0x9a0

Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
The test system has 64GB of enclave memory, and all assigned to a single
VM. Release vepc take longer time and triggers the softlockup warning.

Add a cond_resched() to give other tasks a chance to run and placate
the softlockup detector.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Haitao Huang <haitao.huang@linux.intel.com>
Reported-by: Yu Zhang <yu.zhang@ionos.com>

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 arch/x86/kernel/cpu/sgx/virt.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jarkko Sakkinen Aug. 14, 2023, 5:35 p.m. UTC | #1
On Mon Aug 14, 2023 at 7:37 AM EEST, Jack Wang wrote:
> We hit softlocup with following call trace:
>  kernel: [ 2041.288210] Call Trace:
> kernel: [ 2041.288213]  <IRQ>
> kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> kernel: [ 2041.288242]  </IRQ>
> kernel: [ 2041.288242]  <TASK>
> kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> kernel: [ 2041.288263]  __fput+0x89/0x250
> kernel: [ 2041.288269]  task_work_run+0x59/0x90
> kernel: [ 2041.288275]  do_exit+0x337/0x9a0
>
> Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> The test system has 64GB of enclave memory, and all assigned to a single
> VM. Release vepc take longer time and triggers the softlockup warning.
>
> Add a cond_resched() to give other tasks a chance to run and placate
> the softlockup detector.
>
> Cc: Jarkko Sakkinen <jarkko@kernel.org>
> Cc: Haitao Huang <haitao.huang@linux.intel.com>
> Reported-by: Yu Zhang <yu.zhang@ionos.com>
>
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> ---
>  arch/x86/kernel/cpu/sgx/virt.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index c3e37eaec8ec..01d2795792cc 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
>  			continue;
>  
>  		xa_erase(&vepc->page_array, index);
> +		cond_resched();
>  	}
>  
>  	/*
> @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
>  			list_add_tail(&epc_page->list, &secs_pages);
>  
>  		xa_erase(&vepc->page_array, index);
> +		cond_resched();
>  	}
>  
>  	/*
> -- 
> 2.34.1

Thank you for looking into this.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko
Huang, Kai Aug. 14, 2023, 11:58 p.m. UTC | #2
On Mon, 2023-08-14 at 06:37 +0200, Jack Wang wrote:
> We hit softlocup with following call trace:
>  kernel: [ 2041.288210] Call Trace:
> kernel: [ 2041.288213]  <IRQ>
> kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> kernel: [ 2041.288242]  </IRQ>
> kernel: [ 2041.288242]  <TASK>
> kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> kernel: [ 2041.288263]  __fput+0x89/0x250
> kernel: [ 2041.288269]  task_work_run+0x59/0x90
> kernel: [ 2041.288275]  do_exit+0x337/0x9a0
> 
> Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> The test system has 64GB of enclave memory, and all assigned to a single
> VM. Release vepc take longer time and triggers the softlockup warning.
> 
> Add a cond_resched() to give other tasks a chance to run and placate
> the softlockup detector.
> 
> Cc: Jarkko Sakkinen <jarkko@kernel.org>
> Cc: Haitao Huang <haitao.huang@linux.intel.com>
> Reported-by: Yu Zhang <yu.zhang@ionos.com>
> 
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> ---
>  arch/x86/kernel/cpu/sgx/virt.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index c3e37eaec8ec..01d2795792cc 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
>  			continue;
>  
>  		xa_erase(&vepc->page_array, index);
> +		cond_resched();
>  	}
>  
>  	/*
> @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
>  			list_add_tail(&epc_page->list, &secs_pages);
>  
>  		xa_erase(&vepc->page_array, index);
> +		cond_resched();
>  	}
>  
>  	/*

This adds cond_resched() to the first two loops: releasing non-SECS pages and
releasing SECS pages which don't have children on remote vepc instance(s). 
There is a third round loop to release SECS pages which have children on remote
vepc instance(s).  I think it's better to add cond_resched() there too, although
it's extremely unlikely it could trigger softlockup w/o cond_resched().

Also similar to 8795359e35bc ("x86/sgx: Silence softlockup detection when
releasing large enclaves"), this looks stable-kernel material.  So perhaps add:

Cc: stable@vger.kernel.org
Haitao Huang Aug. 15, 2023, 1:57 p.m. UTC | #3
On Sun, 13 Aug 2023 23:37:14 -0500, Jack Wang <jinpu.wang@ionos.com> wrote:

> We hit softlocup with following call trace:
>  kernel: [ 2041.288210] Call Trace:
> kernel: [ 2041.288213]  <IRQ>
> kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> kernel: [ 2041.288242]  </IRQ>
> kernel: [ 2041.288242]  <TASK>
> kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> kernel: [ 2041.288263]  __fput+0x89/0x250
> kernel: [ 2041.288269]  task_work_run+0x59/0x90
> kernel: [ 2041.288275]  do_exit+0x337/0x9a0
>

If you would spin another version, please remove columns of the time stamp  
and "kernel"
Otherwise,

Acked-by: Haitao Huang <haitao.huang@linux.intel.com>

Thanks
Haitao
Jarkko Sakkinen Aug. 16, 2023, 8:34 p.m. UTC | #4
On Tue Aug 15, 2023 at 2:58 AM EEST, Huang, Kai wrote:
> On Mon, 2023-08-14 at 06:37 +0200, Jack Wang wrote:
> > We hit softlocup with following call trace:
> >  kernel: [ 2041.288210] Call Trace:
> > kernel: [ 2041.288213]  <IRQ>
> > kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> > kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> > kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> > kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> > kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> > kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> > kernel: [ 2041.288242]  </IRQ>
> > kernel: [ 2041.288242]  <TASK>
> > kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> > kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> > kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> > kernel: [ 2041.288263]  __fput+0x89/0x250
> > kernel: [ 2041.288269]  task_work_run+0x59/0x90
> > kernel: [ 2041.288275]  do_exit+0x337/0x9a0
> > 
> > Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> > The test system has 64GB of enclave memory, and all assigned to a single
> > VM. Release vepc take longer time and triggers the softlockup warning.
> > 
> > Add a cond_resched() to give other tasks a chance to run and placate
> > the softlockup detector.
> > 
> > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > Cc: Haitao Huang <haitao.huang@linux.intel.com>
> > Reported-by: Yu Zhang <yu.zhang@ionos.com>
> > 
> > Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/virt.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > index c3e37eaec8ec..01d2795792cc 100644
> > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> >  			continue;
> >  
> >  		xa_erase(&vepc->page_array, index);
> > +		cond_resched();
> >  	}
> >  
> >  	/*
> > @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> >  			list_add_tail(&epc_page->list, &secs_pages);
> >  
> >  		xa_erase(&vepc->page_array, index);
> > +		cond_resched();
> >  	}
> >  
> >  	/*
>
> This adds cond_resched() to the first two loops: releasing non-SECS pages and
> releasing SECS pages which don't have children on remote vepc instance(s). 
> There is a third round loop to release SECS pages which have children on remote
> vepc instance(s).  I think it's better to add cond_resched() there too, although
> it's extremely unlikely it could trigger softlockup w/o cond_resched().
>
> Also similar to 8795359e35bc ("x86/sgx: Silence softlockup detection when
> releasing large enclaves"), this looks stable-kernel material.  So perhaps add:
>
> Cc: stable@vger.kernel.org

+ would not hurt to have also fixes tag for clarity.

BR, Jarkko
Jinpu Wang Aug. 17, 2023, 7:38 a.m. UTC | #5
Hi Kai,

On Tue, Aug 15, 2023 at 1:58 AM Huang, Kai <kai.huang@intel.com> wrote:
>
> On Mon, 2023-08-14 at 06:37 +0200, Jack Wang wrote:
> > We hit softlocup with following call trace:
> >  kernel: [ 2041.288210] Call Trace:
> > kernel: [ 2041.288213]  <IRQ>
> > kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> > kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> > kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> > kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> > kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> > kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> > kernel: [ 2041.288242]  </IRQ>
> > kernel: [ 2041.288242]  <TASK>
> > kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> > kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> > kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> > kernel: [ 2041.288263]  __fput+0x89/0x250
> > kernel: [ 2041.288269]  task_work_run+0x59/0x90
> > kernel: [ 2041.288275]  do_exit+0x337/0x9a0
> >
> > Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> > The test system has 64GB of enclave memory, and all assigned to a single
> > VM. Release vepc take longer time and triggers the softlockup warning.
> >
> > Add a cond_resched() to give other tasks a chance to run and placate
> > the softlockup detector.
> >
> > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > Cc: Haitao Huang <haitao.huang@linux.intel.com>
> > Reported-by: Yu Zhang <yu.zhang@ionos.com>
> >
> > Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/virt.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > index c3e37eaec8ec..01d2795792cc 100644
> > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> >                       continue;
> >
> >               xa_erase(&vepc->page_array, index);
> > +             cond_resched();
> >       }
> >
> >       /*
> > @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> >                       list_add_tail(&epc_page->list, &secs_pages);
> >
> >               xa_erase(&vepc->page_array, index);
> > +             cond_resched();
> >       }
> >
> >       /*
>
> This adds cond_resched() to the first two loops: releasing non-SECS pages and
> releasing SECS pages which don't have children on remote vepc instance(s).
> There is a third round loop to release SECS pages which have children on remote
> vepc instance(s).  I think it's better to add cond_resched() there too, although
> it's extremely unlikely it could trigger softlockup w/o cond_resched().

I also think the third one are extremely unlikely could cause softlockup.
And our tests do not hit it either.

>
> Also similar to 8795359e35bc ("x86/sgx: Silence softlockup detection when
> releasing large enclaves"), this looks stable-kernel material.  So perhaps add:
>
> Cc: stable@vger.kernel.org

I've added the fixed tag and cc stable in the v2 patch.

Thx for the feedback and review.
Jinpu Wang Aug. 17, 2023, 7:39 a.m. UTC | #6
On Wed, Aug 16, 2023 at 10:34 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Tue Aug 15, 2023 at 2:58 AM EEST, Huang, Kai wrote:
> > On Mon, 2023-08-14 at 06:37 +0200, Jack Wang wrote:
> > > We hit softlocup with following call trace:
> > >  kernel: [ 2041.288210] Call Trace:
> > > kernel: [ 2041.288213]  <IRQ>
> > > kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> > > kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> > > kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> > > kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> > > kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> > > kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> > > kernel: [ 2041.288242]  </IRQ>
> > > kernel: [ 2041.288242]  <TASK>
> > > kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> > > kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> > > kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> > > kernel: [ 2041.288263]  __fput+0x89/0x250
> > > kernel: [ 2041.288269]  task_work_run+0x59/0x90
> > > kernel: [ 2041.288275]  do_exit+0x337/0x9a0
> > >
> > > Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> > > The test system has 64GB of enclave memory, and all assigned to a single
> > > VM. Release vepc take longer time and triggers the softlockup warning.
> > >
> > > Add a cond_resched() to give other tasks a chance to run and placate
> > > the softlockup detector.
> > >
> > > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > > Cc: Haitao Huang <haitao.huang@linux.intel.com>
> > > Reported-by: Yu Zhang <yu.zhang@ionos.com>
> > >
> > > Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> > > ---
> > >  arch/x86/kernel/cpu/sgx/virt.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > > index c3e37eaec8ec..01d2795792cc 100644
> > > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > > @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> > >                     continue;
> > >
> > >             xa_erase(&vepc->page_array, index);
> > > +           cond_resched();
> > >     }
> > >
> > >     /*
> > > @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> > >                     list_add_tail(&epc_page->list, &secs_pages);
> > >
> > >             xa_erase(&vepc->page_array, index);
> > > +           cond_resched();
> > >     }
> > >
> > >     /*
> >
> > This adds cond_resched() to the first two loops: releasing non-SECS pages and
> > releasing SECS pages which don't have children on remote vepc instance(s).
> > There is a third round loop to release SECS pages which have children on remote
> > vepc instance(s).  I think it's better to add cond_resched() there too, although
> > it's extremely unlikely it could trigger softlockup w/o cond_resched().
> >
> > Also similar to 8795359e35bc ("x86/sgx: Silence softlockup detection when
> > releasing large enclaves"), this looks stable-kernel material.  So perhaps add:
> >
> > Cc: stable@vger.kernel.org
>
> + would not hurt to have also fixes tag for clarity.
>
> BR, Jarkko

I did it in v2 here,

https://lore.kernel.org/linux-sgx/3b93dfa438fb2f42f917393c3752ffc2dec35e52.camel@intel.com/T/#t

Thx for the review!


>
Huang, Kai Aug. 17, 2023, 10 a.m. UTC | #7
On Thu, 2023-08-17 at 09:38 +0200, Jinpu Wang wrote:
> > This adds cond_resched() to the first two loops: releasing non-SECS pages and
> > releasing SECS pages which don't have children on remote vepc instance(s).
> > There is a third round loop to release SECS pages which have children on remote
> > vepc instance(s).  I think it's better to add cond_resched() there too, although
> > it's extremely unlikely it could trigger softlockup w/o cond_resched().
> 
> I also think the third one are extremely unlikely could cause softlockup.
> And our tests do not hit it either.

If we only consider your tests, then I believe you even don't need the second
cond_resched()?

I think we should just add cond_resched() to all the three loops.  No harm, and
we don't need to worry about this anymore.
Jarkko Sakkinen Aug. 17, 2023, 4:01 p.m. UTC | #8
On Thu Aug 17, 2023 at 10:00 AM UTC, Huang, Kai wrote:
> On Thu, 2023-08-17 at 09:38 +0200, Jinpu Wang wrote:
> > > This adds cond_resched() to the first two loops: releasing non-SECS pages and
> > > releasing SECS pages which don't have children on remote vepc instance(s).
> > > There is a third round loop to release SECS pages which have children on remote
> > > vepc instance(s).  I think it's better to add cond_resched() there too, although
> > > it's extremely unlikely it could trigger softlockup w/o cond_resched().
> > 
> > I also think the third one are extremely unlikely could cause softlockup.
> > And our tests do not hit it either.
>
> If we only consider your tests, then I believe you even don't need the second
> cond_resched()?
>
> I think we should just add cond_resched() to all the three loops.  No harm, and
> we don't need to worry about this anymore.

I'm fine with this.

BR, Jarkko
Jinpu Wang Aug. 18, 2023, 8:12 a.m. UTC | #9
On Thu, Aug 17, 2023 at 6:02 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Thu Aug 17, 2023 at 10:00 AM UTC, Huang, Kai wrote:
> > On Thu, 2023-08-17 at 09:38 +0200, Jinpu Wang wrote:
> > > > This adds cond_resched() to the first two loops: releasing non-SECS pages and
> > > > releasing SECS pages which don't have children on remote vepc instance(s).
> > > > There is a third round loop to release SECS pages which have children on remote
> > > > vepc instance(s).  I think it's better to add cond_resched() there too, although
> > > > it's extremely unlikely it could trigger softlockup w/o cond_resched().
> > >
> > > I also think the third one are extremely unlikely could cause softlockup.
> > > And our tests do not hit it either.
> >
> > If we only consider your tests, then I believe you even don't need the second
> > cond_resched()?
> >
> > I think we should just add cond_resched() to all the three loops.  No harm, and
> > we don't need to worry about this anymore.
>
> I'm fine with this.

Ok, as both of you think it's better to cover all 3 cases. I will do
it in v3 later today.
>
> BR, Jarkko
Thx!
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index c3e37eaec8ec..01d2795792cc 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -204,6 +204,7 @@  static int sgx_vepc_release(struct inode *inode, struct file *file)
 			continue;
 
 		xa_erase(&vepc->page_array, index);
+		cond_resched();
 	}
 
 	/*
@@ -222,6 +223,7 @@  static int sgx_vepc_release(struct inode *inode, struct file *file)
 			list_add_tail(&epc_page->list, &secs_pages);
 
 		xa_erase(&vepc->page_array, index);
+		cond_resched();
 	}
 
 	/*