Message ID | 20250305-pcc_fixes_updates-v2-1-1b1822bc8746@arm.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Series | mailbox: pcc: Fixes and cleanup/refactoring |
On 2025/3/6 0:38, Sudeep Holla wrote:
> From: Huisong Li <lihuisong@huawei.com>
>
> The function mbox_chan_received_data() calls the Rx callback of the
> mailbox client driver. The callback might set chan_in_use flag from
> pcc_send_data(). This flag's status determines whether the PCC channel
> is in use.
>
> However, there is a potential race condition where chan_in_use is
> updated incorrectly due to concurrency between the interrupt handler
> (pcc_mbox_irq()) and the command sender (pcc_send_data()).
>
> The 'chan_in_use' flag of a channel is set to true after sending a
> command. And the flag of the new command may be cleared erroneously by
> the interrupt handler after mbox_chan_received_data() returns.
>
> As a result, the interrupt being level triggered can't be cleared in
> pcc_mbox_irq() and it will be disabled after the number of handled times
> exceeds the specified value. The error log is as follows:
>
> | kunpeng_hccs HISI04B2:00: PCC command executed timeout!
> | kunpeng_hccs HISI04B2:00: get port link status info failed, ret = -110
> | irq 13: nobody cared (try booting with the "irqpoll" option)
> | Call trace:
> |  dump_backtrace+0x0/0x210
> |  show_stack+0x1c/0x2c
> |  dump_stack+0xec/0x130
> |  __report_bad_irq+0x50/0x190
> |  note_interrupt+0x1e4/0x260
> |  handle_irq_event+0x144/0x17c
> |  handle_fasteoi_irq+0xd0/0x240
> |  __handle_domain_irq+0x80/0xf0
> |  gic_handle_irq+0x74/0x2d0
> |  el1_irq+0xbc/0x140
> |  mnt_clone_write+0x0/0x70
> |  file_update_time+0xcc/0x160
> |  fault_dirty_shared_page+0xe8/0x150
> |  do_shared_fault+0x80/0x1d0
> |  do_fault+0x118/0x1a4
> |  handle_pte_fault+0x154/0x230
> |  __handle_mm_fault+0x1ac/0x390
> |  handle_mm_fault+0xf0/0x250
> |  do_page_fault+0x184/0x454
> |  do_translation_fault+0xac/0xd4
> |  do_mem_abort+0x44/0xb4
> |  el0_da+0x40/0x74
> |  el0_sync_handler+0x60/0xb4
> |  el0_sync+0x168/0x180
> | handlers:
> |  pcc_mbox_irq
> | Disabling IRQ #13
>
> To solve this issue, pcc_mbox_irq() must clear 'chan_in_use' flag before
> the call to mbox_chan_received_data().
>
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> (sudeep.holla: Minor updates to the subject and commit message)
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>  drivers/mailbox/pcc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
> index 82102a4c5d68839170238540a6fed61afa5185a0..f2e4087281c70eeb5b9b33371596613a371dff4f 100644
> --- a/drivers/mailbox/pcc.c
> +++ b/drivers/mailbox/pcc.c
> @@ -333,10 +333,15 @@ static irqreturn_t pcc_mbox_irq(int irq, void *p)
>  	if (pcc_chan_reg_read_modify_write(&pchan->plat_irq_ack))
>  		return IRQ_NONE;
>  
> +	/*
> +	 * Clear this flag immediately after updating interrupt ack register
> +	 * to avoid possible race in updatation of the flag from
> +	 * pcc_send_data() that could execute from mbox_chan_received_data()

This comment may no longer be appropriate, because patch 2/13 moves the
clearing of the interrupt ack register.
I suggest fixing it either in this patch or in patch 2/13.

> +	 */
> +	pchan->chan_in_use = false;
>  	mbox_chan_received_data(chan, NULL);
>  
>  	check_and_ack(pchan, chan);
> -	pchan->chan_in_use = false;
>  
>  	return IRQ_HANDLED;
>  }
>

On Tue, Mar 11, 2025 at 07:40:53PM +0800, lihuisong (C) wrote:
>
> On 2025/3/6 0:38, Sudeep Holla wrote:
> > From: Huisong Li <lihuisong@huawei.com>
> >
> > [...]
> >
> > @@ -333,10 +333,15 @@ static irqreturn_t pcc_mbox_irq(int irq, void *p)
> >  	if (pcc_chan_reg_read_modify_write(&pchan->plat_irq_ack))
> >  		return IRQ_NONE;
> > +	/*
> > +	 * Clear this flag immediately after updating interrupt ack register
> > +	 * to avoid possible race in updatation of the flag from
> > +	 * pcc_send_data() that could execute from mbox_chan_received_data()
> This comment may no longer be appropriate, because patch 2/13 moves the
> clearing of the interrupt ack register.
> I suggest fixing it either in this patch or in patch 2/13.

Right, I meant to update it, or perhaps did update it and forgot to commit
the change. I wanted to change it to something like:
"
Clear this flag after updating interrupt ack register and just before
mbox_chan_received_data() which might call pcc_send_data() where the
flag is set again to start new transfer
"

Hope that fits the current situation better.

On 2025/3/11 20:02, Sudeep Holla wrote:
> On Tue, Mar 11, 2025 at 07:40:53PM +0800, lihuisong (C) wrote:
>> On 2025/3/6 0:38, Sudeep Holla wrote:
>>> From: Huisong Li <lihuisong@huawei.com>
>>>
>>> [...]
>>>
>>> +	/*
>>> +	 * Clear this flag immediately after updating interrupt ack register
>>> +	 * to avoid possible race in updatation of the flag from
>>> +	 * pcc_send_data() that could execute from mbox_chan_received_data()
>> This comment may no longer be appropriate, because patch 2/13 moves the
>> clearing of the interrupt ack register.
>> I suggest fixing it either in this patch or in patch 2/13.
> Right, I meant to update it, or perhaps did update it and forgot to commit
> the change. I wanted to change it to something like:
> "
> Clear this flag after updating interrupt ack register and just before
> mbox_chan_received_data() which might call pcc_send_data() where the
> flag is set again to start new transfer
> "

Ack

>
> Hope that fits the current situation better.
>

On 3/5/2025 11:38 AM, Sudeep Holla wrote:
> From: Huisong Li <lihuisong@huawei.com>
>
> [...]
>
> To solve this issue, pcc_mbox_irq() must clear 'chan_in_use' flag before
> the call to mbox_chan_received_data().
>
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> (sudeep.holla: Minor updates to the subject and commit message)
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

Tested-by: Robbie King <robbiek@xsightlabs.com>

diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index 82102a4c5d68839170238540a6fed61afa5185a0..f2e4087281c70eeb5b9b33371596613a371dff4f 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -333,10 +333,15 @@ static irqreturn_t pcc_mbox_irq(int irq, void *p)
 	if (pcc_chan_reg_read_modify_write(&pchan->plat_irq_ack))
 		return IRQ_NONE;
 
+	/*
+	 * Clear this flag immediately after updating interrupt ack register
+	 * to avoid possible race in updatation of the flag from
+	 * pcc_send_data() that could execute from mbox_chan_received_data()
+	 */
+	pchan->chan_in_use = false;
 	mbox_chan_received_data(chan, NULL);
 
 	check_and_ack(pchan, chan);
-	pchan->chan_in_use = false;
 
 	return IRQ_HANDLED;
 }
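
The ordering problem that the patch addresses can be illustrated with a small
user-space sketch. This is not the driver code: struct fake_chan,
fake_send_data(), fake_rx_callback(), irq_old_order(), irq_new_order() and
flag_after_irq() are hypothetical stand-ins invented for illustration, and
only the chan_in_use flag is modelled. The sketch assumes, as the commit
message describes, that the client's Rx callback sends the next command from
inside mbox_chan_received_data().

#include <stdbool.h>

/* Hypothetical stand-in for the per-channel state; only the flag is modelled. */
struct fake_chan {
	bool chan_in_use;
};

/* Models pcc_send_data(): sending a command marks the channel as in use. */
static void fake_send_data(struct fake_chan *ch)
{
	ch->chan_in_use = true;
}

/* Models a client Rx callback that immediately sends the next command. */
static void fake_rx_callback(struct fake_chan *ch)
{
	fake_send_data(ch);
}

/* Ordering before the patch: the flag is cleared after the callback returns. */
static void irq_old_order(struct fake_chan *ch)
{
	fake_rx_callback(ch);    /* the new command sets chan_in_use = true */
	ch->chan_in_use = false; /* ...and the handler wipes it out again */
}

/* Ordering after the patch: the flag is cleared before the callback runs. */
static void irq_new_order(struct fake_chan *ch)
{
	ch->chan_in_use = false; /* the completed command is done */
	fake_rx_callback(ch);    /* the new command's flag survives */
}

/* Runs one simulated interrupt on a busy channel and reports the flag. */
static bool flag_after_irq(void (*handler)(struct fake_chan *))
{
	struct fake_chan ch = { .chan_in_use = true };

	handler(&ch);
	return ch.chan_in_use;
}

In this model, flag_after_irq(irq_old_order) returns false even though a new
command is in flight, which is the state the commit message blames for the
uncleared level-triggered interrupt. flag_after_irq(irq_new_order) returns
true, so the flag of the new command is preserved.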