Message ID | 20240617210844.337476-9-bvanassche@acm.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | UFS patches for kernel 6.11 | expand |
On Mon, 2024-06-17 at 14:07 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > If ufshcd_abort() returns SUCCESS for an already completed command > then > that command is completed twice. This results in a crash. Prevent > this by > checking whether a command has completed without completion interrupt > from > the timeout handler. This CL fixes the following kernel crash: > > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000000 > Call trace: > dma_direct_map_sg+0x70/0x274 > scsi_dma_map+0x84/0x124 > ufshcd_queuecommand+0x3fc/0x880 > scsi_queue_rq+0x7d0/0x111c > blk_mq_dispatch_rq_list+0x440/0xebc > blk_mq_do_dispatch_sched+0x5a4/0x6b8 > __blk_mq_sched_dispatch_requests+0x150/0x220 > __blk_mq_run_hw_queue+0xf0/0x218 > __blk_mq_delay_run_hw_queue+0x8c/0x18c > blk_mq_run_hw_queue+0x1a4/0x360 > blk_mq_sched_insert_requests+0x130/0x334 > blk_mq_flush_plug_list+0x138/0x234 > blk_flush_plug_list+0x118/0x164 > blk_finish_plug() > read_pages+0x38c/0x408 > page_cache_ra_unbounded+0x230/0x2f8 > do_sync_mmap_readahead+0x1a4/0x208 > filemap_fault+0x27c/0x8f4 > f2fs_filemap_fault+0x28/0xfc > __do_fault+0xc4/0x208 > handle_pte_fault+0x290/0xe04 > do_handle_mm_fault+0x52c/0x858 > do_page_fault+0x5dc/0x798 > do_translation_fault+0x40/0x54 > do_mem_abort+0x60/0x134 > el0_da+0x40/0xb8 > el0t_64_sync_handler+0xc4/0xe4 > el0t_64_sync+0x1b4/0x1b8 > Hi Bart, This backtrace is ufshcd_queuecommand KE. If ufshcd_abort() complete an already completed command, it should be KE with ufshcd_abort backtrace? More, if a command is completed by irq. The rq may be release and ufshcd_mcq_req_to_hwq(hba, rq) will get KE Here is our backtrace of this case. platform +platform:112b0000.ufshci ufshcd-mtk 112b0000.ufshci: ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device. platform +platform:112b0000.ufshci ufshcd-mtk 112b0000.ufshci: ufshcd_try_to_abort_task: cmd at tag=41 is cleared. platform +platform:112b0000.ufshci ufshcd-mtk 112b0000.ufshci: Aborting tag 41 / CDB 0x28 succeeded Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194 pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14 lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise] do_mem_abort+0x58/0x118 el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c blk_mq_unique_tag+0x8/0x14 ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise] process_one_work+0x208/0x4fc worker_thread+0x228/0x438 kthread+0x104/0x1d4 ret_from_fork+0x10/0x20 Thanks. Peter > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/ufs/core/ufshcd.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index e3835e61e4b1..47cc0802c4f4 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -8922,7 +8922,28 @@ static void ufshcd_async_scan(void *data, > async_cookie_t cookie) > > static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd > *scmd) > { > -struct ufs_hba *hba = shost_priv(scmd->device->host); > +struct scsi_device *sdev = scmd->device; > +struct ufs_hba *hba = shost_priv(sdev->host); > +struct scsi_cmnd *cmd2 = scmd; > +const u32 unique_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd)); > + > +WARN_ON_ONCE(!scmd); > + > +if (is_mcq_enabled(hba)) { > +struct request *rq = scsi_cmd_to_rq(scmd); > +struct ufs_hw_queue *hwq = ufshcd_mcq_req_to_hwq(hba, rq); > + > +ufshcd_mcq_poll_cqe_lock(hba, hwq, &cmd2); > +} else { > +__ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT, > + &cmd2); > +} > +if (cmd2 == NULL) { > +sdev_printk(KERN_INFO, sdev, > + "%s: cmd with tag %#x has already been completed\n", > + __func__, unique_tag); > +return SCSI_EH_DONE; > +} > > if (!hba->system_suspending) { > /* Activate the error handler in the SCSI core. */
On 6/20/24 11:54 PM, Peter Wang (王信友) wrote: > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000194 > pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14 > lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 > [ufs_mediatek_mod_ise] > do_mem_abort+0x58/0x118 > el1_abort+0x3c/0x5c > el1h_64_sync_handler+0x54/0x90 > el1h_64_sync+0x68/0x6c > blk_mq_unique_tag+0x8/0x14 > ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise] > process_one_work+0x208/0x4fc > worker_thread+0x228/0x438 > kthread+0x104/0x1d4 > ret_from_fork+0x10/0x20 Hi Peter, The above backtrace can only occur with MCQ enabled. The backtrace I posted was triggered on a system with a UFSHCI 3.0 controller (no MCQ). So I think that the backtraces have different root causes and hence that different patches are required to fix both crashes. Thanks, Bart.
On Fri, 2024-06-21 at 10:23 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 6/20/24 11:54 PM, Peter Wang (王信友) wrote: > > Unable to handle kernel NULL pointer dereference at virtual > address > > 0000000000000194 > > pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14 > > lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 > > [ufs_mediatek_mod_ise] > > do_mem_abort+0x58/0x118 > > el1_abort+0x3c/0x5c > > el1h_64_sync_handler+0x54/0x90 > > el1h_64_sync+0x68/0x6c > > blk_mq_unique_tag+0x8/0x14 > > ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise] > > process_one_work+0x208/0x4fc > > worker_thread+0x228/0x438 > > kthread+0x104/0x1d4 > > ret_from_fork+0x10/0x20 > > Hi Peter, > > The above backtrace can only occur with MCQ enabled. The backtrace I > posted was triggered on a system with a UFSHCI 3.0 controller (no > MCQ). > So I think that the backtraces have different root causes and hence > that > different patches are required to fix both crashes. > > Thanks, > > Bart. > Hi Bart, Your backtrace is Single doorbell mode. But I am curious that how could it happen if complete a cmd twice and get null pointer next time queuecommand? could you describe the exactly flow? More, because ufshcd_eh_timed_out could run in MCQ mode. So, this patch will get null pointer when racing happen in MCQ mode. Thanks Peter
On 6/24/24 1:54 AM, Peter Wang (王信友) wrote: > Your backtrace is Single doorbell mode. But I am curious that > how could it happen if complete a cmd twice and get null pointer > next time queuecommand? could you describe the exactly flow? SCSI commands are completed only once. See also the code in the SCSI core that sets the SCMD_STATE_COMPLETE bit: $ git grep -nH 'test_and_set_bit(SCMD_STATE_COMPLETE' drivers/scsi/scsi_error.c:362: if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state)) drivers/scsi/scsi_lib.c:1716: if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state))) In other words, either the regular completion code is executed by scsi_done_internal() or error handling is initiated by scsi_timeout(). Only one of the two happens. The only exception is that .eh_timed_out() may be called concurrently with the regular completion handler. Hence this patch for ufshcd_eh_timed_out(). Bart.
On Mon, 2024-06-24 at 11:12 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 6/24/24 1:54 AM, Peter Wang (王信友) wrote: > > Your backtrace is Single doorbell mode. But I am curious that > > how could it happen if complete a cmd twice and get null pointer > > next time queuecommand? could you describe the exactly flow? > > SCSI commands are completed only once. See also the code in the SCSI > core that sets the SCMD_STATE_COMPLETE bit: > > $ git grep -nH 'test_and_set_bit(SCMD_STATE_COMPLETE' > drivers/scsi/scsi_error.c:362:if > (test_and_set_bit(SCMD_STATE_COMPLETE, > &scmd->state)) > drivers/scsi/scsi_lib.c:1716:if > (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state))) > > In other words, either the regular completion code is executed by > scsi_done_internal() or error handling is initiated by > scsi_timeout(). > Only one of the two happens. > > The only exception is that .eh_timed_out() may be called concurrently > with the regular completion handler. Hence this patch for > ufshcd_eh_timed_out(). > > Bart. Hi Bart, I still confused. When eh_timed_out() called concurrently with the regular completion handler. Both process use test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state) to set SCMD_STATE_COMPLETE flag. test_and_set_bit should be atomice operation? and only one can set this bit than run forward? BTW, I still don't understand if both eh_timed_out and regular completion handler set SCMD_STATE_COMPLETE, what is the relation between SCMD_STATE_COMPLETE and ufshcd_queuecommand null pointer? Thanks. Peter
On 6/25/24 3:04 AM, Peter Wang (王信友) wrote: > When eh_timed_out() called concurrently with the regular completion > handler. > Both process use test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state) to > set SCMD_STATE_COMPLETE flag. > test_and_set_bit should be atomice operation? and only one can set this > bit than run forward? Yes, only one of the two test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state) calls will set the SCMD_STATE_COMPLETE bit and only one of these two calls will return the value 'true'. > BTW, I still don't understand if both eh_timed_out and regular > completion handler set SCMD_STATE_COMPLETE, > what is the relation between SCMD_STATE_COMPLETE and > ufshcd_queuecommand null pointer? While ufshcd_eh_timed_out() does not set the SCMD_STATE_COMPLETE bit, its caller may set that bit after ufshcd_eh_timed_out() has returned. Hence, a command may be completed while ufshcd_eh_timed_out() is in progress. Thanks, Bart.
On Tue, 2024-06-25 at 09:33 -0700, Bart Van Assche wrote: > > > While ufshcd_eh_timed_out() does not set the SCMD_STATE_COMPLETE bit, > its caller may set that bit after ufshcd_eh_timed_out() has returned. > Hence, a command may be completed while ufshcd_eh_timed_out() is in > progress. > > Thanks, > > Bart. Hi Bart, So, when this concurrency issue happen, which one set the SCMD_STATE_COMPLETE flag? scsi_timeout or scsi_done_internal? And why ufshcd_queuecommand got null pointer? which pointer is null? Thanks. Peter
On 6/25/24 8:54 PM, Peter Wang (王信友) wrote:
> And why ufshcd_queuecommand got null pointer? which pointer is null?
I'm not sure. faddr2line reports that the crash happens in the source
code line with the following assignment: "sg_dma_len(sg) = sg->length".
That seems weird to me. If the sg pointer would be invalid then an
earlier dereference of the 'sg' pointer should already have triggered a
crash. The entire function is as follows:
int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int
nents,
enum dma_data_direction dir, unsigned long attrs)
{
int i;
struct scatterlist *sg;
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR)
goto out_unmap;
sg_dma_len(sg) = sg->length;
}
return nents;
out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
return -EIO;
}
Bart.
On 2024/6/18 5:07, Bart Van Assche wrote: > If ufshcd_abort() returns SUCCESS for an already completed command then > that command is completed twice. This results in a crash. Prevent this by > checking whether a command has completed without completion interrupt from > the timeout handler. This CL fixes the following kernel crash: > Hi Bart, Could you describe in more detail about how command completed twice happened? I think this would not happen, below is my analysis, if it is wrong, please correct me. In your description, ufshcd_abort() returns SUCCESS is condition which trigger the issue. There are 2 paths would call ufshcd_abort(). The first one is triggered by block layer's timeout, which is: scsi_timeout() hostt->eh_timed_out() [ufshcd_eh_timed_out] // return SCSI_EH_NOT_HANDLED test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state) // return true scsi_abort_command() queue_delayed_work(shost->tmf_work_q, &scmd->abort_work, ...) The above flow would trigger ufshcd_abort() be called in workqueue: scmd_eh_abort_handler() scsi_try_to_abort_cmd() hostt->eh_abort_handler() [ufshcd_abort] While when ufshcd_abort() is called by above flow, the command has been marked as SCMD_STATE_COMPLETE, and it means scsi_timeout() win the scsi_done(), so normal context(usually triggered by driver's irq handler which call scsi_done()) would not handle this command any more. The only path which would handle this command is scmd_eh_abort_handler() if ufshcd_abort() return SUCCESS. Is any other context which would finish this command? Another path is: scsi_send_eh_cmnd() shost->hostt->queuecommand() // queued command did not finish in time scsi_abort_eh_cmnd() This is error handler path when host is set to RECOVERY state, no new normal command would be send any more. What's more, if ufshcd_abort() is called now, the aborted command is also command send in scsi_send_eh_cmnd(). > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > Call trace: > dma_direct_map_sg+0x70/0x274 > scsi_dma_map+0x84/0x124 > ufshcd_queuecommand+0x3fc/0x880 > scsi_queue_rq+0x7d0/0x111c > blk_mq_dispatch_rq_list+0x440/0xebc > blk_mq_do_dispatch_sched+0x5a4/0x6b8 > __blk_mq_sched_dispatch_requests+0x150/0x220 > __blk_mq_run_hw_queue+0xf0/0x218 > __blk_mq_delay_run_hw_queue+0x8c/0x18c > blk_mq_run_hw_queue+0x1a4/0x360 > blk_mq_sched_insert_requests+0x130/0x334 > blk_mq_flush_plug_list+0x138/0x234 > blk_flush_plug_list+0x118/0x164 > blk_finish_plug() > read_pages+0x38c/0x408 > page_cache_ra_unbounded+0x230/0x2f8 > do_sync_mmap_readahead+0x1a4/0x208 > filemap_fault+0x27c/0x8f4 > f2fs_filemap_fault+0x28/0xfc > __do_fault+0xc4/0x208 > handle_pte_fault+0x290/0xe04 > do_handle_mm_fault+0x52c/0x858 > do_page_fault+0x5dc/0x798 > do_translation_fault+0x40/0x54 > do_mem_abort+0x60/0x134 > el0_da+0x40/0xb8 > el0t_64_sync_handler+0xc4/0xe4 > el0t_64_sync+0x1b4/0x1b8 > > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/ufs/core/ufshcd.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index e3835e61e4b1..47cc0802c4f4 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -8922,7 +8922,28 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie) > > static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd *scmd) > { > - struct ufs_hba *hba = shost_priv(scmd->device->host); > + struct scsi_device *sdev = scmd->device; > + struct ufs_hba *hba = shost_priv(sdev->host); > + struct scsi_cmnd *cmd2 = scmd; > + const u32 unique_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd)); > + > + WARN_ON_ONCE(!scmd); > + > + if (is_mcq_enabled(hba)) { > + struct request *rq = scsi_cmd_to_rq(scmd); > + struct ufs_hw_queue *hwq = ufshcd_mcq_req_to_hwq(hba, rq); > + > + ufshcd_mcq_poll_cqe_lock(hba, hwq, &cmd2); > + } else { > + __ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT, > + &cmd2); > + } > + if (cmd2 == NULL) { > + sdev_printk(KERN_INFO, sdev, > + "%s: cmd with tag %#x has already been completed\n", > + __func__, unique_tag); > + return SCSI_EH_DONE; > + } > > if (!hba->system_suspending) { > /* Activate the error handler in the SCSI core. */ >
On Wed, 2024-06-26 at 14:54 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 6/25/24 8:54 PM, Peter Wang (王信友) wrote: > > And why ufshcd_queuecommand got null pointer? which pointer is > null? > > I'm not sure. faddr2line reports that the crash happens in the source > code line with the following assignment: "sg_dma_len(sg) = sg- > >length". > That seems weird to me. If the sg pointer would be invalid then an > earlier dereference of the 'sg' pointer should already have triggered > a > Hi Bart, This is really weird. Perphas it is dram corrupt issue? And is unrelated to the ufshcd_abort racing I think. Thanks. Peter
On 6/27/24 3:56 AM, Peter Wang (王信友) wrote: > This is really weird. > Perphas it is dram corrupt issue? > And is unrelated to the ufshcd_abort racing I think. We have seen this crash five times with kernel 5.15. I think that the number of occurrences is too high to be caused by DRAM corruption. Anyway, I will leave out patches 7/8 and 8/8 from this patch series. We can revisit these patches if the issue would be observed again with a later kernel. Bart.
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index e3835e61e4b1..47cc0802c4f4 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8922,7 +8922,28 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie) static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd *scmd) { - struct ufs_hba *hba = shost_priv(scmd->device->host); + struct scsi_device *sdev = scmd->device; + struct ufs_hba *hba = shost_priv(sdev->host); + struct scsi_cmnd *cmd2 = scmd; + const u32 unique_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd)); + + WARN_ON_ONCE(!scmd); + + if (is_mcq_enabled(hba)) { + struct request *rq = scsi_cmd_to_rq(scmd); + struct ufs_hw_queue *hwq = ufshcd_mcq_req_to_hwq(hba, rq); + + ufshcd_mcq_poll_cqe_lock(hba, hwq, &cmd2); + } else { + __ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT, + &cmd2); + } + if (cmd2 == NULL) { + sdev_printk(KERN_INFO, sdev, + "%s: cmd with tag %#x has already been completed\n", + __func__, unique_tag); + return SCSI_EH_DONE; + } if (!hba->system_suspending) { /* Activate the error handler in the SCSI core. */
If ufshcd_abort() returns SUCCESS for an already completed command then that command is completed twice. This results in a crash. Prevent this by checking whether a command has completed without completion interrupt from the timeout handler. This CL fixes the following kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: dma_direct_map_sg+0x70/0x274 scsi_dma_map+0x84/0x124 ufshcd_queuecommand+0x3fc/0x880 scsi_queue_rq+0x7d0/0x111c blk_mq_dispatch_rq_list+0x440/0xebc blk_mq_do_dispatch_sched+0x5a4/0x6b8 __blk_mq_sched_dispatch_requests+0x150/0x220 __blk_mq_run_hw_queue+0xf0/0x218 __blk_mq_delay_run_hw_queue+0x8c/0x18c blk_mq_run_hw_queue+0x1a4/0x360 blk_mq_sched_insert_requests+0x130/0x334 blk_mq_flush_plug_list+0x138/0x234 blk_flush_plug_list+0x118/0x164 blk_finish_plug() read_pages+0x38c/0x408 page_cache_ra_unbounded+0x230/0x2f8 do_sync_mmap_readahead+0x1a4/0x208 filemap_fault+0x27c/0x8f4 f2fs_filemap_fault+0x28/0xfc __do_fault+0xc4/0x208 handle_pte_fault+0x290/0xe04 do_handle_mm_fault+0x52c/0x858 do_page_fault+0x5dc/0x798 do_translation_fault+0x40/0x54 do_mem_abort+0x60/0x134 el0_da+0x40/0xb8 el0t_64_sync_handler+0xc4/0xe4 el0t_64_sync+0x1b4/0x1b8 Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/ufs/core/ufshcd.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)