Message ID | 20230212204454.2938561-1-ogabbay@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/27] habanalabs/gaudi2: increase user interrupt grace time | expand |
On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote: > @@ -3178,11 +3181,12 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf, > > /* irq handling in the middle give it time to finish */ > spin_unlock_irqrestore(wait_list_lock, flags); > - usleep_range(1, 10); > + usleep_range(100, 1000); > if (++iter_counter == MAX_TS_ITER_NUM) { > dev_err(buf->mmg->dev, > - "handling registration interrupt took too long!!\n"); > - return -EINVAL; > + "Timestamp offest processing reached timeout of %lld ms\n", Typo in offest, you can use './scripts/checkpatch.pl --codespell' to coughs some of those. Regards Stanislaw
On Thu, Feb 16, 2023 at 12:53 PM Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> wrote: > > On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote: > > @@ -3178,11 +3181,12 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf, > > > > /* irq handling in the middle give it time to finish */ > > spin_unlock_irqrestore(wait_list_lock, flags); > > - usleep_range(1, 10); > > + usleep_range(100, 1000); > > if (++iter_counter == MAX_TS_ITER_NUM) { > > dev_err(buf->mmg->dev, > > - "handling registration interrupt took too long!!\n"); > > - return -EINVAL; > > + "Timestamp offest processing reached timeout of %lld ms\n", > > Typo in offest, you can use './scripts/checkpatch.pl --codespell' > to coughs some of those. Thanks, I'll fix it in the patch in the tree. @Ofir Bitton Can you please take care adding the codespell to our CI ? Thanks, Oded > > Regards > Stanislaw >
On Thu, Feb 16, 2023 at 16:24 PM, Oded Gabbay wrote: > > On Thu, Feb 16, 2023 at 12:53 PM Stanislaw Gruszka > <stanislaw.gruszka@linux.intel.com> wrote: > > > > On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote: > > > @@ -3178,11 +3181,12 @@ static int > > > ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf, > > > > > > /* irq handling in the middle give it time to finish */ > > > spin_unlock_irqrestore(wait_list_lock, flags); > > > - usleep_range(1, 10); > > > + usleep_range(100, 1000); > > > if (++iter_counter == MAX_TS_ITER_NUM) { > > > dev_err(buf->mmg->dev, > > > - "handling registration interrupt took too long!!\n"); > > > - return -EINVAL; > > > + "Timestamp offest processing > > > + reached timeout of %lld ms\n", > > > > Typo in offest, you can use './scripts/checkpatch.pl --codespell' > > to coughs some of those. > Thanks, I'll fix it in the patch in the tree. > > @Ofir Bitton Can you please take care adding the codespell to our CI ? > Thanks, > Oded I will make sure we run with this option from now on. Thanks, Ofir > > > > > Regards > > Stanislaw > >
On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote: > From: Ofir Bitton <obitton@habana.ai> > > Currently we support scenarios where a timestamp registration request > of a certain offset is received during the interrupt handling of the > same offset. In this case we give a grace period of up to 100us for > the interrupt handler to finish. > It seems that sometimes the interrupt handling takes more than expected, > and therefore this path should be optimized. Until that happens, let's > increase the grace period in order not to reach timeout which will > cause user call to be rejected. > > Signed-off-by: Ofir Bitton <obitton@habana.ai> > Reviewed-by: Oded Gabbay <ogabbay@kernel.org> > Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> for the whole series.
diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c index 8270db0a72a2..e313ff8af7cc 100644 --- a/drivers/accel/habanalabs/common/command_submission.c +++ b/drivers/accel/habanalabs/common/command_submission.c @@ -17,7 +17,7 @@ HL_CS_FLAGS_FLUSH_PCI_HBW_WRITES) -#define MAX_TS_ITER_NUM 10 +#define MAX_TS_ITER_NUM 100 /** * enum hl_cs_wait_status - cs wait status @@ -3145,6 +3145,7 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf, (ts_buff->kernel_buff_size / sizeof(struct hl_user_pending_interrupt)); unsigned long flags, iter_counter = 0; u64 current_cq_counter; + ktime_t timestamp; /* Validate ts_offset not exceeding last max */ if (requested_offset_record >= cb_last) { @@ -3153,6 +3154,8 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf, return -EINVAL; } + timestamp = ktime_get(); + start_over: spin_lock_irqsave(wait_list_lock, flags); @@ -3178,11 +3181,12 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf, /* irq handling in the middle give it time to finish */ spin_unlock_irqrestore(wait_list_lock, flags); - usleep_range(1, 10); + usleep_range(100, 1000); if (++iter_counter == MAX_TS_ITER_NUM) { dev_err(buf->mmg->dev, - "handling registration interrupt took too long!!\n"); - return -EINVAL; + "Timestamp offest processing reached timeout of %lld ms\n", + ktime_ms_delta(ktime_get(), timestamp)); + return -EAGAIN; } goto start_over;