diff mbox series

[01/27] habanalabs/gaudi2: increase user interrupt grace time

Message ID 20230212204454.2938561-1-ogabbay@kernel.org (mailing list archive)
State New, archived
Headers show
Series [01/27] habanalabs/gaudi2: increase user interrupt grace time | expand

Commit Message

Oded Gabbay Feb. 12, 2023, 8:44 p.m. UTC
From: Ofir Bitton <obitton@habana.ai>

Currently we support scenarios where a timestamp registration request
of a certain offset is received during the interrupt handling of the
same offset. In this case we give a grace period of up to 100us for
the interrupt handler to finish.
It seems that sometimes the interrupt handling takes more than expected,
and therefore this path should be optimized. Until that happens, let's
increase the grace period in order not to reach timeout which will
cause user call to be rejected.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/common/command_submission.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Stanislaw Gruszka Feb. 16, 2023, 10:53 a.m. UTC | #1
On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote:
> @@ -3178,11 +3181,12 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
>  
>  			/* irq handling in the middle give it time to finish */
>  			spin_unlock_irqrestore(wait_list_lock, flags);
> -			usleep_range(1, 10);
> +			usleep_range(100, 1000);
>  			if (++iter_counter == MAX_TS_ITER_NUM) {
>  				dev_err(buf->mmg->dev,
> -					"handling registration interrupt took too long!!\n");
> -				return -EINVAL;
> +					"Timestamp offest processing reached timeout of %lld ms\n",

Typo in offest, you can use './scripts/checkpatch.pl --codespell'
to coughs some of those.

Regards
Stanislaw
Oded Gabbay Feb. 16, 2023, 2:24 p.m. UTC | #2
On Thu, Feb 16, 2023 at 12:53 PM Stanislaw Gruszka
<stanislaw.gruszka@linux.intel.com> wrote:
>
> On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote:
> > @@ -3178,11 +3181,12 @@ static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
> >
> >                       /* irq handling in the middle give it time to finish */
> >                       spin_unlock_irqrestore(wait_list_lock, flags);
> > -                     usleep_range(1, 10);
> > +                     usleep_range(100, 1000);
> >                       if (++iter_counter == MAX_TS_ITER_NUM) {
> >                               dev_err(buf->mmg->dev,
> > -                                     "handling registration interrupt took too long!!\n");
> > -                             return -EINVAL;
> > +                                     "Timestamp offest processing reached timeout of %lld ms\n",
>
> Typo in offest, you can use './scripts/checkpatch.pl --codespell'
> to coughs some of those.
Thanks, I'll fix it in the patch in the tree.

@Ofir Bitton Can you please take care adding the codespell to our CI ?
Thanks,
Oded

>
> Regards
> Stanislaw
>
Ofir Bitton Feb. 18, 2023, 7:13 p.m. UTC | #3
On Thu, Feb 16, 2023 at 16:24 PM, Oded Gabbay wrote:
> 
> On Thu, Feb 16, 2023 at 12:53 PM Stanislaw Gruszka
> <stanislaw.gruszka@linux.intel.com> wrote:
> >
> > On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote:
> > > @@ -3178,11 +3181,12 @@ static int
> > > ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
> > >
> > >                       /* irq handling in the middle give it time to finish */
> > >                       spin_unlock_irqrestore(wait_list_lock, flags);
> > > -                     usleep_range(1, 10);
> > > +                     usleep_range(100, 1000);
> > >                       if (++iter_counter == MAX_TS_ITER_NUM) {
> > >                               dev_err(buf->mmg->dev,
> > > -                                     "handling registration interrupt took too long!!\n");
> > > -                             return -EINVAL;
> > > +                                     "Timestamp offest processing
> > > + reached timeout of %lld ms\n",
> >
> > Typo in offest, you can use './scripts/checkpatch.pl --codespell'
> > to coughs some of those.
> Thanks, I'll fix it in the patch in the tree.
> 
> @Ofir Bitton Can you please take care adding the codespell to our CI ?
> Thanks,
> Oded

I will make sure we run with this option from now on.
Thanks,
Ofir

> 
> >
> > Regards
> > Stanislaw
> >
Stanislaw Gruszka Feb. 20, 2023, 3:31 p.m. UTC | #4
On Sun, Feb 12, 2023 at 10:44:28PM +0200, Oded Gabbay wrote:
> From: Ofir Bitton <obitton@habana.ai>
> 
> Currently we support scenarios where a timestamp registration request
> of a certain offset is received during the interrupt handling of the
> same offset. In this case we give a grace period of up to 100us for
> the interrupt handler to finish.
> It seems that sometimes the interrupt handling takes more than expected,
> and therefore this path should be optimized. Until that happens, let's
> increase the grace period in order not to reach timeout which will
> cause user call to be rejected.
> 
> Signed-off-by: Ofir Bitton <obitton@habana.ai>
> Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> for the whole series.
diff mbox series

Patch

diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c
index 8270db0a72a2..e313ff8af7cc 100644
--- a/drivers/accel/habanalabs/common/command_submission.c
+++ b/drivers/accel/habanalabs/common/command_submission.c
@@ -17,7 +17,7 @@ 
 			HL_CS_FLAGS_FLUSH_PCI_HBW_WRITES)
 
 
-#define MAX_TS_ITER_NUM 10
+#define MAX_TS_ITER_NUM 100
 
 /**
  * enum hl_cs_wait_status - cs wait status
@@ -3145,6 +3145,7 @@  static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
 			(ts_buff->kernel_buff_size / sizeof(struct hl_user_pending_interrupt));
 	unsigned long flags, iter_counter = 0;
 	u64 current_cq_counter;
+	ktime_t timestamp;
 
 	/* Validate ts_offset not exceeding last max */
 	if (requested_offset_record >= cb_last) {
@@ -3153,6 +3154,8 @@  static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
 		return -EINVAL;
 	}
 
+	timestamp = ktime_get();
+
 start_over:
 	spin_lock_irqsave(wait_list_lock, flags);
 
@@ -3178,11 +3181,12 @@  static int ts_buff_get_kernel_ts_record(struct hl_mmap_mem_buf *buf,
 
 			/* irq handling in the middle give it time to finish */
 			spin_unlock_irqrestore(wait_list_lock, flags);
-			usleep_range(1, 10);
+			usleep_range(100, 1000);
 			if (++iter_counter == MAX_TS_ITER_NUM) {
 				dev_err(buf->mmg->dev,
-					"handling registration interrupt took too long!!\n");
-				return -EINVAL;
+					"Timestamp offest processing reached timeout of %lld ms\n",
+					ktime_ms_delta(ktime_get(), timestamp));
+				return -EAGAIN;
 			}
 
 			goto start_over;