Message ID | 20241111140244.13474-2-quic_jseerapu@quicinc.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Add Block event interrupt support for I2C protocol | expand |
Ping, Vinod :-) Andi On Mon, Nov 11, 2024 at 07:32:42PM +0530, Jyothi Kumar Seerapu wrote: > GSI hardware generates an interrupt for each transfer completion. > For multiple messages within a single transfer, this results > in receiving N interrupts for N messages, which can introduce > significant software interrupt latency. To mitigate this latency, > utilize Block Event Interrupt (BEI) only when an interrupt is necessary. > When using BEI, consider splitting a single multi-message transfer into > chunks of 8. This approach can enhance overall transfer time and > efficiency. > > Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com> > --- > > v1 -> v2: > - Changed dma_addr type from array of pointers to array. > - To support BEI functionality with the TRE size of 64 defined in GPI driver, > updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8. > > drivers/dma/qcom/gpi.c | 49 ++++++++++++++++++++++++++++++++ > include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++ > 2 files changed, 86 insertions(+) > > diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c > index 52a7c8f2498f..a98de3178764 100644 > --- a/drivers/dma/qcom/gpi.c > +++ b/drivers/dma/qcom/gpi.c > @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc, > > tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE); > tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT); > + > + if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ) > + tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI); > } > > for (i = 0; i < tre_idx; i++) > @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid) > return -EIO; > } > > +/** > + * gpi_multi_desc_process() - Process received transfers from GSI HW > + * @dev: pointer to the corresponding dev node > + * @multi_xfer: pointer to the gpi_multi_xfer > + * @num_xfers: total number of transfers > + * @transfer_timeout_msecs: transfer timeout value > + * @transfer_comp: completion object of the transfer > + * > + * This function is used to process the received transfers based on the > + * completion events > + * > + * Return: On success returns 0, otherwise return error code > + */ > +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer, > + u32 num_xfers, u32 transfer_timeout_msecs, > + struct completion *transfer_comp) > +{ > + int i; > + u32 max_irq_cnt, time_left; > + > + max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ; > + if (num_xfers % NUM_MSGS_PER_IRQ) > + max_irq_cnt++; > + > + /* > + * Wait for the interrupts of the processed transfers in multiple > + * of 64 and for the last transfer. If the hardware is fast and > + * already processed all the transfers then no need to wait. > + */ > + for (i = 0; i < max_irq_cnt; i++) { > + reinit_completion(transfer_comp); > + if (max_irq_cnt != multi_xfer->irq_cnt) { > + time_left = wait_for_completion_timeout(transfer_comp, > + transfer_timeout_msecs); > + if (!time_left) { > + dev_err(dev, "%s: Transfer timeout\n", __func__); > + return -ETIMEDOUT; > + } > + } > + if (num_xfers > multi_xfer->msg_idx_cnt) > + return 0; > + } > + return 0; > +} > +EXPORT_SYMBOL_GPL(gpi_multi_desc_process); > + > /* gpi_of_dma_xlate: open client requested channel */ > static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args, > struct of_dma *of_dma) > diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h > index 6680dd1a43c6..1341ff0db808 100644 > --- a/include/linux/dma/qcom-gpi-dma.h > +++ b/include/linux/dma/qcom-gpi-dma.h > @@ -15,6 +15,12 @@ enum spi_transfer_cmd { > SPI_DUPLEX, > }; > > +#define QCOM_GPI_BLOCK_EVENT_IRQ BIT(0) > + > +#define QCOM_GPI_MAX_NUM_MSGS 16 > +#define NUM_MSGS_PER_IRQ 8 > +#define MIN_NUM_OF_MSGS_MULTI_DESC 4 > + > /** > * struct gpi_spi_config - spi config for peripheral > * > @@ -51,6 +57,29 @@ enum i2c_op { > I2C_READ, > }; > > +/** > + * struct gpi_multi_xfer - Used for multi transfer support > + * > + * @msg_idx_cnt: message index for the transfer > + * @buf_idx: dma buffer index > + * @unmap_msg_cnt: unampped transfer index > + * @freed_msg_cnt: freed transfer index > + * @irq_cnt: received interrupt count > + * @irq_msg_cnt: transfer message count for the received irqs > + * @dma_buf: virtual address of the buffer > + * @dma_addr: dma address of the buffer > + */ > +struct gpi_multi_xfer { > + u32 msg_idx_cnt; > + u32 buf_idx; > + u32 unmap_msg_cnt; > + u32 freed_msg_cnt; > + u32 irq_cnt; > + u32 irq_msg_cnt; > + void *dma_buf[QCOM_GPI_MAX_NUM_MSGS]; > + dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS]; > +}; > + > /** > * struct gpi_i2c_config - i2c config for peripheral > * > @@ -65,6 +94,8 @@ enum i2c_op { > * @rx_len: receive length for buffer > * @op: i2c cmd > * @muli-msg: is part of multi i2c r-w msgs > + * @flags: true for block event interrupt support > + * @multi_xfer: indicates transfer has multi messages > */ > struct gpi_i2c_config { > u8 set_config; > @@ -78,6 +109,12 @@ struct gpi_i2c_config { > u32 rx_len; > enum i2c_op op; > bool multi_msg; > + u8 flags; > + struct gpi_multi_xfer multi_xfer; > }; > > +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer, > + u32 num_xfers, u32 tranfer_timeout_msecs, > + struct completion *transfer_comp); > + > #endif /* QCOM_GPI_DMA_H */ > -- > 2.17.1 >
On Mon, Nov 11, 2024 at 07:32:42PM +0530, Jyothi Kumar Seerapu wrote: > GSI hardware generates an interrupt for each transfer completion. > For multiple messages within a single transfer, this results > in receiving N interrupts for N messages, which can introduce > significant software interrupt latency. Here's an excellent opportunity for splitting your problem description and solution description in two easy to read paragraphs by adding some newlines. > To mitigate this latency, > utilize Block Event Interrupt (BEI) only when an interrupt is necessary. > When using BEI, consider splitting a single multi-message transfer into > chunks of 8. This approach can enhance overall transfer time and The reason for the number 8 must be documented. "This approach..." wouldn't hurt from having it's own paragraph once again. > efficiency. > > Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com> > --- > > v1 -> v2: > - Changed dma_addr type from array of pointers to array. > - To support BEI functionality with the TRE size of 64 defined in GPI driver, > updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8. > > drivers/dma/qcom/gpi.c | 49 ++++++++++++++++++++++++++++++++ > include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++ > 2 files changed, 86 insertions(+) > > diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c > index 52a7c8f2498f..a98de3178764 100644 > --- a/drivers/dma/qcom/gpi.c > +++ b/drivers/dma/qcom/gpi.c > @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc, > > tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE); > tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT); > + > + if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ) > + tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI); > } > > for (i = 0; i < tre_idx; i++) > @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid) > return -EIO; > } > > +/** > + * gpi_multi_desc_process() - Process received transfers from GSI HW > + * @dev: pointer to the corresponding dev node > + * @multi_xfer: pointer to the gpi_multi_xfer > + * @num_xfers: total number of transfers > + * @transfer_timeout_msecs: transfer timeout value > + * @transfer_comp: completion object of the transfer > + * > + * This function is used to process the received transfers based on the > + * completion events As far as I can tell it doesn't "process" anything. All it does is reinit the completion (n + 7) / 8 times, and for the first n / 8 iterations it will wait for an externally defined completion. Why is this function even defined here, it solely operates on parameters coming from the I2C driver? > + * > + * Return: On success returns 0, otherwise return error code > + */ > +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer, > + u32 num_xfers, u32 transfer_timeout_msecs, > + struct completion *transfer_comp) > +{ > + int i; > + u32 max_irq_cnt, time_left; > + > + max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ; > + if (num_xfers % NUM_MSGS_PER_IRQ) > + max_irq_cnt++; > + > + /* > + * Wait for the interrupts of the processed transfers in multiple > + * of 64 and for the last transfer. If the hardware is fast and I'm confused, where does this 64 come from? > + * already processed all the transfers then no need to wait. > + */ > + for (i = 0; i < max_irq_cnt; i++) { > + reinit_completion(transfer_comp); I'm trying to convince myself that this isn't racey, but the split ownership of updating and checking multi_xfer->irq_cnt between the GPI and I2C drivers is just too hard for me to follow. > + if (max_irq_cnt != multi_xfer->irq_cnt) { > + time_left = wait_for_completion_timeout(transfer_comp, > + transfer_timeout_msecs); > + if (!time_left) { > + dev_err(dev, "%s: Transfer timeout\n", __func__); > + return -ETIMEDOUT; > + } > + } > + if (num_xfers > multi_xfer->msg_idx_cnt) > + return 0; > + } > + return 0; > +} > +EXPORT_SYMBOL_GPL(gpi_multi_desc_process); The dmaengine framework is expected to provide an abstraction between clients and DMA engines, so this doesn't look right. > + > /* gpi_of_dma_xlate: open client requested channel */ > static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args, > struct of_dma *of_dma) > diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h > index 6680dd1a43c6..1341ff0db808 100644 > --- a/include/linux/dma/qcom-gpi-dma.h > +++ b/include/linux/dma/qcom-gpi-dma.h > @@ -15,6 +15,12 @@ enum spi_transfer_cmd { > SPI_DUPLEX, > }; > > +#define QCOM_GPI_BLOCK_EVENT_IRQ BIT(0) > + > +#define QCOM_GPI_MAX_NUM_MSGS 16 > +#define NUM_MSGS_PER_IRQ 8 > +#define MIN_NUM_OF_MSGS_MULTI_DESC 4 Prefixing these QCOM_GPI_ seems like an excellent idea. Still puzzled about the numbers 8 and 4 though, are they universal for all variants of GPI or are they just arbitrary numbers picked by experimentation? > + > /** > * struct gpi_spi_config - spi config for peripheral > * > @@ -51,6 +57,29 @@ enum i2c_op { > I2C_READ, > }; > > +/** > + * struct gpi_multi_xfer - Used for multi transfer support > + * > + * @msg_idx_cnt: message index for the transfer > + * @buf_idx: dma buffer index > + * @unmap_msg_cnt: unampped transfer index s/unampped/unmapped > + * @freed_msg_cnt: freed transfer index > + * @irq_cnt: received interrupt count > + * @irq_msg_cnt: transfer message count for the received irqs > + * @dma_buf: virtual address of the buffer > + * @dma_addr: dma address of the buffer "the buffer"? There's up to 16 of them... As mentioned above, I'm skeptical about this custom API - but if we were to go this route, the exact responsibilities and semantics should be documented. Regards, Bjorn > + */ > +struct gpi_multi_xfer { > + u32 msg_idx_cnt; > + u32 buf_idx; > + u32 unmap_msg_cnt; > + u32 freed_msg_cnt; > + u32 irq_cnt; > + u32 irq_msg_cnt; > + void *dma_buf[QCOM_GPI_MAX_NUM_MSGS]; > + dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS]; > +}; > + > /** > * struct gpi_i2c_config - i2c config for peripheral > * > @@ -65,6 +94,8 @@ enum i2c_op { > * @rx_len: receive length for buffer > * @op: i2c cmd > * @muli-msg: is part of multi i2c r-w msgs > + * @flags: true for block event interrupt support > + * @multi_xfer: indicates transfer has multi messages > */ > struct gpi_i2c_config { > u8 set_config; > @@ -78,6 +109,12 @@ struct gpi_i2c_config { > u32 rx_len; > enum i2c_op op; > bool multi_msg; > + u8 flags; > + struct gpi_multi_xfer multi_xfer; > }; > > +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer, > + u32 num_xfers, u32 tranfer_timeout_msecs, > + struct completion *transfer_comp); > + > #endif /* QCOM_GPI_DMA_H */ > -- > 2.17.1 > >
diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c index 52a7c8f2498f..a98de3178764 100644 --- a/drivers/dma/qcom/gpi.c +++ b/drivers/dma/qcom/gpi.c @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc, tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE); tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT); + + if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ) + tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI); } for (i = 0; i < tre_idx; i++) @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid) return -EIO; } +/** + * gpi_multi_desc_process() - Process received transfers from GSI HW + * @dev: pointer to the corresponding dev node + * @multi_xfer: pointer to the gpi_multi_xfer + * @num_xfers: total number of transfers + * @transfer_timeout_msecs: transfer timeout value + * @transfer_comp: completion object of the transfer + * + * This function is used to process the received transfers based on the + * completion events + * + * Return: On success returns 0, otherwise return error code + */ +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer, + u32 num_xfers, u32 transfer_timeout_msecs, + struct completion *transfer_comp) +{ + int i; + u32 max_irq_cnt, time_left; + + max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ; + if (num_xfers % NUM_MSGS_PER_IRQ) + max_irq_cnt++; + + /* + * Wait for the interrupts of the processed transfers in multiple + * of 64 and for the last transfer. If the hardware is fast and + * already processed all the transfers then no need to wait. + */ + for (i = 0; i < max_irq_cnt; i++) { + reinit_completion(transfer_comp); + if (max_irq_cnt != multi_xfer->irq_cnt) { + time_left = wait_for_completion_timeout(transfer_comp, + transfer_timeout_msecs); + if (!time_left) { + dev_err(dev, "%s: Transfer timeout\n", __func__); + return -ETIMEDOUT; + } + } + if (num_xfers > multi_xfer->msg_idx_cnt) + return 0; + } + return 0; +} +EXPORT_SYMBOL_GPL(gpi_multi_desc_process); + /* gpi_of_dma_xlate: open client requested channel */ static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args, struct of_dma *of_dma) diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h index 6680dd1a43c6..1341ff0db808 100644 --- a/include/linux/dma/qcom-gpi-dma.h +++ b/include/linux/dma/qcom-gpi-dma.h @@ -15,6 +15,12 @@ enum spi_transfer_cmd { SPI_DUPLEX, }; +#define QCOM_GPI_BLOCK_EVENT_IRQ BIT(0) + +#define QCOM_GPI_MAX_NUM_MSGS 16 +#define NUM_MSGS_PER_IRQ 8 +#define MIN_NUM_OF_MSGS_MULTI_DESC 4 + /** * struct gpi_spi_config - spi config for peripheral * @@ -51,6 +57,29 @@ enum i2c_op { I2C_READ, }; +/** + * struct gpi_multi_xfer - Used for multi transfer support + * + * @msg_idx_cnt: message index for the transfer + * @buf_idx: dma buffer index + * @unmap_msg_cnt: unampped transfer index + * @freed_msg_cnt: freed transfer index + * @irq_cnt: received interrupt count + * @irq_msg_cnt: transfer message count for the received irqs + * @dma_buf: virtual address of the buffer + * @dma_addr: dma address of the buffer + */ +struct gpi_multi_xfer { + u32 msg_idx_cnt; + u32 buf_idx; + u32 unmap_msg_cnt; + u32 freed_msg_cnt; + u32 irq_cnt; + u32 irq_msg_cnt; + void *dma_buf[QCOM_GPI_MAX_NUM_MSGS]; + dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS]; +}; + /** * struct gpi_i2c_config - i2c config for peripheral * @@ -65,6 +94,8 @@ enum i2c_op { * @rx_len: receive length for buffer * @op: i2c cmd * @muli-msg: is part of multi i2c r-w msgs + * @flags: true for block event interrupt support + * @multi_xfer: indicates transfer has multi messages */ struct gpi_i2c_config { u8 set_config; @@ -78,6 +109,12 @@ struct gpi_i2c_config { u32 rx_len; enum i2c_op op; bool multi_msg; + u8 flags; + struct gpi_multi_xfer multi_xfer; }; +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer, + u32 num_xfers, u32 tranfer_timeout_msecs, + struct completion *transfer_comp); + #endif /* QCOM_GPI_DMA_H */
GSI hardware generates an interrupt for each transfer completion. For multiple messages within a single transfer, this results in receiving N interrupts for N messages, which can introduce significant software interrupt latency. To mitigate this latency, utilize Block Event Interrupt (BEI) only when an interrupt is necessary. When using BEI, consider splitting a single multi-message transfer into chunks of 8. This approach can enhance overall transfer time and efficiency. Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com> --- v1 -> v2: - Changed dma_addr type from array of pointers to array. - To support BEI functionality with the TRE size of 64 defined in GPI driver, updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8. drivers/dma/qcom/gpi.c | 49 ++++++++++++++++++++++++++++++++ include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++ 2 files changed, 86 insertions(+)