Message ID | 1426741501.10003.6.camel@midgaarde (mailing list archive) |
---|---|
State | New, archived |
On Thu, Mar 19, 2015 at 01:05:01AM -0400, Greg Knight wrote:
> What is the process for getting this upstreamed?
Please submit the patch following the process in
Documentation/SubmittingPatches.
On 03/19/2015 07:05 AM, Greg Knight wrote:
> Hi, Mark,
>
> I've attached a patch which adds a device-tree field "ti,dma-min-bytes"
> which replaces the macro DMA_MIN_BYTES. Adjusting this field addresses
> issues we've had where, in our particular use case, the usleep() in the
> SPI worker thread eats a full 20% of our CPU (AM3359).
>
> I opted to implement it as a device-tree parameter and keep the original
> value (160) as the default, in order to avoid impacting anyone else.
>
> The patch is attached. Patches 1-2 are an unrelated McASP change (see my
> other message).
>
> What is the process for getting this upstreamed?

Please follow the guidelines in Documentation/SubmittingPatches. Patches as
attachments are not preferred since they make replying to and commenting on
the patches hard.

Strictly speaking the dma-min-bytes should not be in DT; it is a software
parameter for the Linux SPI driver implementation.
Also, when changing DT bindings, please update the documentation as well (and
CC the relevant lists with that).

This threshold of 160 bytes in the omap2-mcspi driver is artificial anyway; it
was changed from 8 to 160 by this commit:
8b66c13474e16 spi/omap2_mcspi: change default DMA_MIN_BYTES value to 160

It was changed because of wl1271, but I'm not sure that banging bytes over
the bus when the transfer is less than 160 bytes is such a great thing. I
would guess that the sweet spot is at around the low tens.

But if it really is the case that different devices perform better with
different thresholds for choosing between PIO and DMA transfer, then this
setting should come from the slave device and should only affect the transfer
setup when communicating with that device.

Probably adding a parameter (optional) to spi_device struct, so drivers can
pass dma_over_poi_threshold?
If it is not set, then just use whatever is the default.

But I don't think this setting should be in the DT.
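The per-device fallback suggested above could look roughly like the following. This is only a sketch under stated assumptions: `struct example_spi_device`, its `dma_min_bytes` field, and both helper functions are hypothetical names invented for illustration; they are not part of the real `struct spi_device` or of the omap2-mcspi driver. The value 0 is assumed to mean "not set".

```c
#include <stddef.h>

/* Driver-wide fallback; 160 is the current DMA_MIN_BYTES named in the thread. */
#define DMA_MIN_BYTES_DEFAULT 160u

/* Hypothetical stand-in for per-slave state; NOT the real struct spi_device. */
struct example_spi_device {
	unsigned int dma_min_bytes;	/* assumption: 0 means "not set by the slave driver" */
};

/* Return the PIO/DMA threshold that applies to one slave device. */
unsigned int example_dma_threshold(const struct example_spi_device *dev)
{
	if (dev != NULL && dev->dma_min_bytes != 0)
		return dev->dma_min_bytes;
	return DMA_MIN_BYTES_DEFAULT;
}

/* Nonzero when a transfer of len bytes to this device should use DMA. */
int example_use_dma(const struct example_spi_device *dev, size_t len)
{
	return len >= example_dma_threshold(dev);
}
```

Using 0 as the "unset" marker means a slave driver cannot request a threshold of 0 directly; a threshold of 1 has the same effect for any non-empty transfer.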
Will refer to that documentation and update the SPI docs before resubmitting.

Re: Threshold of 160 is artificial: Believe me, I am more than aware
of this. SPI runs at any speed from low kHz to multi MHz. The only
reason I can fathom for having such a high DMA_MIN_BYTES is to
facilitate high-speed low-volume communication (e.g. reading one byte at a
time from userspace without buffering). The reason I'm looking at this
at all is because we're doing low-speed low-volume communication, for
which the burn in PIO mode causes severe performance degradation.
Internally we'd changed it to 20, but I might try 8. I originally
tried 0, but observed poor behavior for our use cases. DMA_MIN_BYTES
at 8 would be sensible for our application, but at 160 it is not.

The current solution is for everybody who needs to change their device
settings to churn that macro, just as 8b66c13474e16 did. Changing that
value incurs significant risk of excess CPU load (if increased) or
timing slop (if decreased) for all hardware using the McSPI. Moving the
param to DT allows those of us working on custom boards to modify the
value for some hardware without risking the entire ecosystem.

Let's please not keep a bad solution just because no perfect solution
exists...

I think the proper location for this patch might actually be in the
spidev nodes in DT, rather than at the mcspi level - but I don't
understand why this does not belong in DT. DT is, after all, where one
would normally describe the rest of the slave device bindings.

A sensible value for DMA_MIN_BYTES requires the user to know the CPU
clock speed alongside the SPI bus speed, and to estimate acceptable levels
of slop in timing. I don't think userspace should need to do these
computations to avoid excess CPU load; we could do it in kernel space, or
leave it up to DT or kernel parameters.

How about moving the speed to the spidev DT nodes?
Regards,
Greg

On Thu, Mar 19, 2015 at 8:34 AM, Peter Ujfalusi <peter.ujfalusi@ti.com> wrote:
>
> On 03/19/2015 07:05 AM, Greg Knight wrote:
> > Hi, Mark,
> >
> > I've attached a patch which adds a device-tree field "ti,dma-min-bytes"
> > which replaces the macro DMA_MIN_BYTES. Adjusting this field addresses
> > issues we've had where, in our particular use case, the usleep() in the
> > SPI worker thread eats a full 20% of our CPU (AM3359).
> >
> > I opted to implement it as a device-tree parameter and keep the original
> > value (160) as the default, in order to avoid impacting anyone else.
> >
> > The patch is attached. Patches 1-2 are an unrelated McASP change (see my
> > other message).
> >
> > What is the process for getting this upstreamed?
>
> Please follow the guidelines in Documentation/SubmittingPatches. Patches as
> attachments are not preferred since it makes replying/commenting on the
> patches hard.
>
> Strictly speaking the dma-min-bytes should not be in DT, it is a software
> parameter for the Linux SPI driver implementation.
> Also, when changing DT bindings, please update the documentation as well (and
> CC the relevant lists with that).
>
> This threshold of 160 bytes in the omap2-mcspi driver is artificial anyways it
> is changed from 8 to 160 by this commit:
> 8b66c13474e16 spi/omap2_mcspi: change default DMA_MIN_BYTES value to 160
>
> It has been changed because of wl1271, but I'm not sure if banging bytes over
> the bus when the transfer is less then 160bytes is that great thing. I would
> guess that the sweet spot is at around the low tens.
>
> But if it is really like this that different devices perform better with
> different threshold for choosing between PIO or DMA transfer then this setting
> should come from the slave device and should only affect the transfer setup
> when communicating with that device.
>
> Probably adding a parameter (optional) to spi_device struct, so drivers can
> pass dma_over_poi_threshold?
> If it is not set, than just use whatever is the default.
>
> But I don't think this setting should be in the DT.
>
> --
> Péter
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 19, 2015 at 09:16:31AM -0400, Greg Knight wrote:
> Will refer to that documentation and update the SPI docs before resubmitting.
Please don't top post on kernel lists, reply in line (deleting any
unneeded context) so people have context for what you are talking about.
This makes discussions much easier to follow.
On 03/19/2015 03:16 PM, Greg Knight wrote:
> Will refer to that documentation and update the SPI docs before resubmitting.
>
> Re; Threshold of 160 is artificial: Believe me, I am more than aware
> of this. SPI runs in any speed from low kHz to multi MHz. The only
> reason I can fathom for having such a high DMA_MIN_BYTES is to
> facilitate high-speed low-volume communication (eg read one byte at a
> time from userspace without buffering.) The reason I'm looking at this
> at all is because we're doing low-speed low-volume communication, for
> which the burn in PIO mode causes severe performance degradation.
> Internally we'd changed it to 20, but I might try 8. I originally
> tried 0, but observed poor behavior for our use cases. DMA_MIN_BYTES
> at 8 would be sensible for our application, but at 160 it is not.
>
> How about moving the speed to the spidev DT nodes?

The issue is that:
"The device tree is a data structure for describing hardware"

DMA_MIN_BYTES is a software parameter. To be specific, it is a Linux software
parameter applicable only to the spi-omap2-mcspi.c driver.

I think the best thing we could do is to calculate the DMA_MIN_BYTES in the
driver based on the SPI speed. Something which will give ~160 at the speed at
which the wl driver is used and something which works best in your setup at
the speed you are using the SPI bus.
The best way is to give some msec as a limit. If the transfer would take more
than X msec on the bus to be transferred, we will use DMA; if it is less than
that we fall back to PIO.
Yes, the CPU speed is not taken into account, but IMHO the bus speed is more
important.

What do you think? Will this work for you?
> I think the best thing we could do is to calculate the DMA_MIN_BYTES in the
> driver based on the SPI speed. Something which will give ~160 in the speed in
> which the wl driver is used and something which works best in your setup in
> the speed you are using the SPI bus.
> The best way is to give some msec as a limit. If the transfer would take more
> then X msec on the bus to be transferred, we will use DMA, if it is less than
> that we fall back to PIO.
> Yes, the CPU speed is not taken into account, but IMHO the bus speed is more
>

Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
me. I don't know how to compute it completely accurately as some SPI
implementations I've seen seem to like to inject little delays between
bytes for some reason, but a reasonable enough estimate should just be
spi_transfer_time_ms = (bits * 1000) / spi_clock_speed.

I would like to have the ability to configure it without a kernel
recompile, though. Would a kernel param (module_param) be acceptable for
this application?

You don't happen to know the WL12xx SPI clock speed off the top of your
head, do you?

Regards,
Greg
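The estimate above, and the DMA/PIO decision built on it, might be sketched as follows. The function names and the `dma_min_time_ns` parameter are hypothetical and not taken from the driver. One deliberate departure from the formula as written: with integer arithmetic in milliseconds, short transfers on a fast bus round down to 0 (1280 bits at 48 MHz is about 0.027 ms), so this sketch works in nanoseconds. Inter-byte delays are ignored, as in the original estimate.

```c
#include <stdint.h>

/*
 * Estimated time on the wire, in nanoseconds, for `bits` bits at `speed_hz`.
 * Ignores inter-word gaps and chip-select delays. Returns 0 for a zero clock.
 * Assumes bits * 1e9 fits in 64 bits (transfers below roughly 2 GiB).
 */
uint64_t example_transfer_time_ns(uint64_t bits, uint32_t speed_hz)
{
	if (speed_hz == 0)
		return 0;
	return (bits * 1000000000ull) / speed_hz;
}

/* Nonzero when the estimated bus time reaches the (hypothetical) DMA limit. */
int example_use_dma_by_time(uint32_t len_bytes, uint32_t speed_hz,
			    uint64_t dma_min_time_ns)
{
	uint64_t bits = (uint64_t)len_bytes * 8;

	return example_transfer_time_ns(bits, speed_hz) >= dma_min_time_ns;
}
```

With a limit of 1 ms (chosen here only for illustration), a 30-byte transfer at 10 kHz is estimated at 24 ms and would go to DMA, while a 160-byte transfer at 48 MHz stays in PIO.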
On Thu, Mar 19, 2015 at 01:28:00PM -0400, Greg Knight wrote:

> Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
> me. I don't know how to compute it completely accurately as some SPI

You probably need both - there's often a hard limit where the FIFO size
in the hardware becomes a limit for DMA as well as a soft limit on top
of that for performance reasons.

> implementations I've seen seem to like to inject little delays between
> bytes for some reason, but a reasonable enough estimate should just be
> spi_transfer_time_ms = (bits * 1000) / spi_clock_speed.

> I would like to have the ability to configure it without a kernel
> recompile, though. Would a kernel param (module_param) be acceptable for
> this application?

Exposing it as a sysfs or debugfs file might make more sense if you were
going to do this. Is that required after merge though? It seems like
this is something where we've perhaps been tuning the wrong parameter
(transfer size instead of runtime), or need to investigate dynamic
tuning.
On 03/19/2015 07:28 PM, Greg Knight wrote:
> Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
> me. I don't know how to compute it completely accurately as some SPI
> implementations I've seen seem to like to inject little delays between
> bytes for some reason, but a reasonable enough estimate should just be
> spi_transfer_time_ms = (bits * 1000) / spi_clock_speed.
>
> I would like to have the ability to configure it without a kernel
> recompile, though. Would a kernel param (module_param) be acceptable for
> this application?

Or a sysfs interface as Mark suggested? But module_param should work as well.

> You don't happen to know the WL12xx SPI clock speed off the top of your
> head, do you?

In the case of n900 it is 48000000 (board-rx51-peripherals.c). It is using
32bit words over SPI.

Based on this and your experience I guess it is possible to come up with a
formula which satisfies both.

> Regards,
> Greg
>
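As a rough worked example of such a formula, the sketch below converts a time limit back into a byte threshold for a given bus clock. It assumes the 48000000 figure is in Hz (the message does not state the unit), and the function name and the 26667 ns limit are illustrative choices, not values from the driver: 160 bytes are 1280 bits, which take about 26.67 µs at 48 MHz.

```c
#include <stdint.h>

/*
 * Smallest transfer size, in bytes, whose estimated bus time reaches
 * dma_min_time_ns at speed_hz. Hypothetical helper for illustration only.
 * The 32-bit by 32-bit product is computed in 64 bits, so it cannot overflow.
 */
uint32_t example_min_dma_bytes(uint32_t dma_min_time_ns, uint32_t speed_hz)
{
	uint64_t bits = ((uint64_t)dma_min_time_ns * speed_hz) / 1000000000ull;

	return (uint32_t)(bits / 8);
}
```

With a 26667 ns limit this yields 160 bytes at 48 MHz and 3 bytes at 1 MHz. At 10 kHz it yields 0, meaning every transfer would use DMA, which is one reason a separate lower bound, as Mark suggested, probably still matters.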
On 03/19/2015 08:51 PM, Mark Brown wrote:
> On Thu, Mar 19, 2015 at 01:28:00PM -0400, Greg Knight wrote:
>
>> Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
>> me. I don't know how to compute it completely accurately as some SPI
>
> You probably need both - there's often a hard limit where the FIFO size
> in the hardware becomes a limit for DMA as well as a soft limit on top
> of that for performance reasons.

The FIFO is only going to be enabled when the DMA is used for transfer so we
should have some lower limit for the PIO/DMA threshold. The FIFO in McSPI is a
tricky one anyways, since it has only one FIFO but several channels and the
FIFO can be enabled for only one channel, if it is enabled for more channels
it is not going to be used by either channel.
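Combining the two limits discussed here, a hard lower bound in bytes plus a soft limit on estimated bus time, might look like the sketch below. All names and parameters are hypothetical, and the thread does not say what the hard limit should be; any value passed for `hard_min_bytes` is an assumption.

```c
#include <stdint.h>

/*
 * Decide between DMA (nonzero) and PIO (zero) for one transfer.
 * hard_min_bytes:   never use DMA below this size (value is an assumption).
 * soft_min_time_ns: use DMA only if the estimated bus time reaches this.
 * Assumes len_bytes * 8e9 fits in 64 bits (transfers below roughly 2 GiB).
 */
int example_choose_dma(uint32_t len_bytes, uint32_t speed_hz,
		       uint32_t hard_min_bytes, uint64_t soft_min_time_ns)
{
	uint64_t time_ns;

	if (len_bytes < hard_min_bytes)
		return 0;	/* hard limit: too small for DMA regardless of speed */
	if (speed_hz == 0)
		return 0;	/* no usable clock: fall back to PIO */

	time_ns = ((uint64_t)len_bytes * 8 * 1000000000ull) / speed_hz;
	return time_ns >= soft_min_time_ns;
}
```

The hard limit is checked first so that a very slow bus cannot push tiny transfers onto DMA purely on the strength of the time estimate.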
On Fri, Mar 20, 2015 at 03:10:04PM +0200, Peter Ujfalusi wrote:
> On 03/19/2015 08:51 PM, Mark Brown wrote:

> > You probably need both - there's often a hard limit where the FIFO size
> > in the hardware becomes a limit for DMA as well as a soft limit on top
> > of that for performance reasons.

> The FIFO is only going to be enabled when the DMA is used for transfer so we
> should have some lower limit for the PIO/DMA threshold. The FIFO in McSPI is a

Right, that's pretty much what I'm trying to say.

> tricky one anyways, since it has only one FIFO but several channels and the
> FIFO can be enabled for only one channel, if it is enabled for more channels
> it is not going to be used by either channel.

Not sure I follow this - how does this whole multiple channels thing
work for SPI exactly?
Hi

I'm interested in the state of the patch.

I have a similar situation with an am3352 device and 10 kHz SPI transfers
(kernel version 4.1). The number of bytes to transfer is between 3 and 30.
The SPI worker thread occupies the CPU with ~60% load. With
DMA_MIN_BYTES = 10, the CPU load is reduced to 10%, which is still too much
in my opinion.

Are there other workarounds or options to get rid of busy waits with
omap2-mcspi?
From 2b51699d1f7f05de45f0f0f065c37da81181f4eb Mon Sep 17 00:00:00 2001
From: Greg Knight <g.knight@symetrica.com>
Date: Mon, 2 Mar 2015 10:44:21 -0500
Subject: [PATCH 3/3] spi-omap2-mcspi: DMA_MIN_BYTES hashdef =>
 ti,dma-min-bytes device tree option

---
 drivers/spi/spi-omap2-mcspi.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
index 70cd418..4ac1f3e 100644
--- a/drivers/spi/spi-omap2-mcspi.c
+++ b/drivers/spi/spi-omap2-mcspi.c
@@ -117,8 +117,7 @@ struct omap2_mcspi_dma {
 /* use PIO for small transfers, avoiding DMA setup/teardown overhead and
  * cache operations; better heuristics consider wordsize and bitrate.
  */
-#define DMA_MIN_BYTES			160
-
+#define DMA_MIN_BYTES_DEFAULT		160
 
 /*
  * Used for context save and restore, structure members to be updated whenever
@@ -141,6 +140,9 @@ struct omap2_mcspi {
 	struct omap2_mcspi_regs ctx;
 	int			fifo_depth;
 	unsigned int		pin_dir:1;
+
+	/* SPI transfer threshold over which we prefer DMA to PIO */
+	unsigned		dma_min_bytes;
 };
 
 struct omap2_mcspi_cs {
@@ -1115,7 +1117,7 @@ static void omap2_mcspi_work(struct omap2_mcspi *mcspi, struct spi_message *m)
 			unsigned	count;
 
 			if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
-			    (m->is_dma_mapped || t->len >= DMA_MIN_BYTES))
+			    (m->is_dma_mapped || t->len >= mcspi->dma_min_bytes))
 				omap2_mcspi_set_fifo(spi, t, 1);
 
 			omap2_mcspi_set_enable(spi, 1);
@@ -1126,7 +1128,7 @@ static void omap2_mcspi_work(struct omap2_mcspi *mcspi, struct spi_message *m)
 						+ OMAP2_MCSPI_TX0);
 
 			if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
-			    (m->is_dma_mapped || t->len >= DMA_MIN_BYTES))
+			    (m->is_dma_mapped || t->len >= mcspi->dma_min_bytes))
 				count = omap2_mcspi_txrx_dma(spi, t);
 			else
 				count = omap2_mcspi_txrx_pio(spi, t);
@@ -1216,7 +1218,7 @@ static int omap2_mcspi_transfer_one_message(struct spi_master *master,
 			return -EINVAL;
 		}
 
-		if (m->is_dma_mapped || len < DMA_MIN_BYTES)
+		if (m->is_dma_mapped || len < mcspi->dma_min_bytes)
 			continue;
 
 		if (mcspi_dma->dma_tx && tx_buf != NULL) {
@@ -1331,10 +1333,12 @@ static int omap2_mcspi_probe(struct platform_device *pdev)
 
 	mcspi = spi_master_get_devdata(master);
 	mcspi->master = master;
+	mcspi->dma_min_bytes = DMA_MIN_BYTES_DEFAULT;
 
 	match = of_match_device(omap_mcspi_of_match, &pdev->dev);
 	if (match) {
 		u32 num_cs = 1; /* default number of chipselect */
+		u32 dma_min_bytes;
 		pdata = match->data;
 
 		of_property_read_u32(node, "ti,spi-num-cs", &num_cs);
@@ -1342,6 +1346,8 @@ static int omap2_mcspi_probe(struct platform_device *pdev)
 		master->bus_num = bus_num++;
 		if (of_get_property(node, "ti,pindir-d0-out-d1-in", NULL))
 			mcspi->pin_dir = MCSPI_PINDIR_D0_OUT_D1_IN;
+		if (!of_property_read_u32(node, "ti,dma-min-bytes", &dma_min_bytes))
+			mcspi->dma_min_bytes = (unsigned) dma_min_bytes;
 	} else {
 		pdata = dev_get_platdata(&pdev->dev);
 		master->num_chipselect = pdata->num_cs;
-- 
1.9.1