diff mbox

Patch to parameterize DMA_MIN_BYTES for omap2-mcspi

Message ID 1426741501.10003.6.camel@midgaarde (mailing list archive)
State New, archived
Headers show

Commit Message

Greg Knight March 19, 2015, 5:05 a.m. UTC
Hi, Mark,

I've attached a patch which adds a device-tree field "ti,dma-min-bytes"
which replaces the macro DMA_MIN_BYTES. Adjusting this field addresses
issues we've had where, in our particular use case, the usleep() in the
SPI worker thread eats a full 20% of our CPU (AM3359).

I opted to implement it as a device-tree parameter and keep the original
value (160) as the default, in order to avoid impacting anyone else.

The patch is attached. Patches 1-2 are an unrelated McASP change (see my
other message).

What is the process for getting this upstreamed?

Thanks,
Greg

Comments

Mark Brown March 19, 2015, 9:47 a.m. UTC | #1
On Thu, Mar 19, 2015 at 01:05:01AM -0400, Greg Knight wrote:

> What is the process for getting this upstreamed?

Please submit the patch following the process in
Documentation/SubmittingPatches.
Peter Ujfalusi March 19, 2015, 12:34 p.m. UTC | #2
On 03/19/2015 07:05 AM, Greg Knight wrote:
> Hi, Mark,
> 
> I've attached a patch which adds a device-tree field "ti,dma-min-bytes"
> which replaces the macro DMA_MIN_BYTES. Adjusting this field addresses
> issues we've had where, in our particular use case, the usleep() in the
> SPI worker thread eats a full 20% of our CPU (AM3359).
> 
> I opted to implement it as a device-tree parameter and keep the original
> value (160) as the default, in order to avoid impacting anyone else.
> 
> The patch is attached. Patches 1-2 are an unrelated McASP change (see my
> other message).
> 
> What is the process for getting this upstreamed?

Please follow the guidelines in Documentation/SubmittingPatches. Patches as
attachments are not preferred since it makes replying/commenting on the
patches hard.

Strictly speaking the dma-min-bytes should not be in DT, it is a software
parameter for the Linux SPI driver implementation.
Also, when changing DT bindings, please update the documentation as well (and
CC the relevant lists with that).

This threshold of 160 bytes in the omap2-mcspi driver is artificial anyways it
is changed from 8 to 160 by this commit:
8b66c13474e16 spi/omap2_mcspi: change default DMA_MIN_BYTES value to 160

It has been changed because of wl1271, but I'm not sure if banging bytes over
the bus when the transfer is less then 160bytes is that great thing. I would
guess that the sweet spot is at around the low tens.

But if it is really like this that different devices perform better with
different threshold for choosing between PIO or DMA transfer then this setting
should come from the slave device and should only affect the transfer setup
when communicating with that device.

Probably adding a parameter (optional) to spi_device struct, so drivers can
pass dma_over_poi_threshold?
If it is not set, than just use whatever is the default.

But I don't think this setting should be in the DT.
Greg Knight March 19, 2015, 1:16 p.m. UTC | #3
Will refer to that documentation and update the SPI docs before resubmitting.

Re; Threshold of 160 is artificial: Believe me, I am more than aware
of this. SPI runs in any speed from low kHz to multi MHz. The only
reason I can fathom for having such a high DMA_MIN_BYTES is to
facilitate high-speed low-volume communication (eg read one byte at a
time from userspace without buffering.) The reason I'm looking at this
at all is because we're doing low-speed low-volume communication, for
which the burn in PIO mode causes severe performance degradation.
Internally we'd changed it to 20, but I might try 8. I originally
tried 0, but observed poor behavior for our use cases. DMA_MIN_BYTES
at 8 would be sensible for our application, but at 160 it is not.

The current solution is for everybody who needs to change their device
settings to churn that macro, just as 8b66c1474e16 did. Changing that
values incurs significant risk of excess CPU load (if increased) or
timing slop (if decreased) to all hardware using the McSPI. Moving the
param to DT allows those of us working on custom boards to modify the
value for some hardware without risking the entire ecosystem. Let's
please not keep a bad solution just because no perfect solution
exists...

I think the proper location for this patch might actually be in the
spidev nodes in DT, rather than at the mcspi level - but I don't
understand why this does not belong in DT. DT is, after all, where one
would normally describe the rest of the slave device bindings. A
sensible value for DMA_MIN_BYTES requires the user to know the CPU
clock speed alongside the SPI bus speed, and estimate acceptable
levels of slop in timing. I don't think userspace should need to do
these computations to avoid excess CPU load; could do it in kernel
space, or leave it up to DT or kernel parameters.

How about moving the speed to the spidev DT nodes?

Regards,
Greg


On Thu, Mar 19, 2015 at 8:34 AM, Peter Ujfalusi <peter.ujfalusi@ti.com> wrote:
>
> On 03/19/2015 07:05 AM, Greg Knight wrote:
> > Hi, Mark,
> >
> > I've attached a patch which adds a device-tree field "ti,dma-min-bytes"
> > which replaces the macro DMA_MIN_BYTES. Adjusting this field addresses
> > issues we've had where, in our particular use case, the usleep() in the
> > SPI worker thread eats a full 20% of our CPU (AM3359).
> >
> > I opted to implement it as a device-tree parameter and keep the original
> > value (160) as the default, in order to avoid impacting anyone else.
> >
> > The patch is attached. Patches 1-2 are an unrelated McASP change (see my
> > other message).
> >
> > What is the process for getting this upstreamed?
>
> Please follow the guidelines in Documentation/SubmittingPatches. Patches as
> attachments are not preferred since it makes replying/commenting on the
> patches hard.
>
> Strictly speaking the dma-min-bytes should not be in DT, it is a software
> parameter for the Linux SPI driver implementation.
> Also, when changing DT bindings, please update the documentation as well (and
> CC the relevant lists with that).
>
> This threshold of 160 bytes in the omap2-mcspi driver is artificial anyways it
> is changed from 8 to 160 by this commit:
> 8b66c13474e16 spi/omap2_mcspi: change default DMA_MIN_BYTES value to 160
>
> It has been changed because of wl1271, but I'm not sure if banging bytes over
> the bus when the transfer is less then 160bytes is that great thing. I would
> guess that the sweet spot is at around the low tens.
>
> But if it is really like this that different devices perform better with
> different threshold for choosing between PIO or DMA transfer then this setting
> should come from the slave device and should only affect the transfer setup
> when communicating with that device.
>
> Probably adding a parameter (optional) to spi_device struct, so drivers can
> pass dma_over_poi_threshold?
> If it is not set, than just use whatever is the default.
>
> But I don't think this setting should be in the DT.
>
> --
> Péter
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Brown March 19, 2015, 2:03 p.m. UTC | #4
On Thu, Mar 19, 2015 at 09:16:31AM -0400, Greg Knight wrote:
> Will refer to that documentation and update the SPI docs before resubmitting.

Please don't top post on kernel lists, reply in line (deleting any
unneeded context) so people have context for what you are talking about.
This makes discussions much easier to follow.
Peter Ujfalusi March 19, 2015, 4:23 p.m. UTC | #5
On 03/19/2015 03:16 PM, Greg Knight wrote:
> Will refer to that documentation and update the SPI docs before resubmitting.
> 
> Re; Threshold of 160 is artificial: Believe me, I am more than aware
> of this. SPI runs in any speed from low kHz to multi MHz. The only
> reason I can fathom for having such a high DMA_MIN_BYTES is to
> facilitate high-speed low-volume communication (eg read one byte at a
> time from userspace without buffering.) The reason I'm looking at this
> at all is because we're doing low-speed low-volume communication, for
> which the burn in PIO mode causes severe performance degradation.
> Internally we'd changed it to 20, but I might try 8. I originally
> tried 0, but observed poor behavior for our use cases. DMA_MIN_BYTES
> at 8 would be sensible for our application, but at 160 it is not.
> 
> How about moving the speed to the spidev DT nodes?

The issue is that: "The device tree is a data structure for describing hardware"
DMA_MIN_BYTES is a software parameter. To be specific, it is Linux software
parameter applicable only to spi-omap2-mcspi.c driver.

I think the best thing we could do is to calculate the DMA_MIN_BYTES in the
driver based on the SPI speed. Something which will give ~160 in the speed in
which the wl driver is used and something which works best in your setup in
the speed you are using the SPI bus.
The best way is to give some msec as a limit. If the transfer would take more
then X msec on the bus to be transferred, we will use DMA, if it is less than
that we fall back to PIO.
Yes, the CPU speed is not taken into account, but IMHO the bus speed is more
important.

What do you think? Will this work for you?
Greg Knight March 19, 2015, 5:28 p.m. UTC | #6
> I think the best thing we could do is to calculate the DMA_MIN_BYTES
in the
> driver based on the SPI speed. Something which will give ~160 in the
speed in
> which the wl driver is used and something which works best in your
setup in
> the speed you are using the SPI bus.
> The best way is to give some msec as a limit. If the transfer would
take more
> then X msec on the bus to be transferred, we will use DMA, if it is
less than
> that we fall back to PIO.
> Yes, the CPU speed is not taken into account, but IMHO the bus speed
is more
> 

Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
me. I don't know how to compute it completely accurately as some SPI
implementations I've seen seem to like to inject little delays between
bytes for some reason, but a reasonable enough estimate should just be
spi_transfer_time_ms = (bits * 1000) / spi_clock_speed.

I would like to have the ability to configure it without a kernel
recompile, though. Would a kernel param (module_param) be acceptable for
this application?

You don't happen to know the WL12xx SPI clock speed off the top of your
head, do you?

Regards,
Greg

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Brown March 19, 2015, 6:51 p.m. UTC | #7
On Thu, Mar 19, 2015 at 01:28:00PM -0400, Greg Knight wrote:

> Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
> me. I don't know how to compute it completely accurately as some SPI

You probably need both - there's often a hard limit where the FIFO size
in the hardware becomes a limit for DMA as well as a soft limit on top
of that for performance reasons.

> implementations I've seen seem to like to inject little delays between
> bytes for some reason, but a reasonable enough estimate should just be
> spi_transfer_time_ms = (bits * 1000) / spi_clock_speed.

> I would like to have the ability to configure it without a kernel
> recompile, though. Would a kernel param (module_param) be acceptable for
> this application?

Exposing it as a sysfs or debugfs file might make more sense if you were
going to do this.  Is that required after merge though?  It seems like
this is something where we've perhaps been tuning the wrong parameter
(transfer size instead of runtime), or need to investigate dynamic
tuning.
Peter Ujfalusi March 20, 2015, 12:58 p.m. UTC | #8
On 03/19/2015 07:28 PM, Greg Knight wrote:
> Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
> me. I don't know how to compute it completely accurately as some SPI
> implementations I've seen seem to like to inject little delays between
> bytes for some reason, but a reasonable enough estimate should just be
> spi_transfer_time_ms = (bits * 1000) / spi_clock_speed.
> 
> I would like to have the ability to configure it without a kernel
> recompile, though. Would a kernel param (module_param) be acceptable for
> this application?

Or sysfs interface as Mark suggested? But module_param should work as well.

> You don't happen to know the WL12xx SPI clock speed off the top of your
> head, do you?

In case of n900 it is 48000000 (board-rx51-peripherals.c). It is using 32bit
words over SPI.

Based on this and your experience I guess it is possible to come up with a
formula which satisfy both.

> Regards,
> Greg
>
Peter Ujfalusi March 20, 2015, 1:10 p.m. UTC | #9
On 03/19/2015 08:51 PM, Mark Brown wrote:
> On Thu, Mar 19, 2015 at 01:28:00PM -0400, Greg Knight wrote:
> 
>> Changing DMA_MIN_BYTES to, say, "dma_min_time_ms" sounds reasonable to
>> me. I don't know how to compute it completely accurately as some SPI
> 
> You probably need both - there's often a hard limit where the FIFO size
> in the hardware becomes a limit for DMA as well as a soft limit on top
> of that for performance reasons.

The FIFO is only going to be enabled when the DMA is used for transfer so we
should have some lower limit for the PIO/DMA threshold. The FIFO in McSPI is a
tricky one anyways, since it has only one FIFO but several channels and the
FIFO can be enabled for only one channel, if it is enabled for more channels
it is not going to be used by either channel.
Mark Brown March 22, 2015, 4:31 p.m. UTC | #10
On Fri, Mar 20, 2015 at 03:10:04PM +0200, Peter Ujfalusi wrote:
> On 03/19/2015 08:51 PM, Mark Brown wrote:

> > You probably need both - there's often a hard limit where the FIFO size
> > in the hardware becomes a limit for DMA as well as a soft limit on top
> > of that for performance reasons.

> The FIFO is only going to be enabled when the DMA is used for transfer so we
> should have some lower limit for the PIO/DMA threshold. The FIFO in McSPI is a

Right, that's pretty much what I'm trying to say.

> tricky one anyways, since it has only one FIFO but several channels and the
> FIFO can be enabled for only one channel, if it is enabled for more channels
> it is not going to be used by either channel.

Not sure I follow this - how does this whole multiple channels thing
work for SPI exactly?
Anton Habegger May 31, 2016, 5:03 a.m. UTC | #11
Hi

I'm interessted about the state of the patch? 
I have a similar situation with a am3352 device and 10kHz SPI 
transfer. (Kernel version 4.1) The number of bytes to transfer are 
in between of 3 and 30. The SPI  worker thread occupies the CPU 
with ~60% load. With a DMA_MIN_BYTES = 10, the CPU load will be 
reduced to 10%, which is still to much in my opinion.

Are there other workarounds or options to get rid of busy waits 
with omap2-mcspi?




--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 2b51699d1f7f05de45f0f0f065c37da81181f4eb Mon Sep 17 00:00:00 2001
From: Greg Knight <g.knight@symetrica.com>
Date: Mon, 2 Mar 2015 10:44:21 -0500
Subject: [PATCH 3/3] spi-omap2-mcspi: DMA_MIN_BYTES hashdef =>
 ti,dma-min-bytes device tree option

---
 drivers/spi/spi-omap2-mcspi.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
index 70cd418..4ac1f3e 100644
--- a/drivers/spi/spi-omap2-mcspi.c
+++ b/drivers/spi/spi-omap2-mcspi.c
@@ -117,8 +117,7 @@  struct omap2_mcspi_dma {
 /* use PIO for small transfers, avoiding DMA setup/teardown overhead and
  * cache operations; better heuristics consider wordsize and bitrate.
  */
-#define DMA_MIN_BYTES			160
-
+#define DMA_MIN_BYTES_DEFAULT			160
 
 /*
  * Used for context save and restore, structure members to be updated whenever
@@ -141,6 +140,9 @@  struct omap2_mcspi {
 	struct omap2_mcspi_regs ctx;
 	int			fifo_depth;
 	unsigned int		pin_dir:1;
+
+	/* SPI transfer threshold over which we prefer DMA to PIO */
+	unsigned dma_min_bytes;
 };
 
 struct omap2_mcspi_cs {
@@ -1115,7 +1117,7 @@  static void omap2_mcspi_work(struct omap2_mcspi *mcspi, struct spi_message *m)
 			unsigned	count;
 
 			if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
-			    (m->is_dma_mapped || t->len >= DMA_MIN_BYTES))
+			    (m->is_dma_mapped || t->len >= mcspi->dma_min_bytes))
 				omap2_mcspi_set_fifo(spi, t, 1);
 
 			omap2_mcspi_set_enable(spi, 1);
@@ -1126,7 +1128,7 @@  static void omap2_mcspi_work(struct omap2_mcspi *mcspi, struct spi_message *m)
 						+ OMAP2_MCSPI_TX0);
 
 			if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
-			    (m->is_dma_mapped || t->len >= DMA_MIN_BYTES))
+			    (m->is_dma_mapped || t->len >= mcspi->dma_min_bytes))
 				count = omap2_mcspi_txrx_dma(spi, t);
 			else
 				count = omap2_mcspi_txrx_pio(spi, t);
@@ -1216,7 +1218,7 @@  static int omap2_mcspi_transfer_one_message(struct spi_master *master,
 			return -EINVAL;
 		}
 
-		if (m->is_dma_mapped || len < DMA_MIN_BYTES)
+		if (m->is_dma_mapped || len < mcspi->dma_min_bytes)
 			continue;
 
 		if (mcspi_dma->dma_tx && tx_buf != NULL) {
@@ -1331,10 +1333,12 @@  static int omap2_mcspi_probe(struct platform_device *pdev)
 
 	mcspi = spi_master_get_devdata(master);
 	mcspi->master = master;
+	mcspi->dma_min_bytes = DMA_MIN_BYTES_DEFAULT;
 
 	match = of_match_device(omap_mcspi_of_match, &pdev->dev);
 	if (match) {
 		u32 num_cs = 1; /* default number of chipselect */
+		u32 dma_min_bytes;
 		pdata = match->data;
 
 		of_property_read_u32(node, "ti,spi-num-cs", &num_cs);
@@ -1342,6 +1346,8 @@  static int omap2_mcspi_probe(struct platform_device *pdev)
 		master->bus_num = bus_num++;
 		if (of_get_property(node, "ti,pindir-d0-out-d1-in", NULL))
 			mcspi->pin_dir = MCSPI_PINDIR_D0_OUT_D1_IN;
+		if (!of_property_read_u32(node, "ti,dma-min-bytes", &dma_min_bytes))
+			mcspi->dma_min_bytes = (unsigned) dma_min_bytes;
 	} else {
 		pdata = dev_get_platdata(&pdev->dev);
 		master->num_chipselect = pdata->num_cs;
-- 
1.9.1