diff mbox

[V2] spi: bcm2835: enabling polling mode for transfers shorter than 30us

Message ID 2DB27253-8254-4098-ADE3-9AA22D52BE9D@martin.sperl.org (mailing list archive)
State New, archived
Headers show

Commit Message

Martin Sperl April 18, 2015, 1:31 p.m. UTC
> On 18.04.2015, at 14:27, Mark Brown <broonie@kernel.org> wrote:
> That's probably fine, though the 40ms is a bit on the long side.  The
> 30s timeout could use pulling in too, that's going to fail very badly if
> anything does go wronng.

Anything below 2 jiffies will give enough false positives during a kernel
recompile - the original code has had 2 jiffies as the effective minimum,
because the calculated delivery-time of max 30us is still orders of magnitudes
smaller than a single jiffy, but a reschedule can happen, which would be the
main reason why we have had triggered timeouts before.

SO IMO anything shorter than 2-3 jifies would need to use a new framework to
get access to high-resolution timers - and I do not know how to do that.

We can implement it as 3 jiffies or 30ms if that seems more acceptable.

Would look something like this:

I will need to test it and then I will submit it as a patch against
the 1s timeout patch (so what is already in for-next).

Martin--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mark Brown April 18, 2015, 5:07 p.m. UTC | #1
On Sat, Apr 18, 2015 at 03:31:13PM +0200, Martin Sperl wrote:

> > On 18.04.2015, at 14:27, Mark Brown <broonie@kernel.org> wrote:
> > That's probably fine, though the 40ms is a bit on the long side.  The
> > 30s timeout could use pulling in too, that's going to fail very badly if
> > anything does go wronng.

> Anything below 2 jiffies will give enough false positives during a kernel
> recompile - the original code has had 2 jiffies as the effective minimum,
> because the calculated delivery-time of max 30us is still orders of magnitudes
> smaller than a single jiffy, but a reschedule can happen, which would be the
> main reason why we have had triggered timeouts before.

Sure, but with the fallback to a much longer sleeping delay that'd be
covered transparently anyway - it doesn't matter if we don't always
manage to busy wait under load, what's more important is that we don't
fail the operation as a whole.

Actually if it's just scheduling that's a concern then one final check
after the time expires should do the trick shouldn't it?

> SO IMO anything shorter than 2-3 jifies would need to use a new framework to
> get access to high-resolution timers - and I do not know how to do that.

hrtimer_ is the high resolution timer stuff, though I don't think it's
really what you're looking for, it's more about async callbacks IIRC.
Martin Sperl April 19, 2015, 9:44 a.m. UTC | #2
> On 18.04.2015, at 19:07, Mark Brown <broonie@kernel.org> wrote:
> 
> Sure, but with the fallback to a much longer sleeping delay that'd be
> covered transparently anyway - it doesn't matter if we don't always
> manage to busy wait under load, what's more important is that we don't
> fail the operation as a whole.
> 
> Actually if it's just scheduling that's a concern then one final check
> after the time expires should do the trick shouldn't it?

Well - I have tried now lots of things now and I have also started some
diagnostics on timeouts as well as trying to detect reschedules and then
report on the jiffies that have passed between start of loop and the check
and I get values like this:

[ 5867.943426] mcp251x spi32759.2: detected reschedule: 7 jiffies
[ 5868.628859] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 5869.867311] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 5870.139622] mcp251x spi32759.0: detected reschedule: 8 jiffies
[ 5870.272004] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 5871.352704] mcp251x spi32759.0: detected reschedule: 5 jiffies
[ 5871.469095] mcp251x spi32759.0: detected reschedule: 8 jiffies
[ 5908.783971] mcp251x spi32759.2: detected reschedule: 75 jiffies
[ 5955.831822] mcp251x spi32759.0: detected reschedule: 75 jiffies
[ 5977.380447] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 5977.602457] mcp251x spi32759.0: detected reschedule: 15 jiffies
[ 5980.093570] mcp251x spi32759.2: detected reschedule: 2 jiffies
[ 6006.559147] mcp251x spi32759.2: detected reschedule: 5 jiffies
[ 6009.030543] mcp251x spi32759.0: detected reschedule: 8 jiffies
[ 6016.872027] mcp251x spi32759.0: detected reschedule: 27 jiffies
[ 6018.808468] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6021.586796] mcp251x spi32759.2: detected reschedule: 81 jiffies
[ 6024.411264] mcp251x spi32759.2: detected reschedule: 77 jiffies
[ 6033.868509] mcp251x spi32759.2: detected reschedule: 55 jiffies
[ 6035.914697] mcp251x spi32759.0: detected reschedule: 81 jiffies
[ 6051.216355] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6051.801190] mcp251x spi32759.0: detected reschedule: 55 jiffies
[ 6052.075748] mcp251x spi32759.0: detected reschedule: 27 jiffies
[ 6054.004605] mcp251x spi32759.0: detected reschedule: 81 jiffies
[ 6068.815078] mcp251x spi32759.2: detected reschedule: 27 jiffies
[ 6072.394473] mcp251x spi32759.2: detected reschedule: 81 jiffies
[ 6081.257468] mcp251x spi32759.2: detected reschedule: 81 jiffies
[ 6085.751653] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6111.253374] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6119.573181] mcp251x spi32759.2: detected reschedule: 74 jiffies
[ 6131.105261] mcp251x spi32759.0: detected reschedule: 4 jiffies
[ 6131.960492] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6132.808198] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6132.958858] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 6137.992678] mcp251x spi32759.2: detected reschedule: 48 jiffies
[ 6144.553370] mcp251x spi32759.2: detected reschedule: 34 jiffies
[ 6146.819876] mcp251x spi32759.0: detected reschedule: 82 jiffies
[ 6146.969923] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 6152.984406] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6170.475425] mcp251x spi32759.0: detected reschedule: 74 jiffies
[ 6174.954398] mcp251x spi32759.2: detected reschedule: 82 jiffies

That happens when running torture CAN message receive test and
concurrently taring the “*.o” files of the kernel - which produces
lots of IO...
But sometimes it also happens during other “minimal” activities - 
but that is a much rarer event.

This is really the worsted case, as even those typical 8000 irq/s
on an idle RPI sometimes drop down to 200 irqs/s (plus high block out
counts) - and then the system is no very responsive to put it mildly...

So the only solution seem to be: if a timeout of > 2 jiffies occurs
fall back to interrupts.

This way we are catering for the worsted case and at least our code
should not make some drivers fail due to missing error-handling 
in the driver when the system is overloaded and produces this kind
of delays.

In the hope that this is an acceptable approach I will post a patch
as soon as I have confirmed it to be working with  a much longer
torture-test that does not show any hiccups on the driver-side.

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 37875cf..1bbfccd 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -68,8 +68,7 @@ 
 #define BCM2835_SPI_CS_CS_10		0x00000002
 #define BCM2835_SPI_CS_CS_01		0x00000001
 
-#define BCM2835_SPI_POLLING_LIMIT_US	30
-#define BCM2835_SPI_TIMEOUT_MS		30000
+#define BCM2835_SPI_POLLING_LIMIT_MS	30
 #define BCM2835_SPI_MODE_BITS	(SPI_CPOL | SPI_CPHA | SPI_CS_HIGH \
 				| SPI_NO_CS | SPI_3WIRE)
 
@@ -164,8 +163,9 @@  static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 					 unsigned long xfer_time_us)
 {
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
-	/* set timeout to 1 second of maximum polling */
-	unsigned long timeout = jiffies + HZ;
+	/* set timeout and then fall back to interrupts */
+	unsigned long timeout = jiffies +
+		HZ * BCM2835_SPI_POLLING_LIMIT_MS / 1000;
 
 	/* enable HW block without interrupts */
 	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
@@ -177,13 +177,12 @@  static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 		/* fill in tx fifo as much as possible */
 		bcm2835_wr_fifo(bs);
 		/* if we still expect some data after the read,
-		 * check for a possible timeout
+		 * check for a possible timeout and if we reach that
+		 * then fallback to interrupt mode...
 		 */
 		if (bs->rx_len && time_after(jiffies, timeout)) {
-			/* Transfer complete - reset SPI HW */
-			bcm2835_spi_reset_hw(master);
-			/* and return timeout */
-			return -ETIMEDOUT;
+			return bcm2835_spi_transfer_one_irq(master, spi,
+							    tfr,cs);
 		}
 	}