Message ID | 2DB27253-8254-4098-ADE3-9AA22D52BE9D@martin.sperl.org (mailing list archive)
---|---
State | New, archived
On Sat, Apr 18, 2015 at 03:31:13PM +0200, Martin Sperl wrote:

> > On 18.04.2015, at 14:27, Mark Brown <broonie@kernel.org> wrote:
> >
> > That's probably fine, though the 40ms is a bit on the long side. The
> > 30s timeout could use pulling in too, that's going to fail very badly
> > if anything does go wrong.
>
> Anything below 2 jiffies will give enough false positives during a kernel
> recompile - the original code has had 2 jiffies as the effective minimum,
> because the calculated delivery time of max 30us is still orders of
> magnitude smaller than a single jiffy, but a reschedule can happen, which
> would be the main reason why we have had triggered timeouts before.

Sure, but with the fallback to a much longer sleeping delay that'd be
covered transparently anyway - it doesn't matter if we don't always manage
to busy-wait under load; what's more important is that we don't fail the
operation as a whole.

Actually, if it's just scheduling that's the concern, then one final check
after the time expires should do the trick, shouldn't it?

> So IMO anything shorter than 2-3 jiffies would need to use a new framework
> to get access to high-resolution timers - and I do not know how to do that.

hrtimer_ is the high-resolution timer stuff, though I don't think it's
really what you're looking for; it's more about async callbacks IIRC.
> On 18.04.2015, at 19:07, Mark Brown <broonie@kernel.org> wrote:
>
> Sure, but with the fallback to a much longer sleeping delay that'd be
> covered transparently anyway - it doesn't matter if we don't always
> manage to busy wait under load, what's more important is that we don't
> fail the operation as a whole.
>
> Actually if it's just scheduling that's a concern then one final check
> after the time expires should do the trick shouldn't it?

Well - I have now tried lots of things, and I have also started some
diagnostics on timeouts, as well as trying to detect reschedules and then
report the jiffies that have passed between the start of the loop and the
check, and I get values like this:

[ 5867.943426] mcp251x spi32759.2: detected reschedule: 7 jiffies
[ 5868.628859] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 5869.867311] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 5870.139622] mcp251x spi32759.0: detected reschedule: 8 jiffies
[ 5870.272004] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 5871.352704] mcp251x spi32759.0: detected reschedule: 5 jiffies
[ 5871.469095] mcp251x spi32759.0: detected reschedule: 8 jiffies
[ 5908.783971] mcp251x spi32759.2: detected reschedule: 75 jiffies
[ 5955.831822] mcp251x spi32759.0: detected reschedule: 75 jiffies
[ 5977.380447] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 5977.602457] mcp251x spi32759.0: detected reschedule: 15 jiffies
[ 5980.093570] mcp251x spi32759.2: detected reschedule: 2 jiffies
[ 6006.559147] mcp251x spi32759.2: detected reschedule: 5 jiffies
[ 6009.030543] mcp251x spi32759.0: detected reschedule: 8 jiffies
[ 6016.872027] mcp251x spi32759.0: detected reschedule: 27 jiffies
[ 6018.808468] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6021.586796] mcp251x spi32759.2: detected reschedule: 81 jiffies
[ 6024.411264] mcp251x spi32759.2: detected reschedule: 77 jiffies
[ 6033.868509] mcp251x spi32759.2: detected reschedule: 55 jiffies
[ 6035.914697] mcp251x spi32759.0: detected reschedule: 81 jiffies
[ 6051.216355] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6051.801190] mcp251x spi32759.0: detected reschedule: 55 jiffies
[ 6052.075748] mcp251x spi32759.0: detected reschedule: 27 jiffies
[ 6054.004605] mcp251x spi32759.0: detected reschedule: 81 jiffies
[ 6068.815078] mcp251x spi32759.2: detected reschedule: 27 jiffies
[ 6072.394473] mcp251x spi32759.2: detected reschedule: 81 jiffies
[ 6081.257468] mcp251x spi32759.2: detected reschedule: 81 jiffies
[ 6085.751653] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6111.253374] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6119.573181] mcp251x spi32759.2: detected reschedule: 74 jiffies
[ 6131.105261] mcp251x spi32759.0: detected reschedule: 4 jiffies
[ 6131.960492] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6132.808198] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6132.958858] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 6137.992678] mcp251x spi32759.2: detected reschedule: 48 jiffies
[ 6144.553370] mcp251x spi32759.2: detected reschedule: 34 jiffies
[ 6146.819876] mcp251x spi32759.0: detected reschedule: 82 jiffies
[ 6146.969923] mcp251x spi32759.2: detected reschedule: 8 jiffies
[ 6152.984406] mcp251x spi32759.2: detected reschedule: 82 jiffies
[ 6170.475425] mcp251x spi32759.0: detected reschedule: 74 jiffies
[ 6174.954398] mcp251x spi32759.2: detected reschedule: 82 jiffies

That happens when running a CAN message receive torture test while
concurrently tarring the "*.o" files of the kernel - which produces lots
of IO... But sometimes it also happens during other "minimal" activities -
though that is a much rarer event.

This is really the worst case: even the typical 8000 irq/s on an idle RPi
sometimes drop down to 200 irq/s (plus high block-out counts) - and then
the system is not very responsive, to put it mildly...

So the only solution seems to be: if a timeout of > 2 jiffies occurs, fall
back to interrupts.
This way we are catering for the worst case, and at least our code should
not make some drivers fail - due to missing error handling in the driver -
when the system is overloaded and produces these kinds of delays.

In the hope that this is an acceptable approach, I will post a patch as
soon as I have confirmed it to be working with a much longer torture test
that does not show any hiccups on the driver side.
diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 37875cf..1bbfccd 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -68,8 +68,7 @@
 #define BCM2835_SPI_CS_CS_10		0x00000002
 #define BCM2835_SPI_CS_CS_01		0x00000001
 
-#define BCM2835_SPI_POLLING_LIMIT_US	30
-#define BCM2835_SPI_TIMEOUT_MS		30000
+#define BCM2835_SPI_POLLING_LIMIT_MS	30
 
 #define BCM2835_SPI_MODE_BITS	(SPI_CPOL | SPI_CPHA | SPI_CS_HIGH \
 				| SPI_NO_CS | SPI_3WIRE)
 
@@ -164,8 +163,9 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 				unsigned long xfer_time_us)
 {
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
-	/* set timeout to 1 second of maximum polling */
-	unsigned long timeout = jiffies + HZ;
+	/* set timeout and then fall back to interrupts */
+	unsigned long timeout = jiffies +
+		HZ * BCM2835_SPI_POLLING_LIMIT_MS / 1000;
 
 	/* enable HW block without interrupts */
 	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
@@ -177,13 +177,12 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 		/* fill in tx fifo as much as possible */
 		bcm2835_wr_fifo(bs);
 		/* if we still expect some data after the read,
-		 * check for a possible timeout
+		 * check for a possible timeout and if we reach that
+		 * then fallback to interrupt mode...
 		 */
 		if (bs->rx_len && time_after(jiffies, timeout)) {
-			/* Transfer complete - reset SPI HW */
-			bcm2835_spi_reset_hw(master);
-			/* and return timeout */
-			return -ETIMEDOUT;
+			return bcm2835_spi_transfer_one_irq(master, spi,
+							    tfr, cs);
 		}
 	}