diff mbox series

[4/4] iio: ad7949: fix channels mixups

Message ID 20190912144310.7458-5-andrea.merello@gmail.com (mailing list archive)
State New, archived
Headers show
Series Fixes for ad7949 | expand

Commit Message

Andrea Merello Sept. 12, 2019, 2:43 p.m. UTC
Each time we need to read a sample the driver writes the CFG register
(setting the channel to be read in such register) and then it performs
another xfer to read the resulting value.

This does not work correctly because while writing the CFG register the
acquisition phase is ongoing using the _previous_ CFG settings. Then the
device performs the conversion during xfer delay on the voltage stored
duting the said acquisitaion phase. Finally the driver performs the read
(during the next acquisition phase, which is the one done with the right
settings) and it gets the last converted value, that is the wrong data.

In case the configuration is not actually changed, then we still get
correct data, but in case the configuration changes (and this happens e.g.
switching the MUX on another channel), we get wrong data (data from the
previously selected channel).

This patch fixes this by performing one more "dummy" transfer in order to
ending up in reading the data when it's really ready.

Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
---
 drivers/iio/adc/ad7949.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

Comments

Alexandru Ardelean Sept. 13, 2019, 7:19 a.m. UTC | #1
On Thu, 2019-09-12 at 16:43 +0200, Andrea Merello wrote:
> [External]
> 
> Each time we need to read a sample the driver writes the CFG register
> (setting the channel to be read in such register) and then it performs
> another xfer to read the resulting value.
> 
> This does not work correctly because while writing the CFG register the
> acquisition phase is ongoing using the _previous_ CFG settings. Then the
> device performs the conversion during xfer delay on the voltage stored
> duting the said acquisitaion phase. Finally the driver performs the read
> (during the next acquisition phase, which is the one done with the right
> settings) and it gets the last converted value, that is the wrong data.
> 
> In case the configuration is not actually changed, then we still get
> correct data, but in case the configuration changes (and this happens e.g.
> switching the MUX on another channel), we get wrong data (data from the
> previously selected channel).
> 
> This patch fixes this by performing one more "dummy" transfer in order to
> ending up in reading the data when it's really ready.

So, at power-up this chip seems to need 2 dummy reads to discard data.
Which seems to happen in ad7949_spi_init()

One thing that maybe could be optimized (for the driver), is in `ad7949_spi_read_channel()` to use the current-channel &
not do a SPI write to change the channel if it doesn't change.

Datasheets (in general) are not always obvious about describing chip behavior for SW (or for writing a driver), but I
would suspect that if you are reading garbage data, it could be that the channel has changed.
This is true for some other ADCs.
And requires testing for this one.

Added Charles-Antoine, since he wrote the driver.
Shoud have added him on the other patches as well, but I just remembered.

Thanks
Alex

> 
> Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
> ---
>  drivers/iio/adc/ad7949.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iio/adc/ad7949.c b/drivers/iio/adc/ad7949.c
> index 25d1e1b24257..b1dbe2075ca9 100644
> --- a/drivers/iio/adc/ad7949.c
> +++ b/drivers/iio/adc/ad7949.c
> @@ -85,6 +85,7 @@ static int ad7949_spi_read_channel(struct ad7949_adc_chip *ad7949_adc, int *val,
>  				   unsigned int channel)
>  {
>  	int ret;
> +	int i;
>  	int bits_per_word = ad7949_adc->resolution;
>  	int mask = GENMASK(ad7949_adc->resolution, 0);
>  	struct spi_message msg;
> @@ -97,12 +98,19 @@ static int ad7949_spi_read_channel(struct ad7949_adc_chip *ad7949_adc, int *val,
>  		},
>  	};
>  
> -	ret = ad7949_spi_write_cfg(ad7949_adc,
> -				   channel << AD7949_OFFSET_CHANNEL_SEL,
> -				   AD7949_MASK_CHANNEL_SEL);
> -	if (ret)
> -		return ret;
> +	/*
> +	 * 1: write CFG for sample 'n' and read garbage (sample n-2)
> +	 * 2: write something and read garbage (sample n-1)
> +	 */
> +	for (i = 0; i < 2; i++) {
> +		ret = ad7949_spi_write_cfg(ad7949_adc,
> +					   channel << AD7949_OFFSET_CHANNEL_SEL,
> +					   AD7949_MASK_CHANNEL_SEL);
> +		if (ret)
> +			return ret;
> +	}
>  
> +	/* 3: write something and read data for sample 'n' */
>  	ad7949_adc->buffer = 0;
>  	spi_message_init_with_transfers(&msg, tx, 1);
>  	ret = spi_sync(ad7949_adc->spi, &msg);
Andrea Merello Sept. 13, 2019, 8:30 a.m. UTC | #2
Il giorno ven 13 set 2019 alle ore 09:19 Ardelean, Alexandru
<alexandru.Ardelean@analog.com> ha scritto:
>
> On Thu, 2019-09-12 at 16:43 +0200, Andrea Merello wrote:
> > [External]
> >
> > Each time we need to read a sample the driver writes the CFG register
> > (setting the channel to be read in such register) and then it performs
> > another xfer to read the resulting value.
> >
> > This does not work correctly because while writing the CFG register the
> > acquisition phase is ongoing using the _previous_ CFG settings. Then the
> > device performs the conversion during xfer delay on the voltage stored
> > duting the said acquisitaion phase. Finally the driver performs the read
> > (during the next acquisition phase, which is the one done with the right
> > settings) and it gets the last converted value, that is the wrong data.
> >
> > In case the configuration is not actually changed, then we still get
> > correct data, but in case the configuration changes (and this happens e.g.
> > switching the MUX on another channel), we get wrong data (data from the
> > previously selected channel).
> >
> > This patch fixes this by performing one more "dummy" transfer in order to
> > ending up in reading the data when it's really ready.
>
> So, at power-up this chip seems to need 2 dummy reads to discard data.
> Which seems to happen in ad7949_spi_init()
>
> One thing that maybe could be optimized (for the driver), is in `ad7949_spi_read_channel()` to use the current-channel &
> not do a SPI write to change the channel if it doesn't change.
>
> Datasheets (in general) are not always obvious about describing chip behavior for SW (or for writing a driver), but I
> would suspect that if you are reading garbage data, it could be that the channel has changed.
> This is true for some other ADCs.
> And requires testing for this one.

Yes, it's exactly what I've seen here. If the channel does not change
then the AD is already in acquisition phase on the right channel (I
assume it's OK to keep it in such phase indefinitely), then we can
just trigger a new conversion (CNV low->high, that is a dummy xfer)
and then read the result in following xfer, as the driver already did.

I craft a V2 that performs the extra (3rd) spi xfer only if the
channel has to change.

> Added Charles-Antoine, since he wrote the driver.
> Shoud have added him on the other patches as well, but I just remembered.

I tried on my first answer, but apparently mails to his address bounce
back with a failure response..

> Thanks
> Alex
>
> >
> > Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
> > ---
> >  drivers/iio/adc/ad7949.c | 18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iio/adc/ad7949.c b/drivers/iio/adc/ad7949.c
> > index 25d1e1b24257..b1dbe2075ca9 100644
> > --- a/drivers/iio/adc/ad7949.c
> > +++ b/drivers/iio/adc/ad7949.c
> > @@ -85,6 +85,7 @@ static int ad7949_spi_read_channel(struct ad7949_adc_chip *ad7949_adc, int *val,
> >                                  unsigned int channel)
> >  {
> >       int ret;
> > +     int i;
> >       int bits_per_word = ad7949_adc->resolution;
> >       int mask = GENMASK(ad7949_adc->resolution, 0);
> >       struct spi_message msg;
> > @@ -97,12 +98,19 @@ static int ad7949_spi_read_channel(struct ad7949_adc_chip *ad7949_adc, int *val,
> >               },
> >       };
> >
> > -     ret = ad7949_spi_write_cfg(ad7949_adc,
> > -                                channel << AD7949_OFFSET_CHANNEL_SEL,
> > -                                AD7949_MASK_CHANNEL_SEL);
> > -     if (ret)
> > -             return ret;
> > +     /*
> > +      * 1: write CFG for sample 'n' and read garbage (sample n-2)
> > +      * 2: write something and read garbage (sample n-1)
> > +      */
> > +     for (i = 0; i < 2; i++) {
> > +             ret = ad7949_spi_write_cfg(ad7949_adc,
> > +                                        channel << AD7949_OFFSET_CHANNEL_SEL,
> > +                                        AD7949_MASK_CHANNEL_SEL);
> > +             if (ret)
> > +                     return ret;
> > +     }
> >
> > +     /* 3: write something and read data for sample 'n' */
> >       ad7949_adc->buffer = 0;
> >       spi_message_init_with_transfers(&msg, tx, 1);
> >       ret = spi_sync(ad7949_adc->spi, &msg);
Couret Charles-Antoine Sept. 13, 2019, 11:30 a.m. UTC | #3
Hi,

Le 13/09/2019 à 10:30, Andrea Merello a écrit :
> Il giorno ven 13 set 2019 alle ore 09:19 Ardelean, Alexandru
> <alexandru.Ardelean@analog.com> ha scritto:
>
>> So, at power-up this chip seems to need 2 dummy reads to discard data.
>> Which seems to happen in ad7949_spi_init()
>>
>> One thing that maybe could be optimized (for the driver), is in `ad7949_spi_read_channel()` to use the current-channel &
>> not do a SPI write to change the channel if it doesn't change.
>>
>> Datasheets (in general) are not always obvious about describing chip behavior for SW (or for writing a driver), but I
>> would suspect that if you are reading garbage data, it could be that the channel has changed.
>> This is true for some other ADCs.
>> And requires testing for this one.
> Yes, it's exactly what I've seen here. If the channel does not change
> then the AD is already in acquisition phase on the right channel (I
> assume it's OK to keep it in such phase indefinitely), then we can
> just trigger a new conversion (CNV low->high, that is a dummy xfer)
> and then read the result in following xfer, as the driver already did.
>
> I craft a V2 that performs the extra (3rd) spi xfer only if the
> channel has to change.

This design should be ok. I didn't implement in that way because not 
enough time to optimize the driver before release (I don't have access 
to the chip anymore) and for our workflow it was not relevant (we 
scanned all channels).


About your fix to read / write several times before reading the value 
after channel change seems not relevant. Did you try with the current 
implementation? Because when I developed the driver, I have always got 
the expected value for each channel with this design.


Just to be sure we are not adding useless steps.

>> Added Charles-Antoine, since he wrote the driver.
>> Shoud have added him on the other patches as well, but I just remembered.
> I tried on my first answer, but apparently mails to his address bounce
> back with a failure response..

But now it seems ok. Did you put the right email address?


Thank you for the copy.

Regards,

Charles-Antoine Couret
Andrea Merello Sept. 13, 2019, 11:40 a.m. UTC | #4
Il giorno ven 13 set 2019 alle ore 13:30 Couret Charles-Antoine
<charles-antoine.couret@essensium.com> ha scritto:
>
> Hi,
>
> Le 13/09/2019 à 10:30, Andrea Merello a écrit :
> > Il giorno ven 13 set 2019 alle ore 09:19 Ardelean, Alexandru
> > <alexandru.Ardelean@analog.com> ha scritto:
> >
> >> So, at power-up this chip seems to need 2 dummy reads to discard data.
> >> Which seems to happen in ad7949_spi_init()
> >>
> >> One thing that maybe could be optimized (for the driver), is in `ad7949_spi_read_channel()` to use the current-channel &
> >> not do a SPI write to change the channel if it doesn't change.
> >>
> >> Datasheets (in general) are not always obvious about describing chip behavior for SW (or for writing a driver), but I
> >> would suspect that if you are reading garbage data, it could be that the channel has changed.
> >> This is true for some other ADCs.
> >> And requires testing for this one.
> > Yes, it's exactly what I've seen here. If the channel does not change
> > then the AD is already in acquisition phase on the right channel (I
> > assume it's OK to keep it in such phase indefinitely), then we can
> > just trigger a new conversion (CNV low->high, that is a dummy xfer)
> > and then read the result in following xfer, as the driver already did.
> >
> > I craft a V2 that performs the extra (3rd) spi xfer only if the
> > channel has to change.
>
> This design should be ok. I didn't implement in that way because not
> enough time to optimize the driver before release (I don't have access
> to the chip anymore) and for our workflow it was not relevant (we
> scanned all channels).
>
>
> About your fix to read / write several times before reading the value
> after channel change seems not relevant. Did you try with the current
> implementation? Because when I developed the driver, I have always got
> the expected value for each channel with this design.

Yes, I did. But I experienced the said problems. I don't have idea
about why it worked for you and it didn't for me..

By scanning the channels in circular fashion, with the current
implementation, I would expect to get values off-by-one (i.e. you read
ch ...,0,1,2,3,4,0,1,2,3,4,... and you get value for ch
...,4,0,1,2,3,4,0,1,2,3,...).

>
> Just to be sure we are not adding useless steps.
>
> >> Added Charles-Antoine, since he wrote the driver.
> >> Shoud have added him on the other patches as well, but I just remembered.
> > I tried on my first answer, but apparently mails to his address bounce
> > back with a failure response..
>
> But now it seems ok. Did you put the right email address?

Sorry, my fault. I've done a mistake in copy-and-paste.

>
> Thank you for the copy.
>
> Regards,
>
> Charles-Antoine Couret
>
Andrea Merello Sept. 20, 2019, 7:45 a.m. UTC | #5
Il giorno ven 13 set 2019 alle ore 13:30 Couret Charles-Antoine
<charles-antoine.couret@essensium.com> ha scritto:

> >> One thing that maybe could be optimized (for the driver), is in `ad7949_spi_read_channel()` to use the current-channel &
> >> not do a SPI write to change the channel if it doesn't change.
> >>
> >> Datasheets (in general) are not always obvious about describing chip behavior for SW (or for writing a driver), but I
> >> would suspect that if you are reading garbage data, it could be that the channel has changed.
> >> This is true for some other ADCs.
> >> And requires testing for this one.
> > Yes, it's exactly what I've seen here. If the channel does not change
> > then the AD is already in acquisition phase on the right channel (I
> > assume it's OK to keep it in such phase indefinitely), then we can
> > just trigger a new conversion (CNV low->high, that is a dummy xfer)
> > and then read the result in following xfer, as the driver already did.
> >
> > I craft a V2 that performs the extra (3rd) spi xfer only if the
> > channel has to change.
>
> This design should be ok. I didn't implement in that way because not
> enough time to optimize the driver before release (I don't have access
> to the chip anymore) and for our workflow it was not relevant (we
> scanned all channels).

I was in the process of doing this, but, thinking again about it, I'm
not completely sure it is really OK:
Should we guarantee that the value we return as outcome of a IIO read
request (i.e. we access in_voltage_raw) is sampled _after_ the user
read request?

For example, the user might setup some other HW for the measure, or
somehow wait for the right moment for doing the measure, and then
perform the read from IIO, thus it's probably not OK if we convert a
value sampled just before the IIO read request.

If we skip the configuration rewrite when the channel doesn't change -
as discussed above - then we actually _terminate_ the acquisition when
the IIO read is triggered, that is we are converting the value sampled
right before the IIO read.

If this is OK then I'll go on, otherwise I think that we should always
do the three cycles (so that triggering IIO read always waits also for
a new acquisition phase)
Jonathan Cameron Sept. 21, 2019, 5:12 p.m. UTC | #6
On Fri, 20 Sep 2019 09:45:21 +0200
Andrea Merello <andrea.merello@gmail.com> wrote:

> Il giorno ven 13 set 2019 alle ore 13:30 Couret Charles-Antoine
> <charles-antoine.couret@essensium.com> ha scritto:
> 
> > >> One thing that maybe could be optimized (for the driver), is in `ad7949_spi_read_channel()` to use the current-channel &
> > >> not do a SPI write to change the channel if it doesn't change.
> > >>
> > >> Datasheets (in general) are not always obvious about describing chip behavior for SW (or for writing a driver), but I
> > >> would suspect that if you are reading garbage data, it could be that the channel has changed.
> > >> This is true for some other ADCs.
> > >> And requires testing for this one.  
> > > Yes, it's exactly what I've seen here. If the channel does not change
> > > then the AD is already in acquisition phase on the right channel (I
> > > assume it's OK to keep it in such phase indefinitely), then we can
> > > just trigger a new conversion (CNV low->high, that is a dummy xfer)
> > > and then read the result in following xfer, as the driver already did.
> > >
> > > I craft a V2 that performs the extra (3rd) spi xfer only if the
> > > channel has to change.  
> >
> > This design should be ok. I didn't implement in that way because not
> > enough time to optimize the driver before release (I don't have access
> > to the chip anymore) and for our workflow it was not relevant (we
> > scanned all channels).  
> 
> I was in the process of doing this, but, thinking again about it, I'm
> not completely sure it is really OK:
> Should we guarantee that the value we return as outcome of a IIO read
> request (i.e. we access in_voltage_raw) is sampled _after_ the user
> read request?
> 
> For example, the user might setup some other HW for the measure, or
> somehow wait for the right moment for doing the measure, and then
> perform the read from IIO, thus it's probably not OK if we convert a
> value sampled just before the IIO read request.

Absolutely.  MUX in front of the sensor is a fairly common thing to see.

> 
> If we skip the configuration rewrite when the channel doesn't change -
> as discussed above - then we actually _terminate_ the acquisition when
> the IIO read is triggered, that is we are converting the value sampled
> right before the IIO read.
> 
> If this is OK then I'll go on, otherwise I think that we should always
> do the three cycles (so that triggering IIO read always waits also for
> a new acquisition phase)
An excellent point.  I agree and suspect we may have this wrong in other
sensors.  The question gets more interesting if running in buffered mode
as we have had systems using a trigger generated by an external process.
I suppose in that case we just have to deal with the offset in the fifo
etc in userspace.

Maybe we should think about adding a note to be careful of that.  Not
really sure where we would note it though...

Thanks,

Jonathan
Andrea Merello Sept. 23, 2019, 8:21 a.m. UTC | #7
Il giorno sab 21 set 2019 alle ore 19:12 Jonathan Cameron
<jic23@kernel.org> ha scritto:

> >
> > If we skip the configuration rewrite when the channel doesn't change -
> > as discussed above - then we actually _terminate_ the acquisition when
> > the IIO read is triggered, that is we are converting the value sampled
> > right before the IIO read.
> >
> > If this is OK then I'll go on, otherwise I think that we should always
> > do the three cycles (so that triggering IIO read always waits also for
> > a new acquisition phase)

I had a discussion about this with a HW engineer, he said that it's
probably not necessary to perform a full extra cycle (i.e. SPI xfer +
udelay(2)), rather, since the HW is already in acquisition, it should
be enough to perform the udelay(2) to make sure the internal capacitor
settles (if we change channel of course we need also the SPI xfer to
update the CFG).

So indeed it seems to me that:
- if CFG (channel) changes: we need three full SPI xfer (actual SPI
xfer + delay(2))
- if CFG (channel) doesn't change: we need a delay(2) [*]- to
guarantee the user sees a value sampled after the IIO read, as
discussed - and two full SPI xfer (actual SPI xfer + delay(2))

.. Indeed I also wonder if it would be enough to cycle the CS, without
performing a full SPI xfer, in order to start the conversion.. But
given that this whole thing seems to me a bit complicated and unclear,
I would stick to the dummy cycle for now..

> An excellent point.  I agree and suspect we may have this wrong in other
> sensors.  The question gets more interesting if running in buffered mode
> as we have had systems using a trigger generated by an external process.
> I suppose in that case we just have to deal with the offset in the fifo
> etc in userspace.

Yes. I'm not familiar with IIO buffered mode, but for a streaming,
continuous, read mode I guess that the user would expect some latency
anyway (might be due to the ADC or the buffering mechanism itself or
whatever).

I didn't look into this buffered thing to see how it works, but maybe
we can skip the first udelay(2) [*] in the driver in case of buffered
access?

> Maybe we should think about adding a note to be careful of that.  Not
> really sure where we would note it though...

IMHO clarifying what the API guarantees and what it doesn't in IIO
DocBook could be helpful (I didn't see it, but I might have missed
it)..

So, we could state that a single shot read must return a value sampled
after the read has been shot, and that on the other hand, when using
buffered mode one should expect some latency.. But indeed, since you
said that we might have a number of IIO drivers that actually behave
in a different way, then I'm not sure that it's a good idea; maybe we
could just drop this requirement and allow (and document) that a
single shot IIO read could just _terminate_ the sampling and trigger
the conversion on the current sampled signal? (so also in this driver
we can just not care about this)..

> Thanks,
>
> Jonathan
>
>
Jonathan Cameron Oct. 5, 2019, 9:55 a.m. UTC | #8
On Mon, 23 Sep 2019 10:21:49 +0200
Andrea Merello <andrea.merello@gmail.com> wrote:

> Il giorno sab 21 set 2019 alle ore 19:12 Jonathan Cameron
> <jic23@kernel.org> ha scritto:
> 
> > >
> > > If we skip the configuration rewrite when the channel doesn't change -
> > > as discussed above - then we actually _terminate_ the acquisition when
> > > the IIO read is triggered, that is we are converting the value sampled
> > > right before the IIO read.
> > >
> > > If this is OK then I'll go on, otherwise I think that we should always
> > > do the three cycles (so that triggering IIO read always waits also for
> > > a new acquisition phase)  
> 
> I had a discussion about this with a HW engineer, he said that it's
> probably not necessary to perform a full extra cycle (i.e. SPI xfer +
> udelay(2)), rather, since the HW is already in acquisition, it should
> be enough to perform the udelay(2) to make sure the internal capacitor
> settles (if we change channel of course we need also the SPI xfer to
> update the CFG).
> 
> So indeed it seems to me that:
> - if CFG (channel) changes: we need three full SPI xfer (actual SPI
> xfer + delay(2))
> - if CFG (channel) doesn't change: we need a delay(2) [*]- to
> guarantee the user sees a value sampled after the IIO read, as
> discussed - and two full SPI xfer (actual SPI xfer + delay(2))
> 
> .. Indeed I also wonder if it would be enough to cycle the CS, without
> performing a full SPI xfer, in order to start the conversion.. But
> given that this whole thing seems to me a bit complicated and unclear,
> I would stick to the dummy cycle for now..
> 
> > An excellent point.  I agree and suspect we may have this wrong in other
> > sensors.  The question gets more interesting if running in buffered mode
> > as we have had systems using a trigger generated by an external process.
> > I suppose in that case we just have to deal with the offset in the fifo
> > etc in userspace.  
> 
> Yes. I'm not familiar with IIO buffered mode, but for a streaming,
> continuous, read mode I guess that the user would expect some latency
> anyway (might be due to the ADC or the buffering mechanism itself or
> whatever).

There are a few ugly corners.
1) They'd normally expect the timestamp to be as closely aligned as
possible with the reading in a given scan.  Perhaps we can think about
how to offset that if we know we are actually looking at the one before
last...
2) There are tightly coupled setups where we are switching a mux in
front of the sensor and driving a trigger off that action.
                                                _________
 _________           |--------MUX CNTRL--------|         |
|         |        __|__        ---TRIGGER-----|         |
| INPUT 1 |-------|     |     __|__            |         |
|_________|       |     |    |     |           |   HOST  |
 _________        | MUX |----| ADC |-----------|         |
|         |       |     |    |_____|           |         |
| INPUT 2 |-------|_____|                      |_________|   
|_________| 

This gets represented as a single 'device' that is a consumer
of the ADC channel, and also provides the trigger and handles
the mux control.

Once you have stitched it all together the expectation is that
for each 'scan':
1. Set MUX
2. Allow short settling time
3. Trigger the ADC via gpio or SPI triggered read etc.
4. ADC result comes back.
5. Goto 1.

The full set of inputs are sampled to make up one 'scan' in
the buffer of the container driver.

Now in theory you could interleave the flows so that you know
the data is coming 2 scan's later, but there isn't currently
a way for the 'container' driver to know that is happening,
so the assumption it makes is that everything is 'direct'.

We could add some means of reporting an offset from trigger
to sample into fifo as that would allow such a container
driver to adjust it's mux settings to take that into account.

Note that out container driver also often has it's own
capture trigger coming in so it can get really fiddly as
we don't have any way of knowing if we are in a round robin
situation or just taking a single 'scan'. In single scan
case we want to drop the excess reads from the end, in round robin
we want to keep them as they are the start of the next scan.

> 
> I didn't look into this buffered thing to see how it works, but maybe
> we can skip the first udelay(2) [*] in the driver in case of buffered
> access?
> 
> > Maybe we should think about adding a note to be careful of that.  Not
> > really sure where we would note it though...  
> 
> IMHO clarifying what the API guarantees and what it doesn't in IIO
> DocBook could be helpful (I didn't see it, but I might have missed
> it)..
I agree we should clarify it, but I'm still not totally sure what the
preferred case is! Perhaps best plan is to put forward a patch to add
a description of one of the choices as we can push people to review
that closely as it may mean devices don't comply with the ABI.

There is a 3rd option which is to add the option for devices to describe
what they are doing via a new sysfs attribute.  That may get us
around causing potential breakage in drivers that are already doing it
the 'wrong' way.

> 
> So, we could state that a single shot read must return a value sampled
> after the read has been shot, and that on the other hand, when using
> buffered mode one should expect some latency.. But indeed, since you
> said that we might have a number of IIO drivers that actually behave
> in a different way, then I'm not sure that it's a good idea; maybe we
> could just drop this requirement and allow (and document) that a
> single shot IIO read could just _terminate_ the sampling and trigger
> the conversion on the current sampled signal? (so also in this driver
> we can just not care about this)..

I don't think we need to worry about the difference between a sample
window stopping vs one starting at the trigger.  Right now we don't
even say how long that window is, so a naive assumption would be that
we model it as instantaneous.  For single shot either model is
fine to my mind. We get the nearest practical signal to the point
of reading.  The sysfs interface is slow enough that any fine
grained control of timing is likely to be garbage anyway :)
The only exception would be really slow sensors such as light sensors
where we might be talking 100s of msecs.  In those I'd argue
we do want the capture to start only after we ask.  I think we are
fine on existing drivers for those.

Thanks,

Jonathan

> 
> > Thanks,
> >
> > Jonathan
> >
> >
Jonathan Cameron Oct. 22, 2019, 8:56 a.m. UTC | #9
On Fri, 18 Oct 2019 15:30:17 +0200
Andrea Merello <andrea.merello@gmail.com> wrote:

> Il giorno sab 5 ott 2019 alle ore 11:55 Jonathan Cameron <jic23@kernel.org>
> ha scritto:
> >
> > On Mon, 23 Sep 2019 10:21:49 +0200
> > Andrea Merello <andrea.merello@gmail.com> wrote:
> >  
> > > Il giorno sab 21 set 2019 alle ore 19:12 Jonathan Cameron
> > > <jic23@kernel.org> ha scritto:
> > >  
> > > > >
> > > > > If we skip the configuration rewrite when the channel doesn't  
> change -
> > > > > as discussed above - then we actually _terminate_ the acquisition  
> when
> > > > > the IIO read is triggered, that is we are converting the value  
> sampled
> > > > > right before the IIO read.
> > > > >
> > > > > If this is OK then I'll go on, otherwise I think that we should  
> always
> > > > > do the three cycles (so that triggering IIO read always waits also  
> for
> > > > > a new acquisition phase)  
> > >
> > > I had a discussion about this with a HW engineer, he said that it's
> > > probably not necessary to perform a full extra cycle (i.e. SPI xfer +
> > > udelay(2)), rather, since the HW is already in acquisition, it should
> > > be enough to perform the udelay(2) to make sure the internal capacitor
> > > settles (if we change channel of course we need also the SPI xfer to
> > > update the CFG).
> > >
> > > So indeed it seems to me that:
> > > - if CFG (channel) changes: we need three full SPI xfer (actual SPI
> > > xfer + delay(2))
> > > - if CFG (channel) doesn't change: we need a delay(2) [*]- to
> > > guarantee the user sees a value sampled after the IIO read, as
> > > discussed - and two full SPI xfer (actual SPI xfer + delay(2))
> > >
> > > .. Indeed I also wonder if it would be enough to cycle the CS, without
> > > performing a full SPI xfer, in order to start the conversion.. But
> > > given that this whole thing seems to me a bit complicated and unclear,
> > > I would stick to the dummy cycle for now..
> > >  
> > > > An excellent point.  I agree and suspect we may have this wrong in  
> other
> > > > sensors.  The question gets more interesting if running in buffered  
> mode
> > > > as we have had systems using a trigger generated by an external  
> process.
> > > > I suppose in that case we just have to deal with the offset in the  
> fifo
> > > > etc in userspace.  
> > >
> > > Yes. I'm not familiar with IIO buffered mode, but for a streaming,
> > > continuous, read mode I guess that the user would expect some latency
> > > anyway (might be due to the ADC or the buffering mechanism itself or
> > > whatever).  
> >
> > There are a few ugly corners.
> > 1) They'd normally expect the timestamp to be as closely aligned as
> > possible with the reading in a given scan.  Perhaps we can think about
> > how to offset that if we know we are actually looking at the one before
> > last...  
> 
> Maybe we could sample the timestamp whenever we start a conversion (end of
> SPI XFER), then we use this information for timestamping the value we'll
> read at the next SPI xfer (that is the outcome of the said conversion).
> Maybe we can also subtract, say, half of the acquisition time to each
> timestemp to better center it on the actual acquisition window..

I originally had a plan to 'accurately' describe time offsets so that
we could deal with the difference between devices that are self clocked
(the interrupt is after the samples are done) and devices that are
capture on demand.  It never got implemented though as it seems no one
ever cared :)

As for shifting the timestamp to reflect and earlier triggered capture
(so running a small fifo for the timestamps), that seems like a
sensible approach to keeping timestamp and sample together.

> 
> > 2) There are tightly coupled setups where we are switching a mux in
> > front of the sensor and driving a trigger off that action.
> >                                                 _________
> >  _________           |--------MUX CNTRL--------|         |
> > |         |        __|__        ---TRIGGER-----|         |
> > | INPUT 1 |-------|     |     __|__            |         |
> > |_________|       |     |    |     |           |   HOST  |
> >  _________        | MUX |----| ADC |-----------|         |
> > |         |       |     |    |_____|           |         |
> > | INPUT 2 |-------|_____|                      |_________|
> > |_________|
> >
> > This gets represented as a single 'device' that is a consumer
> > of the ADC channel, and also provides the trigger and handles
> > the mux control.
> >
> > Once you have stitched it all together the expectation is that
> > for each 'scan':
> > 1. Set MUX
> > 2. Allow short settling time  
> 
> Indeed my concern about begin vs end of acquisition window wrt switching
> time of the external mux could possibly be worked-around by just increasing
> settling time here. I also indeed agree wrt what you say below about sysfs
> [*].
> 
> So probably we don't need to worry about this at all in the chip driver.
> 
> > 3. Trigger the ADC via gpio or SPI triggered read etc.
> > 4. ADC result comes back.
> > 5. Goto 1.
> >
> > The full set of inputs are sampled to make up one 'scan' in
> > the buffer of the container driver.
> >
> > Now in theory you could interleave the flows so that you know
> > the data is coming 2 scan's later, but there isn't currently
> > a way for the 'container' driver to know that is happening,
> > so the assumption it makes is that everything is 'direct'.
> >
> > We could add some means of reporting an offset from trigger
> > to sample into fifo as that would allow such a container
> > driver to adjust it's mux settings to take that into account.
> >
> > Note that out container driver also often has it's own
> > capture trigger coming in so it can get really fiddly as
> > we don't have any way of knowing if we are in a round robin
> > situation or just taking a single 'scan'. In single scan
> > case we want to drop the excess reads from the end, in round robin
> > we want to keep them as they are the start of the next scan.  
> 
> Yes, that's what we want. But it seems we cannot easily distinguish the two
> cases.
> 
> The only thing I can think about to handle this is to take into account the
> samples ageing, and use excess samples for the next scan vs throwing them
> away depending on that. But there's the problem of choosing a threshold..
> 
> BTW Maybe if this is handled in the ADC driver then it may also adjust
> things so that all is transparent to upper layers, even getting rid of the
> offset.

It could be done in the ADC driver, but the cost would be that it would
have to run sufficient samples to ensure we are always 'fresh'.  That
is going to kill the sampling rate.   As things stand we have no way
of letting the ADC driver know that this is even desired.

> 
> > >
> > > I didn't look into this buffered thing to see how it works, but maybe
> > > we can skip the first udelay(2) [*] in the driver in case of buffered
> > > access?
> > >  
> > > > Maybe we should think about adding a note to be careful of that.  Not
> > > > really sure where we would note it though...  
> > >
> > > IMHO clarifying what the API guarantees and what it doesn't in IIO
> > > DocBook could be helpful (I didn't see it, but I might have missed
> > > it)..  
> > I agree we should clarify it, but I'm still not totally sure what the
> > preferred case is! Perhaps best plan is to put forward a patch to add
> > a description of one of the choices as we can push people to review
> > that closely as it may mean devices don't comply with the ABI.
> >
> > There is a 3rd option which is to add the option for devices to describe
> > what they are doing via a new sysfs attribute.  That may get us
> > around causing potential breakage in drivers that are already doing it
> > the 'wrong' way.  
> 
> This would be implicit if we make them reporting the offset: a zero offset
> means they behave in the simple, plain, way..
> 
> We might also want to be able to set it, so that if we do sparse single-shot
> scans we can ask the driver to provide data in the way we expect it.

Certainly a fiddly corner case.  If this had always been present we could
just have made it up to the consumer to deal with the offset.  If it wants
a single shot reading, it would just have to do N repeated readings to
flush out the ADC pipeline.   Too late for that though.  So we would have
to default to handling in the driver, and allow for it to be set to 
an offset for high sampling rate users who know how to handle it.
(new userspace).


> 
> > >
> > > So, we could state that a single shot read must return a value sampled
> > > after the read has been shot, and that on the other hand, when using
> > > buffered mode one should expect some latency.. But indeed, since you
> > > said that we might have a number of IIO drivers that actually behave
> > > in a different way, then I'm not sure that it's a good idea; maybe we
> > > could just drop this requirement and allow (and document) that a
> > > single shot IIO read could just _terminate_ the sampling and trigger
> > > the conversion on the current sampled signal? (so also in this driver
> > > we can just not care about this)..  
> >
> > I don't think we need to worry about the difference between a sample
> > window stopping vs one starting at the trigger.  Right now we don't
> > even say how long that window is, so a naive assumption would be that
> > we model it as instantaneous.  For single shot either model is
> > fine to my mind. We get the nearest practical signal to the point
> > of reading.  The sysfs interface is slow enough that any fine
> > grained control of timing is likely to be garbage anyway :)  
> 
> [*]
> 
> > The only exception would be really slow sensors such as light sensors
> > where we might be talking 100s of msecs.  In those I'd argue
> > we do want the capture to start only after we ask.  I think we are
> > fine on existing drivers for those.  
> 
> BTW Indeed the original patch tried to cover also another aspect of the
> thing: when I read e.g. in_voltage2_raw from sysfs, I guess I really want
> to be sure that I'm reading (rasonably fresh) data from channel number 2. I
> guess
> that if another process did read in_voltage3_raw before me, I still want to
> be sure I'm getting data from ch 2, not 3.
> 
> IMO the offset thing make sense only on scans performed by buffered reads,
> in which you configure a scan sequence and you are somehow interested in
> getting data from all those channels (note: the driver targeted by the patch
> does support only raw_read()).

Absolutely agree.  This stuff only applies to more complex drivers doing
buffered mode in which we have some assumptions of 'continuous' sampling.

> 
> Also, getting back to the former example, I don't know how the container
> driver accesses the ADC driver (buffered vs raw_read), but I bet that, as a
> consumer of one ADC channel, it really wants samples from the said channel
> (internal MUX), and not from another.

Uses buffered mode, which is why triggers are involved. However, if it had
the information, the consumer could basically pass through the delay to
userspace.  So if it gets the data 2 samples late, it would tell userspace that
it will provide it two samples late as well.
> 
> So making sure the internal MUX is adjusted at every raw_read() IMO seems
> still the right thing to do.. Still, I'm unsure if the whole thing of
> excess reads makes sense here, or if we can just optimize only buffered
> reads.

Definitely only optimize buffered reads.  The sysfs ones are slow anyway
so no need to be clever!  We just burn samples to get the right thing.

I'm now completely lost on where we got to with the actual change in this
driver.  Seem's Charles wasn't convinced yet (or lost track of the thread).
Perhaps a repost with the example that you gave of the sequence you were
seeing?

Definitely went down a rabbit hole here! :)

Jonathan
> 
> > Thanks,
> >
> > Jonathan
> >  
> > >  
> > > > Thanks,
> > > >
> > > > Jonathan
> > > >
> > > >  
> >
Andrea Merello Nov. 4, 2019, 2:12 p.m. UTC | #10
Il giorno mar 22 ott 2019 alle ore 10:56 Jonathan Cameron
<jic23@kernel.org> ha scritto:
>
> On Fri, 18 Oct 2019 15:30:17 +0200
> Andrea Merello <andrea.merello@gmail.com> wrote:
>
> > Il giorno sab 5 ott 2019 alle ore 11:55 Jonathan Cameron <jic23@kernel.org>
> > ha scritto:
> > >
> > > On Mon, 23 Sep 2019 10:21:49 +0200
> > > Andrea Merello <andrea.merello@gmail.com> wrote:
> > >
> > > > Il giorno sab 21 set 2019 alle ore 19:12 Jonathan Cameron
> > > > <jic23@kernel.org> ha scritto:
> > > >
> > > > > >
> > > > > > If we skip the configuration rewrite when the channel doesn't
> > change -
> > > > > > as discussed above - then we actually _terminate_ the acquisition
> > when
> > > > > > the IIO read is triggered, that is we are converting the value
> > sampled
> > > > > > right before the IIO read.
> > > > > >
> > > > > > If this is OK then I'll go on, otherwise I think that we should
> > always
> > > > > > do the three cycles (so that triggering IIO read always waits also
> > for
> > > > > > a new acquisition phase)
> > > >
> > > > I had a discussion about this with a HW engineer, he said that it's
> > > > probably not necessary to perform a full extra cycle (i.e. SPI xfer +
> > > > udelay(2)), rather, since the HW is already in acquisition, it should
> > > > be enough to perform the udelay(2) to make sure the internal capacitor
> > > > settles (if we change channel of course we need also the SPI xfer to
> > > > update the CFG).
> > > >
> > > > So indeed it seems to me that:
> > > > - if CFG (channel) changes: we need three full SPI xfer (actual SPI
> > > > xfer + delay(2))
> > > > - if CFG (channel) doesn't change: we need a delay(2) [*]- to
> > > > guarantee the user sees a value sampled after the IIO read, as
> > > > discussed - and two full SPI xfer (actual SPI xfer + delay(2))
> > > >
> > > > .. Indeed I also wonder if it would be enough to cycle the CS, without
> > > > performing a full SPI xfer, in order to start the conversion.. But
> > > > given that this whole thing seems to me a bit complicated and unclear,
> > > > I would stick to the dummy cycle for now..
> > > >
> > > > > An excellent point.  I agree and suspect we may have this wrong in
> > other
> > > > > sensors.  The question gets more interesting if running in buffered
> > mode
> > > > > as we have had systems using a trigger generated by an external
> > process.
> > > > > I suppose in that case we just have to deal with the offset in the
> > fifo
> > > > > etc in userspace.
> > > >
> > > > Yes. I'm not familiar with IIO buffered mode, but for a streaming,
> > > > continuous, read mode I guess that the user would expect some latency
> > > > anyway (might be due to the ADC or the buffering mechanism itself or
> > > > whatever).
> > >
> > > There are a few ugly corners.
> > > 1) They'd normally expect the timestamp to be as closely aligned as
> > > possible with the reading in a given scan.  Perhaps we can think about
> > > how to offset that if we know we are actually looking at the one before
> > > last...
> >
> > Maybe we could sample the timestamp whenever we start a conversion (end of
> > SPI XFER), then we use this information for timestamping the value we'll
> > read at the next SPI xfer (that is the outcome of the said conversion).
> > Maybe we can also subtract, say, half of the acquisition time to each
> > timestemp to better center it on the actual acquisition window..
>
> I originally had a plan to 'accurately' describe time offsets so that
> we could deal with the difference between devices that are self clocked
> (the interrupt is after the samples are done) and devices that are
> capture on demand.  It never got implemented though as it seems no one
> ever cared :)
>
> As for shifting the timestamp to reflect and earlier triggered capture
> (so running a small fifo for the timestamps), that seems like a
> sensible approach to keeping timestamp and sample together.
>
> >
> > > 2) There are tightly coupled setups where we are switching a mux in
> > > front of the sensor and driving a trigger off that action.
> > >                                                 _________
> > >  _________           |--------MUX CNTRL--------|         |
> > > |         |        __|__        ---TRIGGER-----|         |
> > > | INPUT 1 |-------|     |     __|__            |         |
> > > |_________|       |     |    |     |           |   HOST  |
> > >  _________        | MUX |----| ADC |-----------|         |
> > > |         |       |     |    |_____|           |         |
> > > | INPUT 2 |-------|_____|                      |_________|
> > > |_________|
> > >
> > > This gets represented as a single 'device' that is a consumer
> > > of the ADC channel, and also provides the trigger and handles
> > > the mux control.
> > >
> > > Once you have stitched it all together the expectation is that
> > > for each 'scan':
> > > 1. Set MUX
> > > 2. Allow short settling time
> >
> > Indeed my concern about begin vs end of acquisition window wrt switching
> > time of the external mux could possibly be worked-around by just increasing
> > settling time here. I also indeed agree wrt what you say below about sysfs
> > [*].
> >
> > So probably we don't need to worry about this at all in the chip driver.
> >
> > > 3. Trigger the ADC via gpio or SPI triggered read etc.
> > > 4. ADC result comes back.
> > > 5. Goto 1.
> > >
> > > The full set of inputs are sampled to make up one 'scan' in
> > > the buffer of the container driver.
> > >
> > > Now in theory you could interleave the flows so that you know
> > > the data is coming 2 scan's later, but there isn't currently
> > > a way for the 'container' driver to know that is happening,
> > > so the assumption it makes is that everything is 'direct'.
> > >
> > > We could add some means of reporting an offset from trigger
> > > to sample into fifo as that would allow such a container
> > > driver to adjust it's mux settings to take that into account.
> > >
> > > Note that out container driver also often has it's own
> > > capture trigger coming in so it can get really fiddly as
> > > we don't have any way of knowing if we are in a round robin
> > > situation or just taking a single 'scan'. In single scan
> > > case we want to drop the excess reads from the end, in round robin
> > > we want to keep them as they are the start of the next scan.
> >
> > Yes, that's what we want. But it seems we cannot easily distinguish the two
> > cases.
> >
> > The only thing I can think about to handle this is to take into account the
> > samples ageing, and use excess samples for the next scan vs throwing them
> > away depending on that. But there's the problem of choosing a threshold..
> >
> > BTW Maybe if this is handled in the ADC driver then it may also adjust
> > things so that all is transparent to upper layers, even getting rid of the
> > offset.
>
> It could be done in the ADC driver, but the cost would be that it would
> have to run sufficient samples to ensure we are always 'fresh'.  That
> is going to kill the sampling rate.   As things stand we have no way
> of letting the ADC driver know that this is even desired.
>
> >
> > > >
> > > > I didn't look into this buffered thing to see how it works, but maybe
> > > > we can skip the first udelay(2) [*] in the driver in case of buffered
> > > > access?
> > > >
> > > > > Maybe we should think about adding a note to be careful of that.  Not
> > > > > really sure where we would note it though...
> > > >
> > > > IMHO clarifying what the API guarantees and what it doesn't in IIO
> > > > DocBook could be helpful (I didn't see it, but I might have missed
> > > > it)..
> > > I agree we should clarify it, but I'm still not totally sure what the
> > > preferred case is! Perhaps best plan is to put forward a patch to add
> > > a description of one of the choices as we can push people to review
> > > that closely as it may mean devices don't comply with the ABI.
> > >
> > > There is a 3rd option which is to add the option for devices to describe
> > > what they are doing via a new sysfs attribute.  That may get us
> > > around causing potential breakage in drivers that are already doing it
> > > the 'wrong' way.
> >
> > This would be implicit if we make them reporting the offset: a zero offset
> > means they behave in the simple, plain, way..
> >
> > We might also want to be able to set it, so that if we do sparse single-shot
> > scans we can ask the driver to provide data in the way we expect it.
>
> Certainly a fiddly corner case.  If this had always been present we could
> just have made it up to the consumer to deal with the offset.  If it wants
> a single shot reading, it would just have to do N repeated readings to
> flush out the ADC pipeline.   Too late for that though.  So we would have
> to default to handling in the driver, and allow for it to be set to
> an offset for high sampling rate users who know how to handle it.
> (new userspace).
>
>
> >
> > > >
> > > > So, we could state that a single shot read must return a value sampled
> > > > after the read has been shot, and that on the other hand, when using
> > > > buffered mode one should expect some latency.. But indeed, since you
> > > > said that we might have a number of IIO drivers that actually behave
> > > > in a different way, then I'm not sure that it's a good idea; maybe we
> > > > could just drop this requirement and allow (and document) that a
> > > > single shot IIO read could just _terminate_ the sampling and trigger
> > > > the conversion on the current sampled signal? (so also in this driver
> > > > we can just not care about this)..
> > >
> > > I don't think we need to worry about the difference between a sample
> > > window stopping vs one starting at the trigger.  Right now we don't
> > > even say how long that window is, so a naive assumption would be that
> > > we model it as instantaneous.  For single shot either model is
> > > fine to my mind. We get the nearest practical signal to the point
> > > of reading.  The sysfs interface is slow enough that any fine
> > > grained control of timing is likely to be garbage anyway :)
> >
> > [*]
> >
> > > The only exception would be really slow sensors such as light sensors
> > > where we might be talking 100s of msecs.  In those I'd argue
> > > we do want the capture to start only after we ask.  I think we are
> > > fine on existing drivers for those.
> >
> > BTW Indeed the original patch tried to cover also another aspect of the
> > thing: when I read e.g. in_voltage2_raw from sysfs, I guess I really want
> > to be sure that I'm reading (rasonably fresh) data from channel number 2. I
> > guess
> > that if another process did read in_voltage3_raw before me, I still want to
> > be sure I'm getting data from ch 2, not 3.
> >
> > IMO the offset thing make sense only on scans performed by buffered reads,
> > in which you configure a scan sequence and you are somehow interested in
> > getting data from all those channels (note: the driver targeted by the patch
> > does support only raw_read()).
>
> Absolutely agree.  This stuff only applies to more complex drivers doing
> buffered mode in which we have some assumptions of 'continuous' sampling.
>
> >
> > Also, getting back to the former example, I don't know how the container
> > driver accesses the ADC driver (buffered vs raw_read), but I bet that, as a
> > consumer of one ADC channel, it really wants samples from the said channel
> > (internal MUX), and not from another.
>
> Uses buffered mode, which is why triggers are involved. However, if it had
> the information, the consumer could basically pass through the delay to
> userspace.  So if it gets the data 2 samples late, it would tell userspace that
> it will provide it two samples late as well.
> >
> > So making sure the internal MUX is adjusted at every raw_read() IMO seems
> > still the right thing to do.. Still, I'm unsure if the whole thing of
> > excess reads makes sense here, or if we can just optimize only buffered
> > reads.
>
> Definitely only optimize buffered reads.  The sysfs ones are slow anyway
> so no need to be clever!  We just burn samples to get the right thing.
>
> I'm now completely lost on where we got to with the actual change in this
> driver.  Seem's Charles wasn't convinced yet (or lost track of the thread).
> Perhaps a repost with the example that you gave of the sequence you were
> seeing?
>
> Definitely went down a rabbit hole here! :)

Yes, Indeed I also got a bit out of focus here :) let me recap a bit:

The original patch aimed to just fix the sysfs read (the only thing
this driver currently supports) so that the resulting value really
belongs to the requested channel. On my board this was not the case,
because to make the internal mux to switch I needed an extra cycle, so
the patch made this happening.

As per my understanding (please correct me if I got something wrong)
then the outcome of our discussions about this is that we don't have
to care about i.e. start of acquisition window wrt end of the
acquisition window; so, ensuring that we are converting from the right
channel, and the data is reasonably fresh (i.e. it is not the outcome
of an extra read performed in an undefined past) should be OK.

As you said, Charles reported that he didn't face this issue. As per
your suggestion, I will repost with the example, to see if Charles or
someone else can reproduce the bug.

Apart of this, I'd say that most of what we have discussed indeed
don't apply to the current driver yet.

> Jonathan
> >
> > > Thanks,
> > >
> > > Jonathan
> > >
> > > >
> > > > > Thanks,
> > > > >
> > > > > Jonathan
> > > > >
> > > > >
> > >
>
Jonathan Cameron Nov. 9, 2019, 11:58 a.m. UTC | #11
On Mon, 4 Nov 2019 15:12:09 +0100
Andrea Merello <andrea.merello@gmail.com> wrote:

> Il giorno mar 22 ott 2019 alle ore 10:56 Jonathan Cameron
> <jic23@kernel.org> ha scritto:
> >
> > On Fri, 18 Oct 2019 15:30:17 +0200
> > Andrea Merello <andrea.merello@gmail.com> wrote:
> >  
> > > Il giorno sab 5 ott 2019 alle ore 11:55 Jonathan Cameron <jic23@kernel.org>
> > > ha scritto:  
> > > >
> > > > On Mon, 23 Sep 2019 10:21:49 +0200
> > > > Andrea Merello <andrea.merello@gmail.com> wrote:
> > > >  
> > > > > Il giorno sab 21 set 2019 alle ore 19:12 Jonathan Cameron
> > > > > <jic23@kernel.org> ha scritto:
> > > > >  
> > > > > > >
> > > > > > > If we skip the configuration rewrite when the channel doesn't  
> > > change -  
> > > > > > > as discussed above - then we actually _terminate_ the acquisition  
> > > when  
> > > > > > > the IIO read is triggered, that is we are converting the value  
> > > sampled  
> > > > > > > right before the IIO read.
> > > > > > >
> > > > > > > If this is OK then I'll go on, otherwise I think that we should  
> > > always  
> > > > > > > do the three cycles (so that triggering IIO read always waits also  
> > > for  
> > > > > > > a new acquisition phase)  
> > > > >
> > > > > I had a discussion about this with a HW engineer, he said that it's
> > > > > probably not necessary to perform a full extra cycle (i.e. SPI xfer +
> > > > > udelay(2)), rather, since the HW is already in acquisition, it should
> > > > > be enough to perform the udelay(2) to make sure the internal capacitor
> > > > > settles (if we change channel of course we need also the SPI xfer to
> > > > > update the CFG).
> > > > >
> > > > > So indeed it seems to me that:
> > > > > - if CFG (channel) changes: we need three full SPI xfer (actual SPI
> > > > > xfer + delay(2))
> > > > > - if CFG (channel) doesn't change: we need a delay(2) [*]- to
> > > > > guarantee the user sees a value sampled after the IIO read, as
> > > > > discussed - and two full SPI xfer (actual SPI xfer + delay(2))
> > > > >
> > > > > .. Indeed I also wonder if it would be enough to cycle the CS, without
> > > > > performing a full SPI xfer, in order to start the conversion.. But
> > > > > given that this whole thing seems to me a bit complicated and unclear,
> > > > > I would stick to the dummy cycle for now..
> > > > >  
> > > > > > An excellent point.  I agree and suspect we may have this wrong in  
> > > other  
> > > > > > sensors.  The question gets more interesting if running in buffered  
> > > mode  
> > > > > > as we have had systems using a trigger generated by an external  
> > > process.  
> > > > > > I suppose in that case we just have to deal with the offset in the  
> > > fifo  
> > > > > > etc in userspace.  
> > > > >
> > > > > Yes. I'm not familiar with IIO buffered mode, but for a streaming,
> > > > > continuous, read mode I guess that the user would expect some latency
> > > > > anyway (might be due to the ADC or the buffering mechanism itself or
> > > > > whatever).  
> > > >
> > > > There are a few ugly corners.
> > > > 1) They'd normally expect the timestamp to be as closely aligned as
> > > > possible with the reading in a given scan.  Perhaps we can think about
> > > > how to offset that if we know we are actually looking at the one before
> > > > last...  
> > >
> > > Maybe we could sample the timestamp whenever we start a conversion (end of
> > > SPI XFER), then we use this information for timestamping the value we'll
> > > read at the next SPI xfer (that is the outcome of the said conversion).
> > > Maybe we can also subtract, say, half of the acquisition time to each
> > > timestemp to better center it on the actual acquisition window..  
> >
> > I originally had a plan to 'accurately' describe time offsets so that
> > we could deal with the difference between devices that are self clocked
> > (the interrupt is after the samples are done) and devices that are
> > capture on demand.  It never got implemented though as it seems no one
> > ever cared :)
> >
> > As for shifting the timestamp to reflect and earlier triggered capture
> > (so running a small fifo for the timestamps), that seems like a
> > sensible approach to keeping timestamp and sample together.
> >  
> > >  
> > > > 2) There are tightly coupled setups where we are switching a mux in
> > > > front of the sensor and driving a trigger off that action.
> > > >                                                 _________
> > > >  _________           |--------MUX CNTRL--------|         |
> > > > |         |        __|__        ---TRIGGER-----|         |
> > > > | INPUT 1 |-------|     |     __|__            |         |
> > > > |_________|       |     |    |     |           |   HOST  |
> > > >  _________        | MUX |----| ADC |-----------|         |
> > > > |         |       |     |    |_____|           |         |
> > > > | INPUT 2 |-------|_____|                      |_________|
> > > > |_________|
> > > >
> > > > This gets represented as a single 'device' that is a consumer
> > > > of the ADC channel, and also provides the trigger and handles
> > > > the mux control.
> > > >
> > > > Once you have stitched it all together the expectation is that
> > > > for each 'scan':
> > > > 1. Set MUX
> > > > 2. Allow short settling time  
> > >
> > > Indeed my concern about begin vs end of acquisition window wrt switching
> > > time of the external mux could possibly be worked-around by just increasing
> > > settling time here. I also indeed agree wrt what you say below about sysfs
> > > [*].
> > >
> > > So probably we don't need to worry about this at all in the chip driver.
> > >  
> > > > 3. Trigger the ADC via gpio or SPI triggered read etc.
> > > > 4. ADC result comes back.
> > > > 5. Goto 1.
> > > >
> > > > The full set of inputs are sampled to make up one 'scan' in
> > > > the buffer of the container driver.
> > > >
> > > > Now in theory you could interleave the flows so that you know
> > > > the data is coming 2 scan's later, but there isn't currently
> > > > a way for the 'container' driver to know that is happening,
> > > > so the assumption it makes is that everything is 'direct'.
> > > >
> > > > We could add some means of reporting an offset from trigger
> > > > to sample into fifo as that would allow such a container
> > > > driver to adjust it's mux settings to take that into account.
> > > >
> > > > Note that out container driver also often has it's own
> > > > capture trigger coming in so it can get really fiddly as
> > > > we don't have any way of knowing if we are in a round robin
> > > > situation or just taking a single 'scan'. In single scan
> > > > case we want to drop the excess reads from the end, in round robin
> > > > we want to keep them as they are the start of the next scan.  
> > >
> > > Yes, that's what we want. But it seems we cannot easily distinguish the two
> > > cases.
> > >
> > > The only thing I can think about to handle this is to take into account the
> > > samples ageing, and use excess samples for the next scan vs throwing them
> > > away depending on that. But there's the problem of choosing a threshold..
> > >
> > > BTW Maybe if this is handled in the ADC driver then it may also adjust
> > > things so that all is transparent to upper layers, even getting rid of the
> > > offset.  
> >
> > It could be done in the ADC driver, but the cost would be that it would
> > have to run sufficient samples to ensure we are always 'fresh'.  That
> > is going to kill the sampling rate.   As things stand we have no way
> > of letting the ADC driver know that this is even desired.
> >  
> > >  
> > > > >
> > > > > I didn't look into this buffered thing to see how it works, but maybe
> > > > > we can skip the first udelay(2) [*] in the driver in case of buffered
> > > > > access?
> > > > >  
> > > > > > Maybe we should think about adding a note to be careful of that.  Not
> > > > > > really sure where we would note it though...  
> > > > >
> > > > > IMHO clarifying what the API guarantees and what it doesn't in IIO
> > > > > DocBook could be helpful (I didn't see it, but I might have missed
> > > > > it)..  
> > > > I agree we should clarify it, but I'm still not totally sure what the
> > > > preferred case is! Perhaps best plan is to put forward a patch to add
> > > > a description of one of the choices as we can push people to review
> > > > that closely as it may mean devices don't comply with the ABI.
> > > >
> > > > There is a 3rd option which is to add the option for devices to describe
> > > > what they are doing via a new sysfs attribute.  That may get us
> > > > around causing potential breakage in drivers that are already doing it
> > > > the 'wrong' way.  
> > >
> > > This would be implicit if we make them reporting the offset: a zero offset
> > > means they behave in the simple, plain, way..
> > >
> > > We might also want to be able to set it, so that if we do sparse single-shot
> > > scans we can ask the driver to provide data in the way we expect it.  
> >
> > Certainly a fiddly corner case.  If this had always been present we could
> > just have made it up to the consumer to deal with the offset.  If it wants
> > a single shot reading, it would just have to do N repeated readings to
> > flush out the ADC pipeline.   Too late for that though.  So we would have
> > to default to handling in the driver, and allow for it to be set to
> > an offset for high sampling rate users who know how to handle it.
> > (new userspace).
> >
> >  
> > >  
> > > > >
> > > > > So, we could state that a single shot read must return a value sampled
> > > > > after the read has been shot, and that on the other hand, when using
> > > > > buffered mode one should expect some latency.. But indeed, since you
> > > > > said that we might have a number of IIO drivers that actually behave
> > > > > in a different way, then I'm not sure that it's a good idea; maybe we
> > > > > could just drop this requirement and allow (and document) that a
> > > > > single shot IIO read could just _terminate_ the sampling and trigger
> > > > > the conversion on the current sampled signal? (so also in this driver
> > > > > we can just not care about this)..  
> > > >
> > > > I don't think we need to worry about the difference between a sample
> > > > window stopping vs one starting at the trigger.  Right now we don't
> > > > even say how long that window is, so a naive assumption would be that
> > > > we model it as instantaneous.  For single shot either model is
> > > > fine to my mind. We get the nearest practical signal to the point
> > > > of reading.  The sysfs interface is slow enough that any fine
> > > > grained control of timing is likely to be garbage anyway :)  
> > >
> > > [*]
> > >  
> > > > The only exception would be really slow sensors such as light sensors
> > > > where we might be talking 100s of msecs.  In those I'd argue
> > > > we do want the capture to start only after we ask.  I think we are
> > > > fine on existing drivers for those.  
> > >
> > > BTW Indeed the original patch tried to cover also another aspect of the
> > > thing: when I read e.g. in_voltage2_raw from sysfs, I guess I really want
> > > to be sure that I'm reading (rasonably fresh) data from channel number 2. I
> > > guess
> > > that if another process did read in_voltage3_raw before me, I still want to
> > > be sure I'm getting data from ch 2, not 3.
> > >
> > > IMO the offset thing make sense only on scans performed by buffered reads,
> > > in which you configure a scan sequence and you are somehow interested in
> > > getting data from all those channels (note: the driver targeted by the patch
> > > does support only raw_read()).  
> >
> > Absolutely agree.  This stuff only applies to more complex drivers doing
> > buffered mode in which we have some assumptions of 'continuous' sampling.
> >  
> > >
> > > Also, getting back to the former example, I don't know how the container
> > > driver accesses the ADC driver (buffered vs raw_read), but I bet that, as a
> > > consumer of one ADC channel, it really wants samples from the said channel
> > > (internal MUX), and not from another.  
> >
> > Uses buffered mode, which is why triggers are involved. However, if it had
> > the information, the consumer could basically pass through the delay to
> > userspace.  So if it gets the data 2 samples late, it would tell userspace that
> > it will provide it two samples late as well.  
> > >
> > > So making sure the internal MUX is adjusted at every raw_read() IMO seems
> > > still the right thing to do.. Still, I'm unsure if the whole thing of
> > > excess reads makes sense here, or if we can just optimize only buffered
> > > reads.  
> >
> > Definitely only optimize buffered reads.  The sysfs ones are slow anyway
> > so no need to be clever!  We just burn samples to get the right thing.
> >
> > I'm now completely lost on where we got to with the actual change in this
> > driver.  Seem's Charles wasn't convinced yet (or lost track of the thread).
> > Perhaps a repost with the example that you gave of the sequence you were
> > seeing?
> >
> > Definitely went down a rabbit hole here! :)  
> 
> Yes, Indeed I also got a bit out of focus here :) let me recap a bit:
> 
> The original patch aimed to just fix the sysfs read (the only thing
> this driver currently supports) so that the resulting value really
> belongs to the requested channel. On my board this was not the case,
> because to make the internal mux to switch I needed an extra cycle, so
> the patch made this happening.
> 
> As per my understanding (please correct me if I got something wrong)
> then the outcome of our discussions about this is that we don't have
> to care about i.e. start of acquisition window wrt end of the
> acquisition window; so, ensuring that we are converting from the right
> channel, and the data is reasonably fresh (i.e. it is not the outcome
> of an extra read performed in an undefined past) should be OK.
I don't think reasonably fresh is good enough.  It needs to occur
'after' the request is made to read it.   Thus userspace nasty controls
of an external mux could set the mux and then trigger the read, safe
in the assumption that if they leave the mux alone until they get
a value they are safe.

> 
> As you said, Charles reported that he didn't face this issue. As per
> your suggestion, I will repost with the example, to see if Charles or
> someone else can reproduce the bug.
That would be great.  Certainly curious that it's not repeating consistently.
> 
> Apart of this, I'd say that most of what we have discussed indeed
> don't apply to the current driver yet.
True enough ;)

Thanks,

Jonathan
> 
> > Jonathan  
> > >  
> > > > Thanks,
> > > >
> > > > Jonathan
> > > >  
> > > > >  
> > > > > > Thanks,
> > > > > >
> > > > > > Jonathan
> > > > > >
> > > > > >  
> > > >  
> >
Couret Charles-Antoine Nov. 12, 2019, 3:09 p.m. UTC | #12
Le 04/11/2019 à 15:12, Andrea Merello a écrit :
>> Il giorno mar 22 ott 2019 alle ore 10:56 Jonathan Cameron
>> <jic23@kernel.org> ha scritto:
>> As you said, Charles reported that he didn't face this issue. As per
>> your suggestion, I will repost with the example, to see if Charles or
>> someone else can reproduce the bug.
>>
>> Apart of this, I'd say that most of what we have discussed indeed
>> don't apply to the current driver yet.
> As you said, Charles reported that he didn't face this issue. As per
> your suggestion, I will repost with the example, to see if Charles or
> someone else can reproduce the bug.

Hi guys,

Sorry for the delay. Yes, when I developed the driver I was able to 
switch from any to any channel without issues with this driver. But from 
my understanding, there are different modes for this device and those 
modes can impact the timings to perform the sampling + getting the 
result. Maybe it can explain this situation?

Unfortunately I don't have access to this chip anymore so I can't 
reproduce it again. And obviously I can't confirm your changes. If on 
your side the change works fine, I don't see any reason to refuse it.


Regards,

Charles-Antoine Couret
diff mbox series

Patch

diff --git a/drivers/iio/adc/ad7949.c b/drivers/iio/adc/ad7949.c
index 25d1e1b24257..b1dbe2075ca9 100644
--- a/drivers/iio/adc/ad7949.c
+++ b/drivers/iio/adc/ad7949.c
@@ -85,6 +85,7 @@  static int ad7949_spi_read_channel(struct ad7949_adc_chip *ad7949_adc, int *val,
 				   unsigned int channel)
 {
 	int ret;
+	int i;
 	int bits_per_word = ad7949_adc->resolution;
 	int mask = GENMASK(ad7949_adc->resolution, 0);
 	struct spi_message msg;
@@ -97,12 +98,19 @@  static int ad7949_spi_read_channel(struct ad7949_adc_chip *ad7949_adc, int *val,
 		},
 	};
 
-	ret = ad7949_spi_write_cfg(ad7949_adc,
-				   channel << AD7949_OFFSET_CHANNEL_SEL,
-				   AD7949_MASK_CHANNEL_SEL);
-	if (ret)
-		return ret;
+	/*
+	 * 1: write CFG for sample 'n' and read garbage (sample n-2)
+	 * 2: write something and read garbage (sample n-1)
+	 */
+	for (i = 0; i < 2; i++) {
+		ret = ad7949_spi_write_cfg(ad7949_adc,
+					   channel << AD7949_OFFSET_CHANNEL_SEL,
+					   AD7949_MASK_CHANNEL_SEL);
+		if (ret)
+			return ret;
+	}
 
+	/* 3: write something and read data for sample 'n' */
 	ad7949_adc->buffer = 0;
 	spi_message_init_with_transfers(&msg, tx, 1);
 	ret = spi_sync(ad7949_adc->spi, &msg);