iio_compute_scan_bytes does not seem to account for alignment if first channel uses more storagebits than its successors
diff mbox series

Message ID ff5a3ea4-4d15-5be3-9cb8-9fd7c716e2e6@plating.de
State New
Headers show
Series
  • iio_compute_scan_bytes does not seem to account for alignment if first channel uses more storagebits than its successors
Related show

Commit Message

Lars Möllendorf Nov. 29, 2019, 2:30 p.m. UTC
Hi,

I have written a custom kernel module implementing the IIO device API
backed by an IIO triggered buffer.

My IIO device provides 3 channels + timestamp. The sizes of the channels are

index  | iio_chan_spec.scan_type.storagebits
-------|------------------------------------------------
   0   |  32
   1   |  16
   2   |  16

If I select channel 0 (32 bit) and one of channels 1 or 2 (16 bit),
indio_dev.scan_bytes and iio_buffer.bytes_per_datum have a value of 6
bytes, which does not account for any alignment.


After having a closer look at `iio_compute_scan_bytes`, which is
responsible for calculating both `indio_dev.scan_bytes` and
`iio_buffer.bytes_per_datum`, it seems to me that the order of channels
matters:

```c
	/* How much space will the demuxed element take? */
	for_each_set_bit(i, mask,
			 indio_dev->masklength) {
		length = iio_storage_bytes_for_si(indio_dev, i);
		bytes = ALIGN(bytes, length);
		bytes += length;
	}
```

I understand that if the length of each scan element is smaller than
the length of the successive scan elements, this algorithm works
because it aligns each element to its own length. But if, as in my
case, the length of channel 0's scan elements is greater than the size
of the samples of the consecutive channels, no alignment seems to be
taken into account. Am I missing something here?
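To make the effect concrete, here is a small user-space model of the loop above. `ALIGN` and `scan_bytes` are stand-alone stand-ins for the kernel's macro and function, not the kernel code itself:

```c
#include <stddef.h>

/* Stand-in for the kernel's ALIGN() macro (power-of-two alignments only). */
#define ALIGN(x, a) (((x) + (size_t)(a) - 1) & ~((size_t)(a) - 1))

/* Model of the loop in iio_compute_scan_bytes(): each element is aligned
 * to its own length, but nothing pads the end of the scan. */
static size_t scan_bytes(const size_t *lengths, int n)
{
	size_t bytes = 0;

	for (int i = 0; i < n; i++) {
		bytes = ALIGN(bytes, lengths[i]);
		bytes += lengths[i];
	}
	return bytes;
}
```

For the 32-bit + 16-bit case this yields 6 bytes, so in a buffer of consecutive scans the second scan's 32-bit element starts at offset 6, which is not 4-byte aligned; with the order reversed (16-bit first, 32-bit second) it yields 8.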

To make the algorithm work in any case, the greatest length of all
enabled scan elements could be used for alignment, e.g.:

```diff
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 5d05c38c4ba9..3d2c4e26d818 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -546,16 +546,18 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
 				const unsigned long *mask, bool timestamp)
 {
 	unsigned bytes = 0;
-	int length, i;
+	int length, i, align = 0, count = 0;
 
 	/* How much space will the demuxed element take? */
 	for_each_set_bit(i, mask,
 			 indio_dev->masklength) {
 		length = iio_storage_bytes_for_si(indio_dev, i);
-		bytes = ALIGN(bytes, length);
-		bytes += length;
+		align = max(align, length);
+		count++;
 	}
 
+	bytes = count * align;
+
 	if (timestamp) {
 		length = iio_storage_bytes_for_timestamp(indio_dev);
 		bytes = ALIGN(bytes, length);
```

But in my case the latter would bloat the buffer from 16 bytes to 4*16 =
64 bytes per scan if all channels are selected and the timestamp is active.

For now, I will work around this by using 32 storagebits for all my
channels. This gives me 4 bytes of overhead per scan if all elements are
selected, and an additional 2 bytes if the timestamp is active.

In "Why do you align the buffer pointer to a multiple of the size of the
current scan element in iio_buffer_foreach_sample()?" on
https://github.com/analogdevicesinc/libiio/issues/324 I have been
pointed to this mailing list.

Regards,
Lars M.

Comments

Lars-Peter Clausen Nov. 29, 2019, 5:23 p.m. UTC | #1
On 11/29/19 3:30 PM, Lars Möllendorf wrote:
> Hi,
> 
> I have written a custom kernel module implementing the IIO device API
> backed by an IIO triggered buffer.
> 
> My IIO device provides 3 channels + timestamp. The sizes of the channels are
> 
> index  | iio_chan_spec.scan_type.storagebits
> -------|------------------------------------------------
>    0   |  32
>    1   |  16
>    2   |  16
> 
> If I select channel 0 (32 bit) and one of channels 1 or 2 (16 bit),
> indio_dev.scan_bytes and iio_buffer.bytes_per_datum have a value of 6
> bytes, which does not account for any alignment.
> 
> 
> After having a closer look at `iio_compute_scan_bytes`, which is
> responsible for calculating both `indio_dev.scan_bytes` and
> `iio_buffer.bytes_per_datum`, it seems to me that the order of channels
> matters:
> 
> ```c
> 	/* How much space will the demuxed element take? */
> 	for_each_set_bit(i, mask,
> 			 indio_dev->masklength) {
> 		length = iio_storage_bytes_for_si(indio_dev, i);
> 		bytes = ALIGN(bytes, length);
> 		bytes += length;
> 	}
> ```
> 
> I understand that if the length of each scan element is smaller than
> the length of the successive scan elements, this algorithm works
> because it aligns each element to its own length. But if, as in my
> case, the length of channel 0's scan elements is greater than the size
> of the samples of the consecutive channels, no alignment seems to be
> taken into account. Am I missing something here?
[...]
> But in my case the latter would bloat the buffer from 16 bytes to 4*16 =
> 64 bytes per scan if all channels are selected and timestamp is active.
> 
> For now, I will work around this by using 32 storagebits for all my
> channels. This gives me 4 bytes of overhead per scan if all elements are
> selected, and an additional 2 bytes if timestamp is active.
> 
> In "Why do you align the buffer pointer to a multiple of the size of the
> current scan element in iio_buffer_foreach_sample()?" on
> https://github.com/analogdevicesinc/libiio/issues/324 I have been
> pointed to this mailing list.

Hi Lars,

The way this is supposed to work is that each element is aligned to its
own natural alignment. What seems to be missing at the moment is that
the total length is also aligned to the size of the first element, so
that alignment is preserved for multiple consecutive samples. I feel
like we had that at some point, but maybe I'm misremembering.

E.g. putting something like

 unsigned int first_index = find_first_bit(mask, indio_dev->masklength);
 length = iio_storage_bytes_for_si(indio_dev, first_index);
 bytes = ALIGN(bytes, length);

below the loop should do the trick I believe.
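As a hedged sketch of that suggestion, here is the same stand-alone model extended accordingly (names are illustrative, not the kernel's):

```c
#include <stddef.h>

#define ALIGN(x, a) (((x) + (size_t)(a) - 1) & ~((size_t)(a) - 1))

/* Model of the suggested fix: align each element to its own length as
 * before, then pad the total to the length of the first enabled element
 * so that the next scan starts suitably aligned for it. */
static size_t scan_bytes_pad_to_first(const size_t *lengths, int n)
{
	size_t bytes = 0;

	for (int i = 0; i < n; i++) {
		bytes = ALIGN(bytes, lengths[i]);
		bytes += lengths[i];
	}
	if (n > 0)
		bytes = ALIGN(bytes, lengths[0]);
	return bytes;
}
```

For the 32-bit + 16-bit case this pads 6 up to 8, so consecutive scans keep the 32-bit element 4-byte aligned.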

- Lars
Jonathan Cameron Dec. 1, 2019, 12:03 p.m. UTC | #2
On Fri, 29 Nov 2019 18:23:37 +0100
Lars-Peter Clausen <lars@metafoo.de> wrote:

> On 11/29/19 3:30 PM, Lars Möllendorf wrote:
> > Hi,
> > 
> > I have written a custom kernel module implementing the IIO device API
> > backed by an IIO triggered buffer.
> > 
> > My IIO device provides 3 channels + timestamp. The sizes of the channels are
> > 
> > index  | iio_chan_spec.scan_type.storagebits
> > -------|------------------------------------------------
> >    0   |  32
> >    1   |  16
> >    2   |  16
> > 
> > If I select channel 0 (32 bit) and one of channels 1 or 2 (16 bit),
> > indio_dev.scan_bytes and iio_buffer.bytes_per_datum have a value of 6
> > bytes, which does not account for any alignment.
> > 
> > 
> > After having a closer look at `iio_compute_scan_bytes`, which is
> > responsible for calculating both `indio_dev.scan_bytes` and
> > `iio_buffer.bytes_per_datum`, it seems to me that the order of channels
> > matters:
> > 
> > ```c
> > 	/* How much space will the demuxed element take? */
> > 	for_each_set_bit(i, mask,
> > 			 indio_dev->masklength) {
> > 		length = iio_storage_bytes_for_si(indio_dev, i);
> > 		bytes = ALIGN(bytes, length);
> > 		bytes += length;
> > 	}
> > ```
> > 
> > I understand that if the length of each scan element is smaller than
> > the length of the successive scan elements, this algorithm works
> > because it aligns each element to its own length. But if, as in my
> > case, the length of channel 0's scan elements is greater than the size
> > of the samples of the consecutive channels, no alignment seems to be
> > taken into account. Am I missing something here?
> [...]
> > But in my case the latter would bloat the buffer from 16 bytes to 4*16 =
> > 64 bytes per scan if all channels are selected and timestamp is active.
> > 
> > For now, I will work around this by using 32 storagebits for all my
> > channels. This gives me 4 bytes of overhead per scan if all elements are
> > selected, and an additional 2 bytes if timestamp is active.
> > 
> > In "Why do you align the buffer pointer to a multiple of the size of the
> > current scan element in iio_buffer_foreach_sample()?" on
> > https://github.com/analogdevicesinc/libiio/issues/324 I have been
> > pointed to this mailing list.  
> 
> Hi Lars,
> 
> The way this is supposed to work is that each element is aligned to its
> own natural alignment. What seems to be missing at the moment is that
> the total length is also aligned to the size of the first element, so
> that alignment is preserved for multiple consecutive samples. I feel
> like we had that at some point, but maybe I'm misremembering.
> 
> E.g. putting something like
> 
>  unsigned int first_index = find_first_bit(mask, indio_dev->masklength);
>  length = iio_storage_bytes_for_si(indio_dev, first_index);
>  bytes = ALIGN(bytes, length);
> 
> below the loop should do the trick I believe.

Good find, by the way. Not sure how we never hit this before, as there
are definitely sensors out there with channel mixes similar to yours.
Maybe they are always used with the timestamp enabled? Anyhow, no
matter, it's clearly wrong.

Lars' trick doesn't work either (I think), as we can have

u16
u16 (might be padding or real channel)
u32
u16

which should be 4-byte aligned.

I think we need to keep track of the largest element present during
the loop and use that length afterwards to pad out the end, but only
the end rather than every element. In theory we can also have elements
larger than the timestamp, so we should not do special handling for
it; it's just another element.

I think that ends up effectively a combination of the two suggestions?
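A hedged sketch of that combined approach, again as a stand-alone model with illustrative names rather than the kernel code: track the largest element while walking the mask, and pad only the total.

```c
#include <stddef.h>

#define ALIGN(x, a) (((x) + (size_t)(a) - 1) & ~((size_t)(a) - 1))

/* Model of the combined approach: align each element to its own length,
 * remember the largest one, and pad only the end of the scan to it. */
static size_t scan_bytes_pad_to_largest(const size_t *lengths, int n)
{
	size_t bytes = 0, largest = 1;

	for (int i = 0; i < n; i++) {
		bytes = ALIGN(bytes, lengths[i]);
		bytes += lengths[i];
		if (lengths[i] > largest)
			largest = lengths[i];
	}
	return ALIGN(bytes, largest);
}
```

For the u16/u16/u32/u16 layout above this gives 12 (10 data bytes plus 2 bytes of tail padding), so the u32 stays 4-byte aligned across consecutive scans, whereas padding to the first element alone would leave the total at 10.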

Jonathan

> 
> - Lars
>
Jonathan Cameron Dec. 1, 2019, 12:10 p.m. UTC | #3
On Fri, 29 Nov 2019 18:23:37 +0100
Lars-Peter Clausen <lars@metafoo.de> wrote:

> On 11/29/19 3:30 PM, Lars Möllendorf wrote:
> > Hi,
> > 
> > I have written a custom kernel module implementing the IIO device API
> > backed by an IIO triggered buffer.
> > 
> > My IIO device provides 3 channels + timestamp. The sizes of the channels are
> > 
> > index  | iio_chan_spec.scan_type.storagebits
> > -------|------------------------------------------------
> >    0   |  32
> >    1   |  16
> >    2   |  16
> > 
> > If I select channel 0 (32 bit) and one of channels 1 or 2 (16 bit),
> > indio_dev.scan_bytes and iio_buffer.bytes_per_datum have a value of 6
> > bytes, which does not account for any alignment.
> > 
> > 
> > After having a closer look at `iio_compute_scan_bytes`, which is
> > responsible for calculating both `indio_dev.scan_bytes` and
> > `iio_buffer.bytes_per_datum`, it seems to me that the order of channels
> > matters:
> > 
> > ```c
> > 	/* How much space will the demuxed element take? */
> > 	for_each_set_bit(i, mask,
> > 			 indio_dev->masklength) {
> > 		length = iio_storage_bytes_for_si(indio_dev, i);
> > 		bytes = ALIGN(bytes, length);
> > 		bytes += length;
> > 	}
> > ```
> > 
> > I understand that if the length of each scan element is smaller than
> > the length of the successive scan elements, this algorithm works
> > because it aligns each element to its own length. But if, as in my
> > case, the length of channel 0's scan elements is greater than the size
> > of the samples of the consecutive channels, no alignment seems to be
> > taken into account. Am I missing something here?
> [...]
> > But in my case the latter would bloat the buffer from 16 bytes to 4*16 =
> > 64 bytes per scan if all channels are selected and timestamp is active.
> > 
> > For now, I will work around this by using 32 storagebits for all my
> > channels. This gives me 4 bytes of overhead per scan if all elements are
> > selected, and an additional 2 bytes if timestamp is active.
> > 
> > In "Why do you align the buffer pointer to a multiple of the size of the
> > current scan element in iio_buffer_foreach_sample()?" on
> > https://github.com/analogdevicesinc/libiio/issues/324 I have been
> > pointed to this mailing list.  
> 
> Hi Lars,
I managed to crash my email client whilst sending a first reply to this,
so no idea if anyone got it!
> 
> The way this is supposed to work is that each element is aligned to its
> own natural alignment. What seems to be missing at the moment is that
> the total length is also aligned to the size of the first element, so
> that alignment is preserved for multiple consecutive samples. I feel
> like we had that at some point, but maybe I'm misremembering.

It's more than possible we broke this in a cleanup at some stage, and
it's a bit surprising it hasn't caused problems before, as we definitely
have other sensors with similar channel patterns. I guess maybe they
always have the timestamp enabled.

Good spot by the way.

> 
> E.g. putting something like
> 
>  unsigned int first_index = find_first_bit(mask, indio_dev->masklength);
>  length = iio_storage_bytes_for_si(indio_dev, first_index);
>  bytes = ALIGN(bytes, length);
> 
> below the loop should do the trick I believe.
It would work for Lars' case, but I think a combination of both approaches
is needed to handle cases like

u16
u16 (may be padding)
u32
u16

We need to realign to the largest-sized element wherever it occurs in the
channel list.  So find that whilst computing the individual alignments,
and apply it only once at the end.

Thanks,

Jonathan

> 
> - Lars
>
Lars-Peter Clausen Dec. 1, 2019, 12:29 p.m. UTC | #4
On 12/1/19 1:10 PM, Jonathan Cameron wrote:
>>
>> E.g. putting something like
>>
>>  unsigned int first_index = find_first_bit(mask, indio_dev->masklength);
>>  length = iio_storage_bytes_for_si(indio_dev, first_index);
>>  bytes = ALIGN(bytes, length);
>>
>> below the loop should do the trick I believe.
> It would work for Lars case, but I think a combination of both approaches
> is needed to handle cases like
> 
> u16
> u16 (may be padding)
> u32
> u16
> 
> We need to realign to the largest sized element whereever it occurs in the
> channels list.  So find that whilst computing the individual alignments,
> and apply it only once at the end.
> 

Ah, yes. We need the same rules as for the alignment and padding of a
struct, which means aligning the size to the size of the largest element.
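The analogy can be checked directly: a struct with the u16/u16/u32/u16 layout from earlier in the thread gets exactly this treatment from the compiler, on the usual ABIs where uint32_t has 4-byte alignment.

```c
#include <stdint.h>
#include <stddef.h>

/* The compiler places each member at its natural alignment and pads the
 * struct's total size to the largest member alignment, so that every
 * member of every element of an array of this struct stays aligned. */
struct scan {
	uint16_t a;	/* offset 0 */
	uint16_t b;	/* offset 2 (real channel or padding) */
	uint32_t c;	/* offset 4 */
	uint16_t d;	/* offset 8, followed by 2 bytes of tail padding */
};
```

On such ABIs sizeof(struct scan) is 12, not 10: the 2 tail-padding bytes are what the proposed end-of-scan alignment reproduces.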

Patch
diff mbox series

```diff
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 5d05c38c4ba9..3d2c4e26d818 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -546,16 +546,18 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
 				const unsigned long *mask, bool timestamp)
 {
 	unsigned bytes = 0;
-	int length, i;
+	int length, i, align = 0, count = 0;
 
 	/* How much space will the demuxed element take? */
 	for_each_set_bit(i, mask,
 			 indio_dev->masklength) {
 		length = iio_storage_bytes_for_si(indio_dev, i);
-		bytes = ALIGN(bytes, length);
-		bytes += length;
+		align = max(align, length);
+		count++;
 	}
 
+	bytes = count * align;
+
 	if (timestamp) {
 		length = iio_storage_bytes_for_timestamp(indio_dev);
 		bytes = ALIGN(bytes, length);
```

And if the storage bytes for the timestamp have to be taken into account
for the alignment of the scan elements (which does not seem to be the
case in the current implementation), then:

```diff
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 5d05c38c4ba9..59aee2ea4e19 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -546,21 +546,23 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
 				const unsigned long *mask, bool timestamp)
 {
 	unsigned bytes = 0;
-	int length, i;
+	int length, i, align = 0, count = 0;
+
+	if (timestamp) {
+		align = iio_storage_bytes_for_timestamp(indio_dev);
+		count++;
+	}
 
 	/* How much space will the demuxed element take? */
 	for_each_set_bit(i, mask,
 			 indio_dev->masklength) {
 		length = iio_storage_bytes_for_si(indio_dev, i);
-		bytes = ALIGN(bytes, length);
-		bytes += length;
+		align = max(align, length);
+		count++;
 	}
 
-	if (timestamp) {
-		length = iio_storage_bytes_for_timestamp(indio_dev);
-		bytes = ALIGN(bytes, length);
-		bytes += length;
-	}
+	bytes = count * align;
+
 	return bytes;
 }
```