diff mbox series

[RFC,v2,12/12] gpiolib: Add fast processing path to bitmap API functions

Message ID 20180806222918.12644-13-jmkrzyszt@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC,v2,01/12] mtd: rawnand: ams-delta: Assign mtd->dev.parent, not mtd->owner | expand

Commit Message

Janusz Krzysztofik Aug. 6, 2018, 10:29 p.m. UTC
Certain GPIO descriptor arrays returned by gpio_get_array() may contain
information on a single GPIO chip driving array member pins in hardware
order.  In such cases, bitmaps of values can be passed directly to the
chip callback functions without wasting time on iterations.

Add respective code to gpiod_get/set_array_bitmap_complex() functions.

Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
---
 Documentation/driver-api/gpio/consumer.rst |  6 ++++++
 drivers/gpio/gpiolib.c                     | 14 ++++++++++++++
 2 files changed, 20 insertions(+)

Comments

Linus Walleij Aug. 6, 2018, 11:43 p.m. UTC | #1
On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote:

Hi Janusz!

> Certain GPIO descriptor arrays returned by gpio_get_array() may contain
> information on a single GPIO chip driving array member pins in hardware
> order.  In such cases, bitmaps of values can be passed directly to the
> chip callback functions without wasting time on iterations.
>
> Add respective code to gpiod_get/set_array_bitmap_complex() functions.
>
> Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>

I think it would be disappointing to leave all the users of the old
array API without the performance improvement. I think we need to
deal with this in a way such that everyone can benefit from it.

Also it is kludgy if users (consumers) would need to handle the case
where all lines are on the same chip separately, through the bitmap
API.

What we need is an API that:

- All drivers handling arrays can use (including current users).

- Enables speed-up if the lines are all on the same chip/register.

- Doesn't require consumers to know if they are all on the same
  chip or not.

This means a deep API with a small surface.

How do we achieve this the best way?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Janusz Krzysztofik Aug. 7, 2018, 5:29 p.m. UTC | #2
On Tuesday, August 7, 2018 1:43:56 AM CEST Linus Walleij wrote:
> On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com> 
wrote:
> 
> Hi Janusz!
> 
> > Certain GPIO descriptor arrays returned by gpio_get_array() may contain
> > information on a single GPIO chip driving array member pins in hardware
> > order.  In such cases, bitmaps of values can be passed directly to the
> > chip callback functions without wasting time on iterations.
> >
> > Add respective code to gpiod_get/set_array_bitmap_complex() functions.
> >
> > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
> 
> I think it would be disappointing to leave all the users of the old
> array API without the performance improvement. I think we need to
> deal with this in a way such that everyone can benefit from it.

There are a few issues to be resolved:

1) array size limited by bitmap size:
    - are we ready to limit array size to a single bitmap for all users?
    - if not, how can we pass a bitmap of an arbitrary size?
    - if as an array of bitmaps, is that still clear enough and easy to use?
    - other ideas?

2) arbitrary array support:
     - are we ready to drop that?
     - if not, do we agree to require all users to pack their arbitrary arrays 
        inside the gpio_descs structure?

Maybe more.

> Also it is kludgy if users (consumers) would need to handle the case
> where all lines are on the same chip separately, through the bitmap
> API.

Not true as long as array size fits (arbitrary arrays can be packed by users), 
but I see your point.

> What we need is an API that:
> 
> - All drivers handling arrays can use (including current users).
> 
> - Enables speed-up if the lines are all on the same chip/register.
> 
> - Doesn't require consumers to know if they are all on the same
>   chip or not.
> 
> This means a deep API with a small surface.
> 
> How do we achieve this the best way?

I think widely accepted solutions to those two issues I've mentioned above can 
give the answer.

Thanks,
Janusz


--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boris Brezillon Aug. 7, 2018, 5:47 p.m. UTC | #3
Hi Janusz,

On Tue, 07 Aug 2018 19:29:53 +0200
Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote:

> On Tuesday, August 7, 2018 1:43:56 AM CEST Linus Walleij wrote:
> > On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com>   
> wrote:
> > 
> > Hi Janusz!
> >   
> > > Certain GPIO descriptor arrays returned by gpio_get_array() may contain
> > > information on a single GPIO chip driving array member pins in hardware
> > > order.  In such cases, bitmaps of values can be passed directly to the
> > > chip callback functions without wasting time on iterations.
> > >
> > > Add respective code to gpiod_get/set_array_bitmap_complex() functions.
> > >
> > > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>  
> > 
> > I think it would be disappointing to leave all the users of the old
> > array API without the performance improvement. I think we need to
> > deal with this in a way such that everyone can benefit from it.  

I agree with Linus on that one. When I initially proposed the gpio
bitbanging API I had something more advanced in mind where the GPIO
framework would be responsible for toggling the GPIOs on its own when
it's given an array of bytes to transmit (this way you avoid going
back and forth between the GPIO user and the GPIO framework). But this
approach would clearly be more invasive than what you propose
here (turning the int array into a bitmap and optimizing). So, if we go
for the "int array -> bitmap" approach I think all users should be
converted so that we end up with a single API.

> 
> There are a few issues to be resolved:
> 
> 1) array size limited by bitmap size:
>     - are we ready to limit array size to a single bitmap for all users?
>     - if not, how can we pass a bitmap of an arbitrary size?
>     - if as an array of bitmaps, is that still clear enough and easy to use?
>     - other ideas?

What we call a bitmap is an array of unsigned longs each entry
containing NBITS_PER_LONG bits, so yes, it's an arbitrary size (see the
bitmap API here [1]).

> 
> 2) arbitrary array support:
>      - are we ready to drop that?
>      - if not, do we agree to require all users to pack their arbitrary arrays 
>         inside the gpio_descs structure?

I could only find one user, and it's the core itself (for the ioctl),
so that shouldn't be too hard to convert all users. Did you find more.

> 
> Maybe more.
> 
> > Also it is kludgy if users (consumers) would need to handle the case
> > where all lines are on the same chip separately, through the bitmap
> > API.  
> 
> Not true as long as array size fits (arbitrary arrays can be packed by users), 
> but I see your point.

I think the API should be the same and the framework should decide to
take the fast path if all gpios belong to the same chip (which AFAICT
is already the case, except it's putting the result in an int array
instead of a bitmap)

> 
> > What we need is an API that:
> > 
> > - All drivers handling arrays can use (including current users).
> > 
> > - Enables speed-up if the lines are all on the same chip/register.
> > 
> > - Doesn't require consumers to know if they are all on the same
> >   chip or not.
> > 
> > This means a deep API with a small surface.
> > 
> > How do we achieve this the best way?  
> 
> I think widely accepted solutions to those two issues I've mentioned above can 
> give the answer.

I'd still like to see how far we are from the initial perfs (the one
poking the GPIO regs directly) with this approach, and what's the
improvement compared to the int array solution we already have in place.

Regards,

Boris

[1]https://elixir.bootlin.com/linux/v4.18-rc8/source/include/linux/bitmap.h
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Walleij Aug. 10, 2018, 10:55 a.m. UTC | #4
On Tue, Aug 7, 2018 at 7:47 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
> Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote:
>
> > On Tuesday, August 7, 2018 1:43:56 AM CEST Linus Walleij wrote:
> > > On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com>
> > wrote:
> > >
> > > Hi Janusz!
> > >
> > > > Certain GPIO descriptor arrays returned by gpio_get_array() may contain
> > > > information on a single GPIO chip driving array member pins in hardware
> > > > order.  In such cases, bitmaps of values can be passed directly to the
> > > > chip callback functions without wasting time on iterations.
> > > >
> > > > Add respective code to gpiod_get/set_array_bitmap_complex() functions.
> > > >
> > > > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
> > >
> > > I think it would be disappointing to leave all the users of the old
> > > array API without the performance improvement. I think we need to
> > > deal with this in a way such that everyone can benefit from it.
>
> I agree with Linus on that one. When I initially proposed the gpio
> bitbanging API I had something more advanced in mind where the GPIO
> framework would be responsible for toggling the GPIOs on its own when
> it's given an array of bytes to transmit (this way you avoid going
> back and forth between the GPIO user and the GPIO framework). But this
> approach would clearly be more invasive than what you propose
> here (turning the int array into a bitmap and optimizing). So, if we go
> for the "int array -> bitmap" approach I think all users should be
> converted so that we end up with a single API.

I thought about this the recent days and something must have gone
wrong in the development process of the array API because this
was the (mine atleast) intention all the time.

If we look at the GPIOchip driver API it looks like this:

        int                     (*get_multiple)(struct gpio_chip *chip,
                                                unsigned long *mask,
                                                unsigned long *bits);
        void                    (*set_multiple)(struct gpio_chip *chip,
                                                unsigned long *mask,
                                                unsigned long *bits);

So there is nothing hindering the drivers from optimizing a call
here into a single register write, which is what e.g. the gpio-mmio.c
driver does: if the hardware has a dedicated register for clearing
and setting lines, it will simply just write the register with
whatever is passed in, also cache the current value so it doesn't
need to read back the register every time.

When an array comes down to gpiod_set_array_value_complex()
it loops over the descriptors in order to handle e.g. open drain
settings separately. Then the remainder (lines that should just
be set 1/0) is pushed to the .set_multiple() callback if they
are on the same chip.

This is assuming:

1. The CPU is not the bottleneck so we can do a bit
  of complex looping over structs etc in each write.

2. We want to perform as much in a single register write
  as possible to avoid I/O and glitches as all lines (e.g.
  clock and data) get written at the same time, if possible.
  (No skew.)

It seems Janusz has problems with assumption (1) and therefore
is trying to optimize the read/write path. This can be done if all
descs are on the same chip and none of them is using open drain
or open source.

To keep track of "quick path" the array needs to have a state.
So a magic "cookie" or something like that needs to be passed
to the array API.

I would suggest that struct gpiod_descs contain a magic cookie
returned from [devm_]gpiod_get_array[_optional]()  that can be
passed along to get/set array operations or left as NULL to just
fall back to the default:

void gpiod_set_array_value(unsigned int array_size,
                           struct gpio_desc **desc_array, int *value_array,
                           struct gpiod_array_cookie *cookie);

Cookie is just a dummy name, I don't know what makes most
sense. It reflects a state for the entire array.

If this cookie exist in some struct gpio_chip state variable, it
informs gpiolib that this array:

- Has all descriptors in the same gpiochip
- Has no open drain or open source-flagged lines

It can thenbypass the complex check and just write the values
directly by calling down into .set_multiple().

Maybe this cookie could just be a bool that is true when the above
is fulfilled. But it's best if that is hidden from the consumers
I guess, they shouldn't try to half-guess if the criteria is true,
gpiolib should do that.

The current users would have to be augmented to store the
cookie and pass it along when getting/setting arrays, but it would
be pretty simple and straight-forward compared to adding a new
API.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst
index bec4eab3b87c..b82f134dc352 100644
--- a/Documentation/driver-api/gpio/consumer.rst
+++ b/Documentation/driver-api/gpio/consumer.rst
@@ -409,6 +409,12 @@  descriptor arrays, only those of type struct gpio_descs returned by
 gpiod_get_array() and its variants.  Supported array size is limited to the size
 of the bitmap, i.e., sizeof(unsigned long).
 
+If the .chip member of the array structure, filled in by gpiod_get_array() in
+certain circumstances, contains a valid GPIO chip descriptor, the raw variants
+of the functions can take fast processing paths, passing bitmap arguments
+directly to the chip callback functions.  That allows for utilization of GPIO
+banks as data I/O ports without much loss of performance.
+
 
 GPIOs mapped to IRQs
 --------------------
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 5b541364dee0..bf95f2964bc5 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2846,6 +2846,12 @@  int gpiod_get_array_bitmap_complex(bool raw, bool can_sleep,
 	if (array->ndescs > sizeof(*bits))
 		return -EINVAL;
 
+	if (raw && !IS_ERR_OR_NULL(array->chip)) {
+		unsigned long mask = (1ULL << array->ndescs) - 1;
+
+		return gpio_chip_get_multiple(array->chip, &mask, bits);
+	}
+
 	i = gpiod_get_array_value_complex(raw, can_sleep, array->ndescs,
 					  array->desc, value_array);
 	if (i)
@@ -3156,6 +3162,14 @@  int gpiod_set_array_bitmap_complex(bool raw, bool can_sleep,
 	if (array->ndescs > sizeof(*bits))
 		return -EINVAL;
 
+	if (raw && !IS_ERR_OR_NULL(array->chip)) {
+		unsigned long mask = (1ULL << array->ndescs) - 1;
+
+		gpio_chip_set_multiple(array->chip, &mask, bits);
+
+		return 0;
+	}
+
 	for (i = 0; i < array->ndescs; i++)
 		value_array[i] = test_bit(i, bits);