diff mbox series

[v2,5/7] iio: inkern: copy/release available info from producer

Message ID 20241007-iio-read-avail-release-v2-5-245002d5869e@gmail.com (mailing list archive)
State Changes Requested
Headers show
Series iio: fix possible race condition during access of available info lists | expand

Commit Message

Matteo Martelli Oct. 7, 2024, 8:37 a.m. UTC
Consumers need to call the read_avail_release_resource after reading the
available info. To call the release with info_exists locked, copy the
available info from the producer and immediately call its release
callback. With this change, users of iio_read_avail_channel_raw() and
iio_read_avail_channel_attribute() must free the copied avail info after
calling them.

Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
---
 drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++-----------
 include/linux/iio/consumer.h |  4 +--
 2 files changed, 50 insertions(+), 18 deletions(-)

Comments

Nuno Sá Oct. 7, 2024, 3:15 p.m. UTC | #1
On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:
> Consumers need to call the read_avail_release_resource after reading the
> available info. To call the release with info_exists locked, copy the
> available info from the producer and immediately call its release
> callback. With this change, users of iio_read_avail_channel_raw() and
> iio_read_avail_channel_attribute() must free the copied avail info after
> calling them.
> 
> Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> ---
>  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++-----------
>  include/linux/iio/consumer.h |  4 +--
>  2 files changed, 50 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> index
> 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e168007a447ffc0d91
> 100644
> --- a/drivers/iio/inkern.c
> +++ b/drivers/iio/inkern.c
> @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct iio_channel *chan,
>  	if (!iio_channel_has_available(chan->channel, info))
>  		return -EINVAL;
>  
> -	if (iio_info->read_avail)
> -		return iio_info->read_avail(chan->indio_dev, chan->channel,
> -					    vals, type, length, info);
> +	if (iio_info->read_avail) {
> +		const int *vals_tmp;
> +		int ret;
> +
> +		ret = iio_info->read_avail(chan->indio_dev, chan->channel,
> +					   &vals_tmp, type, length, info);
> +		if (ret < 0)
> +			return ret;
> +
> +		*vals = kmemdup_array(vals_tmp, *length, sizeof(int), GFP_KERNEL);
> +		if (!*vals)
> +			return -ENOMEM;
> +

Not a big deal but I would likely prefer to avoid yet another copy. If I'm
understanding things correctly, I would rather create an inkern wrapper API like 
iio_channel_read_avail_release_resource() - maybe something with a smaller name :).
Hence, the lifetime of the data would be only controlled by the producer of it. It
would also produce a smaller diff (I think). I just find it a bit confusing that we
duplicate the data in here and the producer also duplicates it on the ->read_avail()
call. Another advantage I see is that often the available data is indeed const in
which case no kmemdup_array() is needed at all.

- Nuno Sá
Matteo Martelli Oct. 8, 2024, 6:47 a.m. UTC | #2
Quoting Nuno Sá (2024-10-07 17:15:13)
> On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:
> > Consumers need to call the read_avail_release_resource after reading the
> > available info. To call the release with info_exists locked, copy the
> > available info from the producer and immediately call its release
> > callback. With this change, users of iio_read_avail_channel_raw() and
> > iio_read_avail_channel_attribute() must free the copied avail info after
> > calling them.
> > 
> > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > ---
> >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++-----------
> >  include/linux/iio/consumer.h |  4 +--
> >  2 files changed, 50 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > index
> > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e168007a447ffc0d91
> > 100644
> > --- a/drivers/iio/inkern.c
> > +++ b/drivers/iio/inkern.c
> > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct iio_channel *chan,
> >       if (!iio_channel_has_available(chan->channel, info))
> >               return -EINVAL;
> >  
> > -     if (iio_info->read_avail)
> > -             return iio_info->read_avail(chan->indio_dev, chan->channel,
> > -                                         vals, type, length, info);
> > +     if (iio_info->read_avail) {
> > +             const int *vals_tmp;
> > +             int ret;
> > +
> > +             ret = iio_info->read_avail(chan->indio_dev, chan->channel,
> > +                                        &vals_tmp, type, length, info);
> > +             if (ret < 0)
> > +                     return ret;
> > +
> > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int), GFP_KERNEL);
> > +             if (!*vals)
> > +                     return -ENOMEM;
> > +
> 
> Not a big deal but I would likely prefer to avoid yet another copy. If I'm
> understanding things correctly, I would rather create an inkern wrapper API like 
> iio_channel_read_avail_release_resource() - maybe something with a smaller name :).
> Hence, the lifetime of the data would be only controlled by the producer of it. It
> would also produce a smaller diff (I think). I just find it a bit confusing that we
> duplicate the data in here and the producer also duplicates it on the ->read_avail()
> call. Another advantage I see is that often the available data is indeed const in
> which case no kmemdup_array() is needed at all.


If I understand correctly your suggestion you would leave the inkern
iio_channel_read_avail() untouched, then add a new inkern wrapper, something
like iio_channel_read_avail_release_resource(), that would call the producer's
read_avail_release_resource(). The consumer would invoke this new wrapper in its
own read_avail_release_resource() avoiding the additional copy. The call stack
would look something like the following:

iio_read_channel_info_avail() {
    consumer->read_avail() {
        iio_read_avail_channel_raw() {
            iio_channel_read_avail() {
                producer->read_avail() {
                    kmemdup_array();
                }
            }
        }
    }

    iio_format_list();

    consumer->read_avail_release_resource() {
        iio_read_avail_channel_release_resource() {
            producer->read_avail_release_resource() {
                kfree();
            }
        }
    }
}


I was going with the simpler solution you described, but my concern with it was
that the info_exists_lock mutex would be unlocked between a iio_channel_read_avail()
call and its corresponding iio_channel_read_avail_release_resource() call.
To my understanding, this could potentially allow for the device to be
unregistered between the two calls and result in a memleak of the avail buffer
allocated by the producer.

However, I have been trying to reproduce a similar case by adding a delay
between the consumer->read_avail() and the
consumer->read_avail_release_resources(), and by unbinding the driver during
that delay, thus with the info_exists_lock mutex unlocked. In this case the
driver is not unregistered until the iio_read_channel_info_avail() function
completes, likely because of some other lock on the sysfs file after the call of
cdev_device_del() in iio_device_unregister().

Are there are other cases in which the device could be unregistered between the
two calls? If the info_exists_lock mutex is not necessary for this read_avail()
flow then I could switch it to the simpler solution without the additional consumer
copy, but at that point I would question why the info_exists_lock mutex is being
locked in iio_read_avail_channel_raw().

For some additional context see also my previous conversation with Jonathan on
the subject [1]. I followed Jonathan's suggestion to keep the implementation
simple by letting the consumer to always copy the producer buffer, but I could
also consider different solutions.

Regarding the release function names being too long, I totally agree and I would also
shorten the iio_info read_avail_release_resource() callback if that remains
clear: something like read_avail_release_res() or just read_avail_release()?

Link: https://lore.kernel.org/linux-iio/20240810105411.705cb225@jic23-huawei/ [1]

> 
> - Nuno Sá
> 
> 

Thanks,
Matteo Martelli
Nuno Sá Oct. 8, 2024, 7:29 a.m. UTC | #3
On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:
> Quoting Nuno Sá (2024-10-07 17:15:13)
> > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:
> > > Consumers need to call the read_avail_release_resource after reading the
> > > available info. To call the release with info_exists locked, copy the
> > > available info from the producer and immediately call its release
> > > callback. With this change, users of iio_read_avail_channel_raw() and
> > > iio_read_avail_channel_attribute() must free the copied avail info after
> > > calling them.
> > > 
> > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > ---
> > >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++------
> > > -----
> > >  include/linux/iio/consumer.h |  4 +--
> > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > index
> > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e168007a44
> > > 7ffc0d91
> > > 100644
> > > --- a/drivers/iio/inkern.c
> > > +++ b/drivers/iio/inkern.c
> > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct iio_channel
> > > *chan,
> > >       if (!iio_channel_has_available(chan->channel, info))
> > >               return -EINVAL;
> > >  
> > > -     if (iio_info->read_avail)
> > > -             return iio_info->read_avail(chan->indio_dev, chan->channel,
> > > -                                         vals, type, length, info);
> > > +     if (iio_info->read_avail) {
> > > +             const int *vals_tmp;
> > > +             int ret;
> > > +
> > > +             ret = iio_info->read_avail(chan->indio_dev, chan->channel,
> > > +                                        &vals_tmp, type, length, info);
> > > +             if (ret < 0)
> > > +                     return ret;
> > > +
> > > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int),
> > > GFP_KERNEL);
> > > +             if (!*vals)
> > > +                     return -ENOMEM;
> > > +
> > 
> > Not a big deal but I would likely prefer to avoid yet another copy. If I'm
> > understanding things correctly, I would rather create an inkern wrapper API
> > like 
> > iio_channel_read_avail_release_resource() - maybe something with a smaller
> > name :).
> > Hence, the lifetime of the data would be only controlled by the producer of
> > it. It
> > would also produce a smaller diff (I think). I just find it a bit confusing
> > that we
> > duplicate the data in here and the producer also duplicates it on the -
> > >read_avail()
> > call. Another advantage I see is that often the available data is indeed
> > const in
> > which case no kmemdup_array() is needed at all.
> 
> 
> If I understand correctly your suggestion you would leave the inkern
> iio_channel_read_avail() untouched, then add a new inkern wrapper, something
> like iio_channel_read_avail_release_resource(), that would call the producer's
> read_avail_release_resource(). The consumer would invoke this new wrapper in
> its
> own read_avail_release_resource() avoiding the additional copy. The call stack
> would look something like the following:
> 
> iio_read_channel_info_avail() {
>     consumer->read_avail() {
>         iio_read_avail_channel_raw() {
>             iio_channel_read_avail() {
>                 producer->read_avail() {
>                     kmemdup_array();
>                 }
>             }
>         }
>     }
> 
>     iio_format_list();
> 
>     consumer->read_avail_release_resource() {
>         iio_read_avail_channel_release_resource() {
>             producer->read_avail_release_resource() {
>                 kfree();
>             }
>         }
>     }
> }

Yeah, exactly what came to mind...

> 
> 
> I was going with the simpler solution you described, but my concern with it
> was
> that the info_exists_lock mutex would be unlocked between a
> iio_channel_read_avail()
> call and its corresponding iio_channel_read_avail_release_resource() call.
> To my understanding, this could potentially allow for the device to be
> unregistered between the two calls and result in a memleak of the avail buffer
> allocated by the producer.
> 
> However, I have been trying to reproduce a similar case by adding a delay
> between the consumer->read_avail() and the
> consumer->read_avail_release_resources(), and by unbinding the driver during
> that delay, thus with the info_exists_lock mutex unlocked. In this case the
> driver is not unregistered until the iio_read_channel_info_avail() function
> completes, likely because of some other lock on the sysfs file after the call
> of
> cdev_device_del() in iio_device_unregister().
> 

Yes, you need to have some sync point at the kernfs level otherwise we could
always be handling a sysfs attr while the device is being removed under our
feet. But I'm not sure what you're trying to do... IIUC, the problem might come
if have:

consumer->read_avail_channel_attribute()
	producer->info_lock()
	producer->read_avail()
		producer->kmalloc()

...
// producer unbound
...
consumer->read_avail_release()
	return -ENODEV;

// producer->kmalloc() never get's freed...

The above is your problem right? And I think it should be a valid one since
between ->read_avail_channel_attribute() and read_avail_release() there's
nothing preventing the producer from being unregistered...

If I'm not missing nothing one solution would be for the producer to do
devm_kmalloc() and devm_kfree() on read_avail() and release_resources() but at
that point I'm not sure it's better than what you have since it's odd enough for
being missed in reviews...

Anyways, I'm fine with this approach but then I would likely have a comment on
this extra allocation explaining what is being protected with it as it's not
straight to realize the subtle race with the producer being gone between calls.

> Are there are other cases in which the device could be unregistered between
> the
> two calls? If the info_exists_lock mutex is not necessary for this
> read_avail()
> flow then I could switch it to the simpler solution without the additional
> consumer
> copy, but at that point I would question why the info_exists_lock mutex is
> being
> locked in iio_read_avail_channel_raw().
> 
> For some additional context see also my previous conversation with Jonathan on
> the subject [1]. I followed Jonathan's suggestion to keep the implementation
> simple by letting the consumer to always copy the producer buffer, but I could
> also consider different solutions.
> 
> Regarding the release function names being too long, I totally agree and I
> would also
> shorten the iio_info read_avail_release_resource() callback if that remains
> clear: something like read_avail_release_res() or just read_avail_release()?
> 
> Link:
> https://lore.kernel.org/linux-iio/20240810105411.705cb225@jic23-huawei/ [1]
> 

Yups, I should have checked v1...

- Nuno Sá
>
Matteo Martelli Oct. 8, 2024, 8:03 a.m. UTC | #4
Quoting Nuno Sá (2024-10-08 09:29:14)
> On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:
> > Quoting Nuno Sá (2024-10-07 17:15:13)
> > > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:
> > > > Consumers need to call the read_avail_release_resource after reading the
> > > > available info. To call the release with info_exists locked, copy the
> > > > available info from the producer and immediately call its release
> > > > callback. With this change, users of iio_read_avail_channel_raw() and
> > > > iio_read_avail_channel_attribute() must free the copied avail info after
> > > > calling them.
> > > > 
> > > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > > ---
> > > >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++------
> > > > -----
> > > >  include/linux/iio/consumer.h |  4 +--
> > > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > index
> > > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e168007a44
> > > > 7ffc0d91
> > > > 100644
> > > > --- a/drivers/iio/inkern.c
> > > > +++ b/drivers/iio/inkern.c
> > > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct iio_channel
> > > > *chan,
> > > >       if (!iio_channel_has_available(chan->channel, info))
> > > >               return -EINVAL;
> > > >  
> > > > -     if (iio_info->read_avail)
> > > > -             return iio_info->read_avail(chan->indio_dev, chan->channel,
> > > > -                                         vals, type, length, info);
> > > > +     if (iio_info->read_avail) {
> > > > +             const int *vals_tmp;
> > > > +             int ret;
> > > > +
> > > > +             ret = iio_info->read_avail(chan->indio_dev, chan->channel,
> > > > +                                        &vals_tmp, type, length, info);
> > > > +             if (ret < 0)
> > > > +                     return ret;
> > > > +
> > > > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int),
> > > > GFP_KERNEL);
> > > > +             if (!*vals)
> > > > +                     return -ENOMEM;
> > > > +
> > > 
> > > Not a big deal but I would likely prefer to avoid yet another copy. If I'm
> > > understanding things correctly, I would rather create an inkern wrapper API
> > > like 
> > > iio_channel_read_avail_release_resource() - maybe something with a smaller
> > > name :).
> > > Hence, the lifetime of the data would be only controlled by the producer of
> > > it. It
> > > would also produce a smaller diff (I think). I just find it a bit confusing
> > > that we
> > > duplicate the data in here and the producer also duplicates it on the -
> > > >read_avail()
> > > call. Another advantage I see is that often the available data is indeed
> > > const in
> > > which case no kmemdup_array() is needed at all.
> > 
> > 
> > If I understand correctly your suggestion you would leave the inkern
> > iio_channel_read_avail() untouched, then add a new inkern wrapper, something
> > like iio_channel_read_avail_release_resource(), that would call the producer's
> > read_avail_release_resource(). The consumer would invoke this new wrapper in
> > its
> > own read_avail_release_resource() avoiding the additional copy. The call stack
> > would look something like the following:
> > 
> > iio_read_channel_info_avail() {
> >     consumer->read_avail() {
> >         iio_read_avail_channel_raw() {
> >             iio_channel_read_avail() {
> >                 producer->read_avail() {
> >                     kmemdup_array();
> >                 }
> >             }
> >         }
> >     }
> > 
> >     iio_format_list();
> > 
> >     consumer->read_avail_release_resource() {
> >         iio_read_avail_channel_release_resource() {
> >             producer->read_avail_release_resource() {
> >                 kfree();
> >             }
> >         }
> >     }
> > }
> 
> Yeah, exactly what came to mind...
> 
> > 
> > 
> > I was going with the simpler solution you described, but my concern with it
> > was
> > that the info_exists_lock mutex would be unlocked between a
> > iio_channel_read_avail()
> > call and its corresponding iio_channel_read_avail_release_resource() call.
> > To my understanding, this could potentially allow for the device to be
> > unregistered between the two calls and result in a memleak of the avail buffer
> > allocated by the producer.
> > 
> > However, I have been trying to reproduce a similar case by adding a delay
> > between the consumer->read_avail() and the
> > consumer->read_avail_release_resources(), and by unbinding the driver during
> > that delay, thus with the info_exists_lock mutex unlocked. In this case the
> > driver is not unregistered until the iio_read_channel_info_avail() function
> > completes, likely because of some other lock on the sysfs file after the call
> > of
> > cdev_device_del() in iio_device_unregister().
> > 
> 
> Yes, you need to have some sync point at the kernfs level otherwise we could
> always be handling a sysfs attr while the device is being removed under our
> feet. But I'm not sure what you're trying to do... IIUC, the problem might come
> if have:
> 
> consumer->read_avail_channel_attribute()
>         producer->info_lock()
>         producer->read_avail()
>                 producer->kmalloc()
> 
> ...
> // producer unbound
> ...
> consumer->read_avail_release()
>         return -ENODEV;
> 
> // producer->kmalloc() never get's freed...
> 
> The above is your problem right? And I think it should be a valid one since
> between ->read_avail_channel_attribute() and read_avail_release() there's
> nothing preventing the producer from being unregistered...

Yes, that's the problem.

> 
> If I'm not missing nothing one solution would be for the producer to do
> devm_kmalloc() and devm_kfree() on read_avail() and release_resources() but at
> that point I'm not sure it's better than what you have since it's odd enough for
> being missed in reviews...

I honestly didn't think of this and it would in fact prevent the
additional copy. But I agree that it could be missed in new drivers,
maybe a comment in the iio_info read_avail_release_resource() callback
declaration would help?

> 
> Anyways, I'm fine with this approach but then I would likely have a comment on
> this extra allocation explaining what is being protected with it as it's not
> straight to realize the subtle race with the producer being gone between calls.
> 
> > Are there are other cases in which the device could be unregistered between
> > the
> > two calls? If the info_exists_lock mutex is not necessary for this
> > read_avail()
> > flow then I could switch it to the simpler solution without the additional
> > consumer
> > copy, but at that point I would question why the info_exists_lock mutex is
> > being
> > locked in iio_read_avail_channel_raw().
> > 
> > For some additional context see also my previous conversation with Jonathan on
> > the subject [1]. I followed Jonathan's suggestion to keep the implementation
> > simple by letting the consumer to always copy the producer buffer, but I could
> > also consider different solutions.
> > 
> > Regarding the release function names being too long, I totally agree and I
> > would also
> > shorten the iio_info read_avail_release_resource() callback if that remains
> > clear: something like read_avail_release_res() or just read_avail_release()?
> > 
> > Link:
> > https://lore.kernel.org/linux-iio/20240810105411.705cb225@jic23-huawei/ [1]
> > 
> 
> Yups, I should have checked v1...

Just to clarify, that link is not the v1 of this patch set but a
previous conversation during the pac1921 driver implementation.

> 
> - Nuno Sá

Thanks,
Matteo Martelli
Nuno Sá Oct. 8, 2024, 12:37 p.m. UTC | #5
On Tue, 2024-10-08 at 10:03 +0200, Matteo Martelli wrote:
> Quoting Nuno Sá (2024-10-08 09:29:14)
> > On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:
> > > Quoting Nuno Sá (2024-10-07 17:15:13)
> > > > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:
> > > > > Consumers need to call the read_avail_release_resource after reading
> > > > > the
> > > > > available info. To call the release with info_exists locked, copy the
> > > > > available info from the producer and immediately call its release
> > > > > callback. With this change, users of iio_read_avail_channel_raw() and
> > > > > iio_read_avail_channel_attribute() must free the copied avail info
> > > > > after
> > > > > calling them.
> > > > > 
> > > > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > > > ---
> > > > >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++--
> > > > > ----
> > > > > -----
> > > > >  include/linux/iio/consumer.h |  4 +--
> > > > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > > index
> > > > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e16800
> > > > > 7a44
> > > > > 7ffc0d91
> > > > > 100644
> > > > > --- a/drivers/iio/inkern.c
> > > > > +++ b/drivers/iio/inkern.c
> > > > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct
> > > > > iio_channel
> > > > > *chan,
> > > > >       if (!iio_channel_has_available(chan->channel, info))
> > > > >               return -EINVAL;
> > > > >  
> > > > > -     if (iio_info->read_avail)
> > > > > -             return iio_info->read_avail(chan->indio_dev, chan-
> > > > > >channel,
> > > > > -                                         vals, type, length, info);
> > > > > +     if (iio_info->read_avail) {
> > > > > +             const int *vals_tmp;
> > > > > +             int ret;
> > > > > +
> > > > > +             ret = iio_info->read_avail(chan->indio_dev, chan-
> > > > > >channel,
> > > > > +                                        &vals_tmp, type, length,
> > > > > info);
> > > > > +             if (ret < 0)
> > > > > +                     return ret;
> > > > > +
> > > > > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int),
> > > > > GFP_KERNEL);
> > > > > +             if (!*vals)
> > > > > +                     return -ENOMEM;
> > > > > +
> > > > 
> > > > Not a big deal but I would likely prefer to avoid yet another copy. If
> > > > I'm
> > > > understanding things correctly, I would rather create an inkern wrapper
> > > > API
> > > > like 
> > > > iio_channel_read_avail_release_resource() - maybe something with a
> > > > smaller
> > > > name :).
> > > > Hence, the lifetime of the data would be only controlled by the producer
> > > > of
> > > > it. It
> > > > would also produce a smaller diff (I think). I just find it a bit
> > > > confusing
> > > > that we
> > > > duplicate the data in here and the producer also duplicates it on the -
> > > > > read_avail()
> > > > call. Another advantage I see is that often the available data is indeed
> > > > const in
> > > > which case no kmemdup_array() is needed at all.
> > > 
> > > 
> > > If I understand correctly your suggestion you would leave the inkern
> > > iio_channel_read_avail() untouched, then add a new inkern wrapper,
> > > something
> > > like iio_channel_read_avail_release_resource(), that would call the
> > > producer's
> > > read_avail_release_resource(). The consumer would invoke this new wrapper
> > > in
> > > its
> > > own read_avail_release_resource() avoiding the additional copy. The call
> > > stack
> > > would look something like the following:
> > > 
> > > iio_read_channel_info_avail() {
> > >     consumer->read_avail() {
> > >         iio_read_avail_channel_raw() {
> > >             iio_channel_read_avail() {
> > >                 producer->read_avail() {
> > >                     kmemdup_array();
> > >                 }
> > >             }
> > >         }
> > >     }
> > > 
> > >     iio_format_list();
> > > 
> > >     consumer->read_avail_release_resource() {
> > >         iio_read_avail_channel_release_resource() {
> > >             producer->read_avail_release_resource() {
> > >                 kfree();
> > >             }
> > >         }
> > >     }
> > > }
> > 
> > Yeah, exactly what came to mind...
> > 
> > > 
> > > 
> > > I was going with the simpler solution you described, but my concern with
> > > it
> > > was
> > > that the info_exists_lock mutex would be unlocked between a
> > > iio_channel_read_avail()
> > > call and its corresponding iio_channel_read_avail_release_resource() call.
> > > To my understanding, this could potentially allow for the device to be
> > > unregistered between the two calls and result in a memleak of the avail
> > > buffer
> > > allocated by the producer.
> > > 
> > > However, I have been trying to reproduce a similar case by adding a delay
> > > between the consumer->read_avail() and the
> > > consumer->read_avail_release_resources(), and by unbinding the driver
> > > during
> > > that delay, thus with the info_exists_lock mutex unlocked. In this case
> > > the
> > > driver is not unregistered until the iio_read_channel_info_avail()
> > > function
> > > completes, likely because of some other lock on the sysfs file after the
> > > call
> > > of
> > > cdev_device_del() in iio_device_unregister().
> > > 
> > 
> > Yes, you need to have some sync point at the kernfs level otherwise we could
> > always be handling a sysfs attr while the device is being removed under our
> > feet. But I'm not sure what you're trying to do... IIUC, the problem might
> > come
> > if have:
> > 
> > consumer->read_avail_channel_attribute()
> >         producer->info_lock()
> >         producer->read_avail()
> >                 producer->kmalloc()
> > 
> > ...
> > // producer unbound
> > ...
> > consumer->read_avail_release()
> >         return -ENODEV;
> > 
> > // producer->kmalloc() never get's freed...
> > 
> > The above is your problem right? And I think it should be a valid one since
> > between ->read_avail_channel_attribute() and read_avail_release() there's
> > nothing preventing the producer from being unregistered...
> 
> Yes, that's the problem.
> 
> > 
> > If I'm not missing nothing one solution would be for the producer to do
> > devm_kmalloc() and devm_kfree() on read_avail() and release_resources() but
> > at
> > that point I'm not sure it's better than what you have since it's odd enough
> > for
> > being missed in reviews...
> 
> I honestly didn't think of this and it would in fact prevent the
> additional copy. But I agree that it could be missed in new drivers,
> maybe a comment in the iio_info read_avail_release_resource() callback
> declaration would help?
> > 
At this point I would say whatever you or Jonathan prefer :)

- Nuno Sá
>
Matteo Martelli Oct. 9, 2024, 6:30 p.m. UTC | #6
Quoting Nuno Sá (2024-10-08 14:37:22)
> On Tue, 2024-10-08 at 10:03 +0200, Matteo Martelli wrote:
> > Quoting Nuno Sá (2024-10-08 09:29:14)
> > > On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:
> > > > Quoting Nuno Sá (2024-10-07 17:15:13)
> > > > > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:
> > > > > > Consumers need to call the read_avail_release_resource after reading
> > > > > > the
> > > > > > available info. To call the release with info_exists locked, copy the
> > > > > > available info from the producer and immediately call its release
> > > > > > callback. With this change, users of iio_read_avail_channel_raw() and
> > > > > > iio_read_avail_channel_attribute() must free the copied avail info
> > > > > > after
> > > > > > calling them.
> > > > > > 
> > > > > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > > > > ---
> > > > > >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++--
> > > > > > ----
> > > > > > -----
> > > > > >  include/linux/iio/consumer.h |  4 +--
> > > > > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > > > index
> > > > > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e16800
> > > > > > 7a44
> > > > > > 7ffc0d91
> > > > > > 100644
> > > > > > --- a/drivers/iio/inkern.c
> > > > > > +++ b/drivers/iio/inkern.c
> > > > > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct
> > > > > > iio_channel
> > > > > > *chan,
> > > > > >       if (!iio_channel_has_available(chan->channel, info))
> > > > > >               return -EINVAL;
> > > > > >  
> > > > > > -     if (iio_info->read_avail)
> > > > > > -             return iio_info->read_avail(chan->indio_dev, chan-
> > > > > > >channel,
> > > > > > -                                         vals, type, length, info);
> > > > > > +     if (iio_info->read_avail) {
> > > > > > +             const int *vals_tmp;
> > > > > > +             int ret;
> > > > > > +
> > > > > > +             ret = iio_info->read_avail(chan->indio_dev, chan-
> > > > > > >channel,
> > > > > > +                                        &vals_tmp, type, length,
> > > > > > info);
> > > > > > +             if (ret < 0)
> > > > > > +                     return ret;
> > > > > > +
> > > > > > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int),
> > > > > > GFP_KERNEL);
> > > > > > +             if (!*vals)
> > > > > > +                     return -ENOMEM;
> > > > > > +
> > > > > 
> > > > > Not a big deal but I would likely prefer to avoid yet another copy. If
> > > > > I'm
> > > > > understanding things correctly, I would rather create an inkern wrapper
> > > > > API
> > > > > like 
> > > > > iio_channel_read_avail_release_resource() - maybe something with a
> > > > > smaller
> > > > > name :).
> > > > > Hence, the lifetime of the data would be only controlled by the producer
> > > > > of
> > > > > it. It
> > > > > would also produce a smaller diff (I think). I just find it a bit
> > > > > confusing
> > > > > that we
> > > > > duplicate the data in here and the producer also duplicates it on the -
> > > > > > read_avail()
> > > > > call. Another advantage I see is that often the available data is indeed
> > > > > const in
> > > > > which case no kmemdup_array() is needed at all.
> > > > 
> > > > 
> > > > If I understand correctly your suggestion you would leave the inkern
> > > > iio_channel_read_avail() untouched, then add a new inkern wrapper,
> > > > something
> > > > like iio_channel_read_avail_release_resource(), that would call the
> > > > producer's
> > > > read_avail_release_resource(). The consumer would invoke this new wrapper
> > > > in
> > > > its
> > > > own read_avail_release_resource() avoiding the additional copy. The call
> > > > stack
> > > > would look something like the following:
> > > > 
> > > > iio_read_channel_info_avail() {
> > > >     consumer->read_avail() {
> > > >         iio_read_avail_channel_raw() {
> > > >             iio_channel_read_avail() {
> > > >                 producer->read_avail() {
> > > >                     kmemdup_array();
> > > >                 }
> > > >             }
> > > >         }
> > > >     }
> > > > 
> > > >     iio_format_list();
> > > > 
> > > >     consumer->read_avail_release_resource() {
> > > >         iio_read_avail_channel_release_resource() {
> > > >             producer->read_avail_release_resource() {
> > > >                 kfree();
> > > >             }
> > > >         }
> > > >     }
> > > > }
> > > 
> > > Yeah, exactly what came to mind...
> > > 
> > > > 
> > > > 
> > > > I was going with the simpler solution you described, but my concern with
> > > > it
> > > > was
> > > > that the info_exists_lock mutex would be unlocked between a
> > > > iio_channel_read_avail()
> > > > call and its corresponding iio_channel_read_avail_release_resource() call.
> > > > To my understanding, this could potentially allow for the device to be
> > > > unregistered between the two calls and result in a memleak of the avail
> > > > buffer
> > > > allocated by the producer.
> > > > 
> > > > However, I have been trying to reproduce a similar case by adding a delay
> > > > between the consumer->read_avail() and the
> > > > consumer->read_avail_release_resources(), and by unbinding the driver
> > > > during
> > > > that delay, thus with the info_exists_lock mutex unlocked. In this case
> > > > the
> > > > driver is not unregistered until the iio_read_channel_info_avail()
> > > > function
> > > > completes, likely because of some other lock on the sysfs file after the
> > > > call
> > > > of
> > > > cdev_device_del() in iio_device_unregister().
> > > > 
> > > 
> > > Yes, you need to have some sync point at the kernfs level otherwise we could
> > > always be handling a sysfs attr while the device is being removed under our
> > > feet. But I'm not sure what you're trying to do... IIUC, the problem might
> > > come
> > > if have:
> > > 
> > > consumer->read_avail_channel_attribute()
> > >         producer->info_lock()
> > >         producer->read_avail()
> > >                 producer->kmalloc()
> > > 
> > > ...
> > > // producer unbound
> > > ...
> > > consumer->read_avail_release()
> > >         return -ENODEV;
> > > 
> > > // producer->kmalloc() never get's freed...
> > > 
> > > The above is your problem right? And I think it should be a valid one since
> > > between ->read_avail_channel_attribute() and read_avail_release() there's
> > > nothing preventing the producer from being unregistered...
> > 
> > Yes, that's the problem.
> > 
> > > 
> > > If I'm not missing nothing one solution would be for the producer to do
> > > devm_kmalloc() and devm_kfree() on read_avail() and release_resources() but
> > > at
> > > that point I'm not sure it's better than what you have since it's odd enough
> > > for
> > > being missed in reviews...
> > 
> > I honestly didn't think of this and it would in fact prevent the
> > additional copy. But I agree that it could be missed in new drivers,
> > maybe a comment in the iio_info read_avail_release_resource() callback
> > declaration would help?
> > > 
> At this point I would say whatever you or Jonathan prefer :)
> 

I run some quick tests with this approach and haven't found any issue so
far. I would personally switch to this approach as it would be much
simpler and easier to understand, and since the avail lists are const
for most of the current drivers I would not expect many new drivers
needing a dynamic available list. However, I will wait Jonathan feedback
first.

About the release wrapper name: even though "release_resource" looks a
common suffix for this kind of pattern,
iio_read_avail_channel_release_resource() seems in fact extremely long
and I would go for something like iio_read_avail_channel_release(). At
that point I would also shorten the iio_info release function name for
consistency, like read_avail_release_resource() => read_avail_release().
I hope such names would be clear enough though. Any feedback on this?

> - Nuno Sá

Thanks,
Matteo Martelli
Jonathan Cameron Oct. 12, 2024, 3:47 p.m. UTC | #7
On Wed, 09 Oct 2024 20:30:15 +0200
Matteo Martelli <matteomartelli3@gmail.com> wrote:

> Quoting Nuno Sá (2024-10-08 14:37:22)
> > On Tue, 2024-10-08 at 10:03 +0200, Matteo Martelli wrote:  
> > > Quoting Nuno Sá (2024-10-08 09:29:14)  
> > > > On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:  
> > > > > Quoting Nuno Sá (2024-10-07 17:15:13)  
> > > > > > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:  
> > > > > > > Consumers need to call the read_avail_release_resource after reading
> > > > > > > the
> > > > > > > available info. To call the release with info_exists locked, copy the
> > > > > > > available info from the producer and immediately call its release
> > > > > > > callback. With this change, users of iio_read_avail_channel_raw() and
> > > > > > > iio_read_avail_channel_attribute() must free the copied avail info
> > > > > > > after
> > > > > > > calling them.
> > > > > > > 
> > > > > > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > > > > > ---
> > > > > > >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++--
> > > > > > > ----
> > > > > > > -----
> > > > > > >  include/linux/iio/consumer.h |  4 +--
> > > > > > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > > > > index
> > > > > > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e16800
> > > > > > > 7a44
> > > > > > > 7ffc0d91
> > > > > > > 100644
> > > > > > > --- a/drivers/iio/inkern.c
> > > > > > > +++ b/drivers/iio/inkern.c
> > > > > > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct
> > > > > > > iio_channel
> > > > > > > *chan,
> > > > > > >       if (!iio_channel_has_available(chan->channel, info))
> > > > > > >               return -EINVAL;
> > > > > > >  
> > > > > > > -     if (iio_info->read_avail)
> > > > > > > -             return iio_info->read_avail(chan->indio_dev, chan-  
> > > > > > > >channel,  
> > > > > > > -                                         vals, type, length, info);
> > > > > > > +     if (iio_info->read_avail) {
> > > > > > > +             const int *vals_tmp;
> > > > > > > +             int ret;
> > > > > > > +
> > > > > > > +             ret = iio_info->read_avail(chan->indio_dev, chan-  
> > > > > > > >channel,  
> > > > > > > +                                        &vals_tmp, type, length,
> > > > > > > info);
> > > > > > > +             if (ret < 0)
> > > > > > > +                     return ret;
> > > > > > > +
> > > > > > > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int),
> > > > > > > GFP_KERNEL);
> > > > > > > +             if (!*vals)
> > > > > > > +                     return -ENOMEM;
> > > > > > > +  
> > > > > > 
> > > > > > Not a big deal but I would likely prefer to avoid yet another copy. If
> > > > > > I'm
> > > > > > understanding things correctly, I would rather create an inkern wrapper
> > > > > > API
> > > > > > like 
> > > > > > iio_channel_read_avail_release_resource() - maybe something with a
> > > > > > smaller
> > > > > > name :).
> > > > > > Hence, the lifetime of the data would be only controlled by the producer
> > > > > > of
> > > > > > it. It
> > > > > > would also produce a smaller diff (I think). I just find it a bit
> > > > > > confusing
> > > > > > that we
> > > > > > duplicate the data in here and the producer also duplicates it on the -  
> > > > > > > read_avail()  
> > > > > > call. Another advantage I see is that often the available data is indeed
> > > > > > const in
> > > > > > which case no kmemdup_array() is needed at all.  
> > > > > 
> > > > > 
> > > > > If I understand correctly your suggestion you would leave the inkern
> > > > > iio_channel_read_avail() untouched, then add a new inkern wrapper,
> > > > > something
> > > > > like iio_channel_read_avail_release_resource(), that would call the
> > > > > producer's
> > > > > read_avail_release_resource(). The consumer would invoke this new wrapper
> > > > > in
> > > > > its
> > > > > own read_avail_release_resource() avoiding the additional copy. The call
> > > > > stack
> > > > > would look something like the following:
> > > > > 
> > > > > iio_read_channel_info_avail() {

Here you are talking about the core code that produces a string.
But you've done that in reply to v5 which is about inkern interfaces
that don't work in strings...



> > > > >     consumer->read_avail() {
> > > > >         iio_read_avail_channel_raw() {
> > > > >             iio_channel_read_avail() {
> > > > >                 producer->read_avail() {
> > > > >                     kmemdup_array();
> > > > >                 }
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > 
> > > > >     iio_format_list();
That's effectively making the safe copy that is handed back to the
caller. So this is fine.
> > > > > 
> > > > >     consumer->read_avail_release_resource() {
> > > > >         iio_read_avail_channel_release_resource() {
> > > > >             producer->read_avail_release_resource() {
> > > > >                 kfree();
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > }  
> > > > 
> > > > Yeah, exactly what came to mind...

I'm very confused what scope of comments here is. Maybe the easiest thing is to send the code.


> > > >   
> > > > > 
> > > > > 
> > > > > I was going with the simpler solution you described, but my concern with
> > > > > it
> > > > > was
> > > > > that the info_exists_lock mutex would be unlocked between a
> > > > > iio_channel_read_avail()
> > > > > call and its corresponding iio_channel_read_avail_release_resource() call.
> > > > > To my understanding, this could potentially allow for the device to be
> > > > > unregistered between the two calls and result in a memleak of the avail
> > > > > buffer
> > > > > allocated by the producer.

Yes. That's why we have to free the produced copy under the exist_lock
(and take yet another copy).

> > > > > 
> > > > > However, I have been trying to reproduce a similar case by adding a delay
> > > > > between the consumer->read_avail() and the
> > > > > consumer->read_avail_release_resources(), and by unbinding the driver
> > > > > during
> > > > > that delay, thus with the info_exists_lock mutex unlocked. In this case
> > > > > the
> > > > > driver is not unregistered until the iio_read_channel_info_avail()
> > > > > function
> > > > > completes, likely because of some other lock on the sysfs file after the
> > > > > call
> > > > > of
> > > > > cdev_device_del() in iio_device_unregister().
> > > > >   
> > > > 
> > > > Yes, you need to have some sync point at the kernfs level otherwise we could
> > > > always be handling a sysfs attr while the device is being removed under our
> > > > feet. But I'm not sure what you're trying to do... IIUC, the problem might
> > > > come
> > > > if have:
> > > > 
> > > > consumer->read_avail_channel_attribute()
> > > >         producer->info_lock()
> > > >         producer->read_avail()
> > > >                 producer->kmalloc()
> > > > 
> > > > ...
> > > > // producer unbound
> > > > ...
> > > > consumer->read_avail_release()
> > > >         return -ENODEV;
> > > > 
> > > > // producer->kmalloc() never get's freed...
> > > > 
> > > > The above is your problem right? And I think it should be a valid one since
> > > > between ->read_avail_channel_attribute() and read_avail_release() there's
> > > > nothing preventing the producer from being unregistered...  
> > > 
> > > Yes, that's the problem.
> > >   
> > > > 
> > > > If I'm not missing nothing one solution would be for the producer to do
> > > > devm_kmalloc() and devm_kfree() on read_avail() and release_resources() but
> > > > at
> > > > that point I'm not sure it's better than what you have since it's odd enough
> > > > for
> > > > being missed in reviews...  

If it's an allocation to keep a copy for an active consumer then that is nasty
as the lifetime is completely untidied.  Effectively you are garbage collecting.

> > > 
> > > I honestly didn't think of this and it would in fact prevent the
> > > additional copy. But I agree that it could be missed in new drivers,
> > > maybe a comment in the iio_info read_avail_release_resource() callback
> > > declaration would help?  
> > > >   
> > At this point I would say whatever you or Jonathan prefer :)
> >   
> 
> I run some quick tests with this approach and haven't found any issue so
> far. 

I can't see what is preventing the race you describe where the
release callback is swept out by a driver unbind of the provider.
So unless we can show that is safe I don't see a way to avoid a consumer copy.

Long shot, Lars-Peter I think you fixed up most of the previous races in this
code, don't suppose you remember how it works?

> I would personally switch to this approach as it would be much
> simpler and easier to understand, and since the avail lists are const
> for most of the current drivers I would not expect many new drivers
> needing a dynamic available list. However, I will wait Jonathan feedback
> first.
The idea here is almost no one actually makes a copy in the producer.
The consumer copy is a necessity to my thinking because we are effectively
passing the ownership of the data.  Unfortunately we have no idea how
the producer would free it so we need to create our own copy.

Key here is this a very rare operation.   No one polls available
data at high frequency, it's a read once kind of thing typically.
+ these are normally really short lists in the cases we actually use
(so far I 'think' they have always been the 3 value range form, not
 a list of potential values).

Jonathan


> 
> About the release wrapper name: even though "release_resource" looks a
> common suffix for this kind of pattern,
> iio_read_avail_channel_release_resource() seems in fact extremely long
> and I would go for something like iio_read_avail_channel_release(). At
> that point I would also shorten the iio_info release function name for
> consistency, like read_avail_release_resource() => read_avail_release().
> I hope such names would be clear enough though. Any feedback on this?
> 
> > - Nuno Sá  
> 
> Thanks,
> Matteo Martelli
Matteo Martelli Oct. 12, 2024, 11:09 p.m. UTC | #8
Quoting Jonathan Cameron (2024-10-12 17:47:32)
> On Wed, 09 Oct 2024 20:30:15 +0200
> Matteo Martelli <matteomartelli3@gmail.com> wrote:
> 
> > Quoting Nuno Sá (2024-10-08 14:37:22)
> > > On Tue, 2024-10-08 at 10:03 +0200, Matteo Martelli wrote:  
> > > > Quoting Nuno Sá (2024-10-08 09:29:14)  
> > > > > On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:  
> > > > > > Quoting Nuno Sá (2024-10-07 17:15:13)  
> > > > > > > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:  
> > > > > > > > Consumers need to call the read_avail_release_resource after reading
> > > > > > > > the
> > > > > > > > available info. To call the release with info_exists locked, copy the
> > > > > > > > available info from the producer and immediately call its release
> > > > > > > > callback. With this change, users of iio_read_avail_channel_raw() and
> > > > > > > > iio_read_avail_channel_attribute() must free the copied avail info
> > > > > > > > after
> > > > > > > > calling them.
> > > > > > > > 
> > > > > > > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > > > > > > ---
> > > > > > > >  drivers/iio/inkern.c         | 64 +++++++++++++++++++++++++++++++++--
> > > > > > > > ----
> > > > > > > > -----
> > > > > > > >  include/linux/iio/consumer.h |  4 +--
> > > > > > > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > > > > > index
> > > > > > > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e16800
> > > > > > > > 7a44
> > > > > > > > 7ffc0d91
> > > > > > > > 100644
> > > > > > > > --- a/drivers/iio/inkern.c
> > > > > > > > +++ b/drivers/iio/inkern.c
> > > > > > > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct
> > > > > > > > iio_channel
> > > > > > > > *chan,
> > > > > > > >       if (!iio_channel_has_available(chan->channel, info))
> > > > > > > >               return -EINVAL;
> > > > > > > >  
> > > > > > > > -     if (iio_info->read_avail)
> > > > > > > > -             return iio_info->read_avail(chan->indio_dev, chan-  
> > > > > > > > >channel,  
> > > > > > > > -                                         vals, type, length, info);
> > > > > > > > +     if (iio_info->read_avail) {
> > > > > > > > +             const int *vals_tmp;
> > > > > > > > +             int ret;
> > > > > > > > +
> > > > > > > > +             ret = iio_info->read_avail(chan->indio_dev, chan-  
> > > > > > > > >channel,  
> > > > > > > > +                                        &vals_tmp, type, length,
> > > > > > > > info);
> > > > > > > > +             if (ret < 0)
> > > > > > > > +                     return ret;
> > > > > > > > +
> > > > > > > > +             *vals = kmemdup_array(vals_tmp, *length, sizeof(int),
> > > > > > > > GFP_KERNEL);
> > > > > > > > +             if (!*vals)
> > > > > > > > +                     return -ENOMEM;
> > > > > > > > +  
> > > > > > > 
> > > > > > > Not a big deal but I would likely prefer to avoid yet another copy. If
> > > > > > > I'm
> > > > > > > understanding things correctly, I would rather create an inkern wrapper
> > > > > > > API
> > > > > > > like 
> > > > > > > iio_channel_read_avail_release_resource() - maybe something with a
> > > > > > > smaller
> > > > > > > name :).
> > > > > > > Hence, the lifetime of the data would be only controlled by the producer
> > > > > > > of
> > > > > > > it. It
> > > > > > > would also produce a smaller diff (I think). I just find it a bit
> > > > > > > confusing
> > > > > > > that we
> > > > > > > duplicate the data in here and the producer also duplicates it on the -  
> > > > > > > > read_avail()  
> > > > > > > call. Another advantage I see is that often the available data is indeed
> > > > > > > const in
> > > > > > > which case no kmemdup_array() is needed at all.  
> > > > > > 
> > > > > > 
> > > > > > If I understand correctly your suggestion you would leave the inkern
> > > > > > iio_channel_read_avail() untouched, then add a new inkern wrapper,
> > > > > > something
> > > > > > like iio_channel_read_avail_release_resource(), that would call the
> > > > > > producer's
> > > > > > read_avail_release_resource(). The consumer would invoke this new wrapper
> > > > > > in
> > > > > > its
> > > > > > own read_avail_release_resource() avoiding the additional copy. The call
> > > > > > stack
> > > > > > would look something like the following:
> > > > > > 
> > > > > > iio_read_channel_info_avail() {
> 
> Here you are talking about the core code that produces a string.
> But you've done that in reply to v5 which is about inkern interfaces
> that don't work in strings...
> 
> 
> 
> > > > > >     consumer->read_avail() {
> > > > > >         iio_read_avail_channel_raw() {
> > > > > >             iio_channel_read_avail() {
> > > > > >                 producer->read_avail() {
> > > > > >                     kmemdup_array();
> > > > > >                 }
> > > > > >             }
> > > > > >         }
> > > > > >     }
> > > > > > 
> > > > > >     iio_format_list();
> That's effectively making the safe copy that is handed back to the
> caller. So this is fine.
> > > > > > 
> > > > > >     consumer->read_avail_release_resource() {
> > > > > >         iio_read_avail_channel_release_resource() {
> > > > > >             producer->read_avail_release_resource() {
> > > > > >                 kfree();
> > > > > >             }
> > > > > >         }
> > > > > >     }
> > > > > > }  
> > > > > 
> > > > > Yeah, exactly what came to mind...
> 
> I'm very confused what scope of comments here is. Maybe the easiest thing is to send the code.
> 

The function call graph example and the related comments above were just
to show the full flow of a read_avail() call on a consumer driver in
case an inkern release_resource() wrapper was used instead of an
additional consumer copy. This just to confirm that I was on the same
page as what Nuno suggested.

> 
> > > > >   
> > > > > > 
> > > > > > 
> > > > > > I was going with the simpler solution you described, but my concern with
> > > > > > it
> > > > > > was
> > > > > > that the info_exists_lock mutex would be unlocked between a
> > > > > > iio_channel_read_avail()
> > > > > > call and its corresponding iio_channel_read_avail_release_resource() call.
> > > > > > To my understanding, this could potentially allow for the device to be
> > > > > > unregistered between the two calls and result in a memleak of the avail
> > > > > > buffer
> > > > > > allocated by the producer.
> 
> Yes. That's why we have to free the produced copy under the exist_lock
> (and take yet another copy).
> 
> > > > > > 
> > > > > > However, I have been trying to reproduce a similar case by adding a delay
> > > > > > between the consumer->read_avail() and the
> > > > > > consumer->read_avail_release_resources(), and by unbinding the driver
> > > > > > during
> > > > > > that delay, thus with the info_exists_lock mutex unlocked. In this case
> > > > > > the
> > > > > > driver is not unregistered until the iio_read_channel_info_avail()
> > > > > > function
> > > > > > completes, likely because of some other lock on the sysfs file after the
> > > > > > call
> > > > > > of
> > > > > > cdev_device_del() in iio_device_unregister().
> > > > > >   
> > > > > 
> > > > > Yes, you need to have some sync point at the kernfs level otherwise we could
> > > > > always be handling a sysfs attr while the device is being removed under our
> > > > > feet. But I'm not sure what you're trying to do... IIUC, the problem might
> > > > > come
> > > > > if have:
> > > > > 
> > > > > consumer->read_avail_channel_attribute()
> > > > >         producer->info_lock()
> > > > >         producer->read_avail()
> > > > >                 producer->kmalloc()
> > > > > 
> > > > > ...
> > > > > // producer unbound
> > > > > ...
> > > > > consumer->read_avail_release()
> > > > >         return -ENODEV;
> > > > > 
> > > > > // producer->kmalloc() never get's freed...
> > > > > 
> > > > > The above is your problem right? And I think it should be a valid one since
> > > > > between ->read_avail_channel_attribute() and read_avail_release() there's
> > > > > nothing preventing the producer from being unregistered...  
> > > > 
> > > > Yes, that's the problem.
> > > >   
> > > > > 
> > > > > If I'm not missing nothing one solution would be for the producer to do
> > > > > devm_kmalloc() and devm_kfree() on read_avail() and release_resources() but
> > > > > at
> > > > > that point I'm not sure it's better than what you have since it's odd enough
> > > > > for
> > > > > being missed in reviews...  
> 
> If it's an allocation to keep a copy for an active consumer then that is nasty
> as the lifetime is completely untidied.  Effectively you are garbage collecting.
> 
> > > > 
> > > > I honestly didn't think of this and it would in fact prevent the
> > > > additional copy. But I agree that it could be missed in new drivers,
> > > > maybe a comment in the iio_info read_avail_release_resource() callback
> > > > declaration would help?  
> > > > >   
> > > At this point I would say whatever you or Jonathan prefer :)
> > >   
> > 
> > I run some quick tests with this approach and haven't found any issue so
> > far. 
> 
> I can't see what is preventing the race you describe where the
> release callback is swept out by a driver unbind of the provider.
> So unless we can show that is safe I don't see a way to avoid a consumer copy.
> 
> Long shot, Lars-Peter I think you fixed up most of the previous races in this
> code, don't suppose you remember how it works?
> 
> > I would personally switch to this approach as it would be much
> > simpler and easier to understand, and since the avail lists are const
> > for most of the current drivers I would not expect many new drivers
> > needing a dynamic available list. However, I will wait Jonathan feedback
> > first.
> The idea here is almost no one actually makes a copy in the producer.
> The consumer copy is a necessity to my thinking because we are effectively
> passing the ownership of the data.  Unfortunately we have no idea how
> the producer would free it so we need to create our own copy.
> 

You are indeed right. During the last comment reviews I focused on the
fact that using a devm_kfree() in the producer's read_avail_release()
would prevent the memleak but I completely missed that it would not
prevent the race condition: without the consumer copy, the consumer or
core code could still try to access a freed buffer after the producer
driver gets unregistered.

Thus I will keep this patch with the consumer copy as it is. I am soon
going to send a next version of this patch series to address the other
findings during this version.

> Key here is this a very rare operation.   No one polls available
> data at high frequency, it's a read once kind of thing typically.
> + these are normally really short lists in the cases we actually use
> (so far I 'think' they have always been the 3 value range form, not
>  a list of potential values).
> 
> Jonathan
> 
> 

Thanks,
Matteo Martelli
Nuno Sá Oct. 14, 2024, 6:39 a.m. UTC | #9
On Sun, 2024-10-13 at 01:09 +0200, Matteo Martelli wrote:
> Quoting Jonathan Cameron (2024-10-12 17:47:32)
> > On Wed, 09 Oct 2024 20:30:15 +0200
> > Matteo Martelli <matteomartelli3@gmail.com> wrote:
> > 
> > > Quoting Nuno Sá (2024-10-08 14:37:22)
> > > > On Tue, 2024-10-08 at 10:03 +0200, Matteo Martelli wrote:  
> > > > > Quoting Nuno Sá (2024-10-08 09:29:14)  
> > > > > > On Tue, 2024-10-08 at 08:47 +0200, Matteo Martelli wrote:  
> > > > > > > Quoting Nuno Sá (2024-10-07 17:15:13)  
> > > > > > > > On Mon, 2024-10-07 at 10:37 +0200, Matteo Martelli wrote:  
> > > > > > > > > Consumers need to call the read_avail_release_resource after
> > > > > > > > > reading
> > > > > > > > > the
> > > > > > > > > available info. To call the release with info_exists locked,
> > > > > > > > > copy the
> > > > > > > > > available info from the producer and immediately call its
> > > > > > > > > release
> > > > > > > > > callback. With this change, users of
> > > > > > > > > iio_read_avail_channel_raw() and
> > > > > > > > > iio_read_avail_channel_attribute() must free the copied avail
> > > > > > > > > info
> > > > > > > > > after
> > > > > > > > > calling them.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/iio/inkern.c         | 64
> > > > > > > > > +++++++++++++++++++++++++++++++++--
> > > > > > > > > ----
> > > > > > > > > -----
> > > > > > > > >  include/linux/iio/consumer.h |  4 +--
> > > > > > > > >  2 files changed, 50 insertions(+), 18 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > > > > > > index
> > > > > > > > > 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea4
> > > > > > > > > 47e16800
> > > > > > > > > 7a44
> > > > > > > > > 7ffc0d91
> > > > > > > > > 100644
> > > > > > > > > --- a/drivers/iio/inkern.c
> > > > > > > > > +++ b/drivers/iio/inkern.c
> > > > > > > > > @@ -760,9 +760,25 @@ static int iio_channel_read_avail(struct
> > > > > > > > > iio_channel
> > > > > > > > > *chan,
> > > > > > > > >       if (!iio_channel_has_available(chan->channel, info))
> > > > > > > > >               return -EINVAL;
> > > > > > > > >  
> > > > > > > > > -     if (iio_info->read_avail)
> > > > > > > > > -             return iio_info->read_avail(chan->indio_dev,
> > > > > > > > > chan-  
> > > > > > > > > > channel,  
> > > > > > > > > -                                         vals, type, length,
> > > > > > > > > info);
> > > > > > > > > +     if (iio_info->read_avail) {
> > > > > > > > > +             const int *vals_tmp;
> > > > > > > > > +             int ret;
> > > > > > > > > +
> > > > > > > > > +             ret = iio_info->read_avail(chan->indio_dev,
> > > > > > > > > chan-  
> > > > > > > > > > channel,  
> > > > > > > > > +                                        &vals_tmp, type,
> > > > > > > > > length,
> > > > > > > > > info);
> > > > > > > > > +             if (ret < 0)
> > > > > > > > > +                     return ret;
> > > > > > > > > +
> > > > > > > > > +             *vals = kmemdup_array(vals_tmp, *length,
> > > > > > > > > sizeof(int),
> > > > > > > > > GFP_KERNEL);
> > > > > > > > > +             if (!*vals)
> > > > > > > > > +                     return -ENOMEM;
> > > > > > > > > +  
> > > > > > > > 
> > > > > > > > Not a big deal but I would likely prefer to avoid yet another
> > > > > > > > copy. If
> > > > > > > > I'm
> > > > > > > > understanding things correctly, I would rather create an inkern
> > > > > > > > wrapper
> > > > > > > > API
> > > > > > > > like 
> > > > > > > > iio_channel_read_avail_release_resource() - maybe something with
> > > > > > > > a
> > > > > > > > smaller
> > > > > > > > name :).
> > > > > > > > Hence, the lifetime of the data would be only controlled by the
> > > > > > > > producer
> > > > > > > > of
> > > > > > > > it. It
> > > > > > > > would also produce a smaller diff (I think). I just find it a
> > > > > > > > bit
> > > > > > > > confusing
> > > > > > > > that we
> > > > > > > > duplicate the data in here and the producer also duplicates it
> > > > > > > > on the -  
> > > > > > > > > read_avail()  
> > > > > > > > call. Another advantage I see is that often the available data
> > > > > > > > is indeed
> > > > > > > > const in
> > > > > > > > which case no kmemdup_array() is needed at all.  
> > > > > > > 
> > > > > > > 
> > > > > > > If I understand correctly your suggestion you would leave the
> > > > > > > inkern
> > > > > > > iio_channel_read_avail() untouched, then add a new inkern wrapper,
> > > > > > > something
> > > > > > > like iio_channel_read_avail_release_resource(), that would call
> > > > > > > the
> > > > > > > producer's
> > > > > > > read_avail_release_resource(). The consumer would invoke this new
> > > > > > > wrapper
> > > > > > > in
> > > > > > > its
> > > > > > > own read_avail_release_resource() avoiding the additional copy.
> > > > > > > The call
> > > > > > > stack
> > > > > > > would look something like the following:
> > > > > > > 
> > > > > > > iio_read_channel_info_avail() {
> > 
> > Here you are talking about the core code that produces a string.
> > But you've done that in reply to v5 which is about inkern interfaces
> > that don't work in strings...
> > 
> > 
> > 
> > > > > > >     consumer->read_avail() {
> > > > > > >         iio_read_avail_channel_raw() {
> > > > > > >             iio_channel_read_avail() {
> > > > > > >                 producer->read_avail() {
> > > > > > >                     kmemdup_array();
> > > > > > >                 }
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > > 
> > > > > > >     iio_format_list();
> > That's effectively making the safe copy that is handed back to the
> > caller. So this is fine.
> > > > > > > 
> > > > > > >     consumer->read_avail_release_resource() {
> > > > > > >         iio_read_avail_channel_release_resource() {
> > > > > > >             producer->read_avail_release_resource() {
> > > > > > >                 kfree();
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > > }  
> > > > > > 
> > > > > > Yeah, exactly what came to mind...
> > 
> > I'm very confused what scope of comments here is. Maybe the easiest thing is
> > to send the code.
> > 
> 
> The function call graph example and the related comments above were just
> to show the full flow of a read_avail() call on a consumer driver in
> case an inkern release_resource() wrapper was used instead of an
> additional consumer copy. This just to confirm that I was on the same
> page as what Nuno suggested.
> 
> > 
> > > > > >   
> > > > > > > 
> > > > > > > 
> > > > > > > I was going with the simpler solution you described, but my
> > > > > > > concern with
> > > > > > > it
> > > > > > > was
> > > > > > > that the info_exists_lock mutex would be unlocked between a
> > > > > > > iio_channel_read_avail()
> > > > > > > call and its corresponding
> > > > > > > iio_channel_read_avail_release_resource() call.
> > > > > > > To my understanding, this could potentially allow for the device
> > > > > > > to be
> > > > > > > unregistered between the two calls and result in a memleak of the
> > > > > > > avail
> > > > > > > buffer
> > > > > > > allocated by the producer.
> > 
> > Yes. That's why we have to free the produced copy under the exist_lock
> > (and take yet another copy).
> > 
> > > > > > > 
> > > > > > > However, I have been trying to reproduce a similar case by adding
> > > > > > > a delay
> > > > > > > between the consumer->read_avail() and the
> > > > > > > consumer->read_avail_release_resources(), and by unbinding the
> > > > > > > driver
> > > > > > > during
> > > > > > > that delay, thus with the info_exists_lock mutex unlocked. In this
> > > > > > > case
> > > > > > > the
> > > > > > > driver is not unregistered until the iio_read_channel_info_avail()
> > > > > > > function
> > > > > > > completes, likely because of some other lock on the sysfs file
> > > > > > > after the
> > > > > > > call
> > > > > > > of
> > > > > > > cdev_device_del() in iio_device_unregister().
> > > > > > >   
> > > > > > 
> > > > > > Yes, you need to have some sync point at the kernfs level otherwise
> > > > > > we could
> > > > > > always be handling a sysfs attr while the device is being removed
> > > > > > under our
> > > > > > feet. But I'm not sure what you're trying to do... IIUC, the problem
> > > > > > might
> > > > > > come
> > > > > > if have:
> > > > > > 
> > > > > > consumer->read_avail_channel_attribute()
> > > > > >         producer->info_lock()
> > > > > >         producer->read_avail()
> > > > > >                 producer->kmalloc()
> > > > > > 
> > > > > > ...
> > > > > > // producer unbound
> > > > > > ...
> > > > > > consumer->read_avail_release()
> > > > > >         return -ENODEV;
> > > > > > 
> > > > > > // producer->kmalloc() never get's freed...
> > > > > > 
> > > > > > The above is your problem right? And I think it should be a valid
> > > > > > one since
> > > > > > between ->read_avail_channel_attribute() and read_avail_release()
> > > > > > there's
> > > > > > nothing preventing the producer from being unregistered...  
> > > > > 
> > > > > Yes, that's the problem.
> > > > >   
> > > > > > 
> > > > > > If I'm not missing nothing one solution would be for the producer to
> > > > > > do
> > > > > > devm_kmalloc() and devm_kfree() on read_avail() and
> > > > > > release_resources() but
> > > > > > at
> > > > > > that point I'm not sure it's better than what you have since it's
> > > > > > odd enough
> > > > > > for
> > > > > > being missed in reviews...  
> > 
> > If it's an allocation to keep a copy for an active consumer then that is
> > nasty
> > as the lifetime is completely untidied.  Effectively you are garbage
> > collecting.
> > 
> > > > > 
> > > > > I honestly didn't think of this and it would in fact prevent the
> > > > > additional copy. But I agree that it could be missed in new drivers,
> > > > > maybe a comment in the iio_info read_avail_release_resource() callback
> > > > > declaration would help?  
> > > > > >   
> > > > At this point I would say whatever you or Jonathan prefer :)
> > > >   
> > > 
> > > I run some quick tests with this approach and haven't found any issue so
> > > far. 
> > 
> > I can't see what is preventing the race you describe where the
> > release callback is swept out by a driver unbind of the provider.
> > So unless we can show that is safe I don't see a way to avoid a consumer
> > copy.
> > 
> > Long shot, Lars-Peter I think you fixed up most of the previous races in
> > this
> > code, don't suppose you remember how it works?
> > 
> > > I would personally switch to this approach as it would be much
> > > simpler and easier to understand, and since the avail lists are const
> > > for most of the current drivers I would not expect many new drivers
> > > needing a dynamic available list. However, I will wait Jonathan feedback
> > > first.
> > The idea here is almost no one actually makes a copy in the producer.
> > The consumer copy is a necessity to my thinking because we are effectively
> > passing the ownership of the data.  Unfortunately we have no idea how
> > the producer would free it so we need to create our own copy.
> > 
> 
> You are indeed right. During the last comment reviews I focused on the
> fact that using a devm_kfree() in the producer's read_avail_release()
> would prevent the memleak but I completely missed that it would not
> prevent the race condition: without the consumer copy, the consumer or
> core code could still try to access a freed buffer after the producer
> driver gets unregistered.
> 

Yeah, this would be indeed an issue... Good catch!

- Nuno Sá
diff mbox series

Patch

diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
index 7f325b3ed08fae6674245312cf8f57bb151006c0..cc65ef79451e5aa2cea447e168007a447ffc0d91 100644
--- a/drivers/iio/inkern.c
+++ b/drivers/iio/inkern.c
@@ -760,9 +760,25 @@  static int iio_channel_read_avail(struct iio_channel *chan,
 	if (!iio_channel_has_available(chan->channel, info))
 		return -EINVAL;
 
-	if (iio_info->read_avail)
-		return iio_info->read_avail(chan->indio_dev, chan->channel,
-					    vals, type, length, info);
+	if (iio_info->read_avail) {
+		const int *vals_tmp;
+		int ret;
+
+		ret = iio_info->read_avail(chan->indio_dev, chan->channel,
+					   &vals_tmp, type, length, info);
+		if (ret < 0)
+			return ret;
+
+		*vals = kmemdup_array(vals_tmp, *length, sizeof(int), GFP_KERNEL);
+		if (!*vals)
+			return -ENOMEM;
+
+		if (iio_info->read_avail_release_resource)
+			iio_info->read_avail_release_resource(
+				chan->indio_dev, chan->channel, vals_tmp, info);
+
+		return ret;
+	}
 	return -EINVAL;
 }
 
@@ -789,9 +805,11 @@  int iio_read_avail_channel_raw(struct iio_channel *chan,
 	ret = iio_read_avail_channel_attribute(chan, vals, &type, length,
 					       IIO_CHAN_INFO_RAW);
 
-	if (ret >= 0 && type != IIO_VAL_INT)
+	if (ret >= 0 && type != IIO_VAL_INT) {
 		/* raw values are assumed to be IIO_VAL_INT */
+		kfree(*vals);
 		ret = -EINVAL;
+	}
 
 	return ret;
 }
@@ -820,24 +838,31 @@  static int iio_channel_read_max(struct iio_channel *chan,
 			if (val2)
 				*val2 = vals[5];
 		}
-		return 0;
+		ret = 0;
+		break;
 
 	case IIO_AVAIL_LIST:
-		if (length <= 0)
-			return -EINVAL;
+		if (length <= 0) {
+			ret = -EINVAL;
+			goto out;
+		}
 		switch (*type) {
 		case IIO_VAL_INT:
 			*val = max_array(vals, length);
+			ret = 0;
 			break;
 		default:
 			/* TODO: learn about max for other iio values */
-			return -EINVAL;
+			ret = -EINVAL;
 		}
-		return 0;
+		break;
 
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
 	}
+out:
+	kfree(vals);
+	return ret;
 }
 
 int iio_read_max_channel_raw(struct iio_channel *chan, int *val)
@@ -876,24 +901,31 @@  static int iio_channel_read_min(struct iio_channel *chan,
 			if (val2)
 				*val2 = vals[1];
 		}
-		return 0;
+		ret = 0;
+		break;
 
 	case IIO_AVAIL_LIST:
-		if (length <= 0)
-			return -EINVAL;
+		if (length <= 0) {
+			ret = -EINVAL;
+			goto out;
+		}
 		switch (*type) {
 		case IIO_VAL_INT:
 			*val = min_array(vals, length);
+			ret = 0;
 			break;
 		default:
 			/* TODO: learn about min for other iio values */
-			return -EINVAL;
+			ret = -EINVAL;
 		}
-		return 0;
+		break;
 
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
 	}
+out:
+	kfree(vals);
+	return ret;
 }
 
 int iio_read_min_channel_raw(struct iio_channel *chan, int *val)
diff --git a/include/linux/iio/consumer.h b/include/linux/iio/consumer.h
index 333d1d8ccb37f387fe531577ac5e0bfc7f752cec..e3e268d2574b3e01c9412449d90d627de7efcd84 100644
--- a/include/linux/iio/consumer.h
+++ b/include/linux/iio/consumer.h
@@ -316,7 +316,7 @@  int iio_read_min_channel_raw(struct iio_channel *chan, int *val);
 /**
  * iio_read_avail_channel_raw() - read available raw values from a given channel
  * @chan:		The channel being queried.
- * @vals:		Available values read back.
+ * @vals:		Available values read back. Must be freed after use.
  * @length:		Number of entries in vals.
  *
  * Returns an error code, IIO_AVAIL_RANGE or IIO_AVAIL_LIST.
@@ -334,7 +334,7 @@  int iio_read_avail_channel_raw(struct iio_channel *chan,
 /**
  * iio_read_avail_channel_attribute() - read available channel attribute values
  * @chan:		The channel being queried.
- * @vals:		Available values read back.
+ * @vals:		Available values read back. Must be freed after use.
  * @type:		Type of values read back.
  * @length:		Number of entries in vals.
  * @attribute:		info attribute to be read back.