mbox series

[v6,0/5] Introduce the Counter character device interface

Message ID cover.1606075915.git.vilhelm.gray@gmail.com (mailing list archive)
Headers show
Series Introduce the Counter character device interface | expand

Message

William Breathitt Gray Nov. 22, 2020, 8:29 p.m. UTC
Changes in v6:
 - Consolidated the value member of struct counter_event down to a
   single u64; u64 should be capable of representing all component
   values
 - Removed extension width sysfs attributes; no longer needed when value
   is always u64
 - Implemented COUNTER_COMPONENT_DUMMY to allow timestamp grabs without
   component data reads
 - Implemented events_config() callback; called during
   COUNTER_CLEAR_WATCHES_IOCTL and COUNTER_LOAD_WATCHES_IOCTL in order
   to allow devices a chance to adjust (enable/disable IRQ, etc.) for
   the new events configuration requested by the user
 - Simplified example code in Documentation by removing confusing use of
   poll() call
 - Removed redundant ida_simple_remove() from counter_register()
 - Renamed devm_counter_unreg() to devm_counter_unregister()
 - Renamed functions in counter-sysfs.c to be clearer
 - Fixed miscellaneous typos throughout files
 - Added more kernel doc comments; I've left some defines without
   comments if they seemed obvious -- but please let me know if further
   documentation is needed
 - Refactored quad8_irq_handler() to use WARN_ONCE() instead of
   returning on error; this should prevent interrupts from entering an
   endless loop
 - General refactoring and additional comments for clarity
 - Returns EOPNOTSUPP instead of EFAULT now if a Counter watch is added
   for unsupported component
 - Renamed COUNTER_SET_WATCH_IOCTL TO COUNTER_ADD_WATCH_IOCTL to make
   the use clear
 - Reimplemented the parent and id members of struct counter_component
   as __u8 instead of __u64; it's unlikely we'll ever have a device that
   supports more than 255 components
 - Reimplement __u64 variables in include/uapi/linux/counter.h as
   __aligned_u64 to prevent 32-bit vs 64-bit alignment issues
 - Fixed return value bug in counter_comp_u8_store(); enums set to a
   value with index > 0 should now work correctly
 - Fixed spectre issues in counter-chrdev.c
 - Removed redundant get_device() call from counter_register()
 - Moved put_device() to after the events_list is freed lest we leak
   memory

I'm skipping the introduction blurb because it was just a rehashing of
information included in the documentation patches within this patchset.
Instead I will focus this cover letter on discussions about this
patchset and the userspace interface implications.

1. Should standard Counter component data types be defined as u8 or u32?

   Many standard Counter component types such COUNTER_COMP_SIGNAL_LEVEL
   have standard values defined (e.g. COUNTER_SIGNAL_LEVEL_LOW and
   COUNTER_SIGNAL_LEVEL_HIGH). These values are currently handled by the
   Counter subsystem code as u8 data types.

   If u32 is used for these values instead, C enum structures could be
   used by driver authors to implicitly cast these values via the driver
   callback parameters.

   This question is primarily addressed to David Lechner. I'm somewhat
   confused about how this setup would look in device drivers. I've gone
   ahead and refactored the code to support u32 enums, and pushed it to
   a separate branch on my repository called counter_chrdev_v6_u32_enum:
   https://gitlab.com/vilhelmgray/iio/-/tree/counter_chrdev_v6_u32_enum

   Please check it out and let me know what you think. Is this the
   support you had in mind? I'm curious to see an example of how would
   your driver callback functions would look in this case. Is everything
   works out fine, then I'll submit this branch as v7 of this patchset.

2. How should we handle "raw" timestamps?

   Ahmad Fatoum brought up the possibility of returning "raw" timestamps
   similar to what the network stack offers (see the network stack
   SOF_TIMESTAMPING_{RAW,SYS}_HARDWARE support).

   I'm not very familiar with the networking stack code, but if I
   understand correctly the SOF_TIMESTAMPING_RAW_HARDWARE timestamps are
   values returned from the device. If so, I suspect we would be able to
   support these "raw" timestamps by defining them as Counter Extensions
   and returning them in struct counter_event elements similar to the
   other Extension values.

William Breathitt Gray (5):
  counter: Internalize sysfs interface code
  docs: counter: Update to reflect sysfs internalization
  counter: Add character device interface
  docs: counter: Document character device interface
  counter: 104-quad-8: Add IRQ support for the ACCES 104-QUAD-8

 Documentation/ABI/testing/sysfs-bus-counter   |   18 +-
 .../ABI/testing/sysfs-bus-counter-104-quad-8  |   32 +
 Documentation/driver-api/generic-counter.rst  |  411 ++++-
 .../userspace-api/ioctl/ioctl-number.rst      |    1 +
 MAINTAINERS                                   |    2 +-
 drivers/counter/104-quad-8.c                  |  778 +++++----
 drivers/counter/Kconfig                       |    6 +-
 drivers/counter/Makefile                      |    1 +
 drivers/counter/counter-chrdev.c              |  476 ++++++
 drivers/counter/counter-chrdev.h              |   16 +
 drivers/counter/counter-core.c                |  183 ++
 drivers/counter/counter-sysfs.c               |  806 +++++++++
 drivers/counter/counter-sysfs.h               |   13 +
 drivers/counter/counter.c                     | 1496 -----------------
 drivers/counter/ftm-quaddec.c                 |   60 +-
 drivers/counter/microchip-tcb-capture.c       |  114 +-
 drivers/counter/stm32-lptimer-cnt.c           |  175 +-
 drivers/counter/stm32-timer-cnt.c             |  145 +-
 drivers/counter/ti-eqep.c                     |  224 +--
 include/linux/counter.h                       |  676 ++++----
 include/linux/counter_enum.h                  |   45 -
 include/uapi/linux/counter.h                  |  105 ++
 22 files changed, 3094 insertions(+), 2689 deletions(-)
 create mode 100644 drivers/counter/counter-chrdev.c
 create mode 100644 drivers/counter/counter-chrdev.h
 create mode 100644 drivers/counter/counter-core.c
 create mode 100644 drivers/counter/counter-sysfs.c
 create mode 100644 drivers/counter/counter-sysfs.h
 delete mode 100644 drivers/counter/counter.c
 delete mode 100644 include/linux/counter_enum.h
 create mode 100644 include/uapi/linux/counter.h

Comments

William Breathitt Gray Nov. 25, 2020, 1:07 p.m. UTC | #1
On Sun, Nov 22, 2020 at 03:29:52PM -0500, William Breathitt Gray wrote:
> @@ -117,62 +112,95 @@ static int ti_eqep_count_write(struct counter_device *counter,
>  	return regmap_write(priv->regmap32, QPOSCNT, val);
>  }
>  
> -static int ti_eqep_function_get(struct counter_device *counter,
> -				struct counter_count *count, size_t *function)
> +static const u8 ti_qep_t2c_functions_map[] = {
> +};

Just a heads-up: this ti_qep_t2c_functions_map array is left over from
some code I was testing. It's not used at all -- I simply forgot to
remove it -- so I'll make sure to take it out in the next patchset. I'll
give this v6 patchset some more time for people to review and comment
before I submit the v7 revision.

William Breathitt Gray
David Lechner Dec. 13, 2020, 11:15 p.m. UTC | #2
On 11/22/20 2:29 PM, William Breathitt Gray wrote:

>   14 files changed, 1806 insertions(+), 2546 deletions(-)

It would be really nice if we could break this down into smaller
pieces and start getting it merged. It is really tough to keep
reviewing this much code in one patch over and over again.

Here are some initial findings from testing:


> +static void counter_device_release(struct device *dev)
> +{
> +	struct counter_device *const counter = dev_get_drvdata(dev);
> +
> +	counter_chrdev_remove(counter);
> +	ida_simple_remove(&counter_ida, counter->id);
> +}


I got the following error after `modprobe -r ti-eqep`:

[ 1186.045766] ------------[ cut here ]------------
[ 1186.050647] WARNING: CPU: 0 PID: 2625 at lib/refcount.c:28 counter_device_release+0x10/0x24 [counter]
[ 1186.059976] refcount_t: underflow; use-after-free.
[ 1186.064790] Modules linked in: aes_arm_bs(+) crypto_simd cryptd ccm usb_f_mass_storage usb_f_acm u_serial usb_f_ecm rfcomm usb_f_rndis u_ether libcomposite aes_arm aes_generic cmac bnep wl18xx wlcore mac80211 libarc4 sha256_generic libsha256 sha256_arm cfg80211 ti_am335x_adc kfifo_buf omap_aes_driver omap_crypto omap_sham crypto_engine pm33xx ti_emif_sram hci_uart omap_rng btbcm rng_core ti_eqep(-) counter bluetooth c_can_platform c_can ecdh_generic bmp280_spi ecc can_dev libaes bmp280_i2c bmp280 industrialio omap_mailbox musb_dsps wlcore_sdio musb_hdrc udc_core usbcore wkup_m3_ipc at24 omap_wdt phy_am335x watchdog phy_am335x_control ti_am335x_tscadc phy_generic wkup_m3_rproc usb_common cppi41 rtc_omap leds_gpio led_class cpufreq_dt pwm_tiehrpwm autofs4
[ 1186.132376] CPU: 0 PID: 2625 Comm: modprobe Not tainted 5.10.0-rc7bone-counter+ #23
[ 1186.140070] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 1186.146225] [<c0110d70>] (unwind_backtrace) from [<c010b640>] (show_stack+0x10/0x14)
[ 1186.154017] [<c010b640>] (show_stack) from [<c09a0c98>] (dump_stack+0xc4/0xe4)
[ 1186.161285] [<c09a0c98>] (dump_stack) from [<c0137ba0>] (__warn+0xd8/0x100)
[ 1186.168284] [<c0137ba0>] (__warn) from [<c099c8e4>] (warn_slowpath_fmt+0x94/0xbc)
[ 1186.175814] [<c099c8e4>] (warn_slowpath_fmt) from [<bf10b0e8>] (counter_device_release+0x10/0x24 [counter])
[ 1186.185632] [<bf10b0e8>] (counter_device_release [counter]) from [<c0667118>] (device_release+0x30/0xa4)
[ 1186.195163] [<c0667118>] (device_release) from [<c057f73c>] (kobject_put+0x94/0x104)
[ 1186.202944] [<c057f73c>] (kobject_put) from [<c057f73c>] (kobject_put+0x94/0x104)
[ 1186.210472] [<c057f73c>] (kobject_put) from [<bf19004c>] (ti_eqep_remove+0x10/0x30 [ti_eqep])
[ 1186.219047] [<bf19004c>] (ti_eqep_remove [ti_eqep]) from [<c066f390>] (platform_drv_remove+0x24/0x3c)
[ 1186.228313] [<c066f390>] (platform_drv_remove) from [<c066d934>] (device_release_driver_internal+0xfc/0x1d0)
[ 1186.238187] [<c066d934>] (device_release_driver_internal) from [<c066da78>] (driver_detach+0x58/0xa8)
[ 1186.247456] [<c066da78>] (driver_detach) from [<c066c5ec>] (bus_remove_driver+0x4c/0xa0)
[ 1186.255594] [<c066c5ec>] (bus_remove_driver) from [<c01dd150>] (sys_delete_module+0x180/0x264)
[ 1186.264250] [<c01dd150>] (sys_delete_module) from [<c0100080>] (ret_fast_syscall+0x0/0x54)
[ 1186.272551] Exception stack(0xd247ffa8 to 0xd247fff0)
[ 1186.277629] ffa0:                   004fb478 004fb478 004fb4b4 00000800 b3bfcf00 00000000
[ 1186.285847] ffc0: 004fb478 004fb478 004fb478 00000081 00000000 be974900 be974a55 004fb478
[ 1186.294062] ffe0: 004f8f5c be97352c 004ddd97 b6d11d68
[ 1186.299253] ---[ end trace e1c61dea091f1078 ]---

> +static ssize_t counter_comp_u8_store(struct device *dev,
> +				     struct device_attribute *attr,
> +				     const char *buf, size_t len)
> +{
> +	const struct counter_attribute *const a = to_counter_attribute(attr);
> +	struct counter_device *const counter = dev_get_drvdata(dev);
> +	struct counter_count *const count = a->parent;
> +	struct counter_synapse *const synapse = a->comp.priv;
> +	const struct counter_available *const avail = a->comp.priv;
> +	int err;
> +	bool bool_data;
> +	int idx;
> +	u8 data;
> +
> +	switch (a->comp.type) {
> +	case COUNTER_COMP_BOOL:
> +		err = kstrtobool(buf, &bool_data);
> +		data = bool_data;
> +		break;
> +	case COUNTER_COMP_FUNCTION:
> +		err = find_in_string_array(&data, count->functions_list,
> +					   count->num_functions, buf,
> +					   counter_function_str);
> +		break;
> +	case COUNTER_COMP_SYNAPSE_ACTION:
> +		err = find_in_string_array(&data, synapse->actions_list,
> +					   synapse->num_actions, buf,
> +					   counter_synapse_action_str);
> +		break;
> +	case COUNTER_COMP_ENUM:
> +		idx = __sysfs_match_string(avail->strs, avail->num_items, buf);
> +		if (idx < 0)
> +			return idx;
> +		data = idx;
> +		break;
> +	case COUNTER_COMP_COUNT_MODE:
> +		err = find_in_string_array(&data, avail->enums,
> +					   avail->num_items, buf,
> +					   counter_count_mode_str);
> +		break;
> +	default:
> +		err = kstrtou8(buf, 0, &data);
> +		break;
> +	}
> +	if (err)

This needs to be `if (err < 0)`. There are cases where the functions
above return positive values. (And to be overly safe, it probably wouldn't
hurt to use err < 0 everywhere - not just in this function.)

> +		return err;
> +
> +	switch (a->scope) {
> +	case COUNTER_SCOPE_DEVICE:
> +		err = a->comp.device_u8_write(counter, data);
> +		break;
> +	case COUNTER_SCOPE_SIGNAL:
> +		err = a->comp.signal_u8_write(counter, a->parent, data);
> +		break;
> +	case COUNTER_SCOPE_COUNT:
> +		if (a->comp.type == COUNTER_COMP_SYNAPSE_ACTION)
> +			err = a->comp.action_write(counter, count, synapse,
> +						   data);
> +		else
> +			err = a->comp.count_u8_write(counter, count, data);
> +		break;
> +	}
> +	if (err)
> +		return err;
> +
> +	return len;
> +}
David Lechner Dec. 13, 2020, 11:15 p.m. UTC | #3
On 11/22/20 2:29 PM, William Breathitt Gray wrote:
> 
> 1. Should standard Counter component data types be defined as u8 or u32?
> 
>     Many standard Counter component types such COUNTER_COMP_SIGNAL_LEVEL
>     have standard values defined (e.g. COUNTER_SIGNAL_LEVEL_LOW and
>     COUNTER_SIGNAL_LEVEL_HIGH). These values are currently handled by the
>     Counter subsystem code as u8 data types.
> 
>     If u32 is used for these values instead, C enum structures could be
>     used by driver authors to implicitly cast these values via the driver
>     callback parameters.
> 
>     This question is primarily addressed to David Lechner. I'm somewhat
>     confused about how this setup would look in device drivers. I've gone
>     ahead and refactored the code to support u32 enums, and pushed it to
>     a separate branch on my repository called counter_chrdev_v6_u32_enum:
>     https://gitlab.com/vilhelmgray/iio/-/tree/counter_chrdev_v6_u32_enum
> 
>     Please check it out and let me know what you think. Is this the
>     support you had in mind? I'm curious to see an example of how would
>     your driver callback functions would look in this case. If everything
>     works out fine, then I'll submit this branch as v7 of this patchset.

I haven't had time to look at this in depth, but just superficially looking
at it, it is mostly there. The driver callback would just use the enum type
in place of u32. For example:

static int ti_eqep_function_write(struct counter_device *counter,
				  struct counter_count *count,
				  enum counter_function function)

and the COUNTER_FUNCTION_* constants would be defined as:

enum counter_function {
	COUNTER_FUNCTION_INCREASE,
	...
};

instead of using #define macros.

One advantage I see to using u8, at least in the user API data structures,
is that it increases the number of events that fit in the kfifo buffer by
a significant factor.

And that is not to say that we couldn't do both: have the user API structs
use u8 for enum values and still use u32/strong enum types internally in
the callback functions.



> 
> 2. How should we handle "raw" timestamps?
> 
>     Ahmad Fatoum brought up the possibility of returning "raw" timestamps
>     similar to what the network stack offers (see the network stack
>     SOF_TIMESTAMPING_{RAW,SYS}_HARDWARE support).
> 
>     I'm not very familiar with the networking stack code, but if I
>     understand correctly the SOF_TIMESTAMPING_RAW_HARDWARE timestamps are
>     values returned from the device. If so, I suspect we would be able to
>     support these "raw" timestamps by defining them as Counter Extensions
>     and returning them in struct counter_event elements similar to the
>     other Extension values.

Is nanosecond resolution good enough? In the TI eQEP driver I considered
returning the raw timer value, but quickly realized that it would not be
very nice to expect the user code to know the clock rate of the timer. It
was very easy to get the clock rate in the kernel and just convert the
timer value to nanoseconds before returning it to userspace.

So if there is some specialized case where it can be solved no other way
besides using raw timestamps, then sure, include it. Otherwise I think we
should stick with nanoseconds for time values when possible.
William Breathitt Gray Dec. 20, 2020, 9:44 p.m. UTC | #4
On Sun, Dec 13, 2020 at 05:15:14PM -0600, David Lechner wrote:
> On 11/22/20 2:29 PM, William Breathitt Gray wrote:
> > 
> > 1. Should standard Counter component data types be defined as u8 or u32?
> > 
> >     Many standard Counter component types such COUNTER_COMP_SIGNAL_LEVEL
> >     have standard values defined (e.g. COUNTER_SIGNAL_LEVEL_LOW and
> >     COUNTER_SIGNAL_LEVEL_HIGH). These values are currently handled by the
> >     Counter subsystem code as u8 data types.
> > 
> >     If u32 is used for these values instead, C enum structures could be
> >     used by driver authors to implicitly cast these values via the driver
> >     callback parameters.
> > 
> >     This question is primarily addressed to David Lechner. I'm somewhat
> >     confused about how this setup would look in device drivers. I've gone
> >     ahead and refactored the code to support u32 enums, and pushed it to
> >     a separate branch on my repository called counter_chrdev_v6_u32_enum:
> >     https://gitlab.com/vilhelmgray/iio/-/tree/counter_chrdev_v6_u32_enum
> > 
> >     Please check it out and let me know what you think. Is this the
> >     support you had in mind? I'm curious to see an example of how would
> >     your driver callback functions would look in this case. If everything
> >     works out fine, then I'll submit this branch as v7 of this patchset.
> 
> I haven't had time to look at this in depth, but just superficially looking
> at it, it is mostly there. The driver callback would just use the enum type
> in place of u32. For example:
> 
> static int ti_eqep_function_write(struct counter_device *counter,
> 				  struct counter_count *count,
> 				  enum counter_function function)
> 
> and the COUNTER_FUNCTION_* constants would be defined as:
> 
> enum counter_function {
> 	COUNTER_FUNCTION_INCREASE,
> 	...
> };
> 
> instead of using #define macros.
> 
> One advantage I see to using u8, at least in the user API data structures,
> is that it increases the number of events that fit in the kfifo buffer by
> a significant factor.
> 
> And that is not to say that we couldn't do both: have the user API structs
> use u8 for enum values and still use u32/strong enum types internally in
> the callback functions.

I'm including David Laight because he initially opposed enums in favor
of fixed size types when we discussed this in an earlier revision:
https://lkml.org/lkml/2020/5/3/159

However, there have been significant changes to this patchset so the
context now is different than those earlier discussions (i.e. we're no
longer discussing ioctl calls).

I think reimplementing these constants as enums as described could work.
If we do so, should the enum constants be given specific values? For
example:

enum counter_function {
	COUNTER_FUNCTION_INCREASE = 0,
	COUNTER_FUNCTION_DECREASE = 1,
	...
};

> 
> > 
> > 2. How should we handle "raw" timestamps?
> > 
> >     Ahmad Fatoum brought up the possibility of returning "raw" timestamps
> >     similar to what the network stack offers (see the network stack
> >     SOF_TIMESTAMPING_{RAW,SYS}_HARDWARE support).
> > 
> >     I'm not very familiar with the networking stack code, but if I
> >     understand correctly the SOF_TIMESTAMPING_RAW_HARDWARE timestamps are
> >     values returned from the device. If so, I suspect we would be able to
> >     support these "raw" timestamps by defining them as Counter Extensions
> >     and returning them in struct counter_event elements similar to the
> >     other Extension values.
> 
> Is nanosecond resolution good enough? In the TI eQEP driver I considered
> returning the raw timer value, but quickly realized that it would not be
> very nice to expect the user code to know the clock rate of the timer. It
> was very easy to get the clock rate in the kernel and just convert the
> timer value to nanoseconds before returning it to userspace.
> 
> So if there is some specialized case where it can be solved no other way
> besides using raw timestamps, then sure, include it. Otherwise I think we
> should stick with nanoseconds for time values when possible.

Given that the struct counter_event 'timestamp' member serves as the
identification vessel for correlating component values to a single event
(i.e. component values of a given event will share the same unique
timestamp), I believe it's prudent to standardize this timestamp format
on the kernel monotonic time as we have currently done so via our
ktime_get_ns() call.

There are cases where it is understandably better to use a timestamp
provided directly by the hardware (e.g. keeping timestamping close to
data collection). For these cases, we can retrieve these "raw"
timestamps via a Counter Extension: users would get their "raw"
timestamp via the struct counter_event 'value' member, and just treat
the 'timestamp' member as a unique event identification number.

William Breathitt Gray
William Breathitt Gray Dec. 20, 2020, 10:11 p.m. UTC | #5
On Sun, Dec 13, 2020 at 05:15:00PM -0600, David Lechner wrote:
> On 11/22/20 2:29 PM, William Breathitt Gray wrote:
> 
> >   14 files changed, 1806 insertions(+), 2546 deletions(-)
> 
> It would be really nice if we could break this down into smaller
> pieces and start getting it merged. It is really tough to keep
> reviewing this much code in one patch over and over again.

Yes, this is a pretty massive patch. I could break this across the
individual files affected to make it simpler to review, but in the end
all those patches would need to end up squashed together before merge
again (for the sake of git bisect), so the effort feels somewhat moot.

Luckily, I don't think there will be much change in the next revision
since it's looking like it'll mainly be a few bug fixes; hopefully this
coming version 7 will be the final revision before merge.

> Here are some initial findings from testing:
> 
> 
> > +static void counter_device_release(struct device *dev)
> > +{
> > +	struct counter_device *const counter = dev_get_drvdata(dev);
> > +
> > +	counter_chrdev_remove(counter);
> > +	ida_simple_remove(&counter_ida, counter->id);
> > +}
> 
> 
> I got the following error after `modprobe -r ti-eqep`:
> 
> [ 1186.045766] ------------[ cut here ]------------
> [ 1186.050647] WARNING: CPU: 0 PID: 2625 at lib/refcount.c:28 counter_device_release+0x10/0x24 [counter]
> [ 1186.059976] refcount_t: underflow; use-after-free.
> [ 1186.064790] Modules linked in: aes_arm_bs(+) crypto_simd cryptd ccm usb_f_mass_storage usb_f_acm u_serial usb_f_ecm rfcomm usb_f_rndis u_ether libcomposite aes_arm aes_generic cmac bnep wl18xx wlcore mac80211 libarc4 sha256_generic libsha256 sha256_arm cfg80211 ti_am335x_adc kfifo_buf omap_aes_driver omap_crypto omap_sham crypto_engine pm33xx ti_emif_sram hci_uart omap_rng btbcm rng_core ti_eqep(-) counter bluetooth c_can_platform c_can ecdh_generic bmp280_spi ecc can_dev libaes bmp280_i2c bmp280 industrialio omap_mailbox musb_dsps wlcore_sdio musb_hdrc udc_core usbcore wkup_m3_ipc at24 omap_wdt phy_am335x watchdog phy_am335x_control ti_am335x_tscadc phy_generic wkup_m3_rproc usb_common cppi41 rtc_omap leds_gpio led_class cpufreq_dt pwm_tiehrpwm autofs4
> [ 1186.132376] CPU: 0 PID: 2625 Comm: modprobe Not tainted 5.10.0-rc7bone-counter+ #23
> [ 1186.140070] Hardware name: Generic AM33XX (Flattened Device Tree)
> [ 1186.146225] [<c0110d70>] (unwind_backtrace) from [<c010b640>] (show_stack+0x10/0x14)
> [ 1186.154017] [<c010b640>] (show_stack) from [<c09a0c98>] (dump_stack+0xc4/0xe4)
> [ 1186.161285] [<c09a0c98>] (dump_stack) from [<c0137ba0>] (__warn+0xd8/0x100)
> [ 1186.168284] [<c0137ba0>] (__warn) from [<c099c8e4>] (warn_slowpath_fmt+0x94/0xbc)
> [ 1186.175814] [<c099c8e4>] (warn_slowpath_fmt) from [<bf10b0e8>] (counter_device_release+0x10/0x24 [counter])
> [ 1186.185632] [<bf10b0e8>] (counter_device_release [counter]) from [<c0667118>] (device_release+0x30/0xa4)
> [ 1186.195163] [<c0667118>] (device_release) from [<c057f73c>] (kobject_put+0x94/0x104)
> [ 1186.202944] [<c057f73c>] (kobject_put) from [<c057f73c>] (kobject_put+0x94/0x104)
> [ 1186.210472] [<c057f73c>] (kobject_put) from [<bf19004c>] (ti_eqep_remove+0x10/0x30 [ti_eqep])
> [ 1186.219047] [<bf19004c>] (ti_eqep_remove [ti_eqep]) from [<c066f390>] (platform_drv_remove+0x24/0x3c)
> [ 1186.228313] [<c066f390>] (platform_drv_remove) from [<c066d934>] (device_release_driver_internal+0xfc/0x1d0)
> [ 1186.238187] [<c066d934>] (device_release_driver_internal) from [<c066da78>] (driver_detach+0x58/0xa8)
> [ 1186.247456] [<c066da78>] (driver_detach) from [<c066c5ec>] (bus_remove_driver+0x4c/0xa0)
> [ 1186.255594] [<c066c5ec>] (bus_remove_driver) from [<c01dd150>] (sys_delete_module+0x180/0x264)
> [ 1186.264250] [<c01dd150>] (sys_delete_module) from [<c0100080>] (ret_fast_syscall+0x0/0x54)
> [ 1186.272551] Exception stack(0xd247ffa8 to 0xd247fff0)
> [ 1186.277629] ffa0:                   004fb478 004fb478 004fb4b4 00000800 b3bfcf00 00000000
> [ 1186.285847] ffc0: 004fb478 004fb478 004fb478 00000081 00000000 be974900 be974a55 004fb478
> [ 1186.294062] ffe0: 004f8f5c be97352c 004ddd97 b6d11d68
> [ 1186.299253] ---[ end trace e1c61dea091f1078 ]---

I noticed that I'm calling counter_chrdev_remove() twice: once in
counter_unregister(), and again in counter_device_release(). I suspect
this is what's causing the refcount to underflow. I'll test and verify
that this is the culprit.

In fact, I don't think I need to define a counter_device_release()
callback at all, would I? These cleanup function calls could be moved to
counter_unregister() instead.

> > +static ssize_t counter_comp_u8_store(struct device *dev,
> > +				     struct device_attribute *attr,
> > +				     const char *buf, size_t len)
> > +{
> > +	const struct counter_attribute *const a = to_counter_attribute(attr);
> > +	struct counter_device *const counter = dev_get_drvdata(dev);
> > +	struct counter_count *const count = a->parent;
> > +	struct counter_synapse *const synapse = a->comp.priv;
> > +	const struct counter_available *const avail = a->comp.priv;
> > +	int err;
> > +	bool bool_data;
> > +	int idx;
> > +	u8 data;
> > +
> > +	switch (a->comp.type) {
> > +	case COUNTER_COMP_BOOL:
> > +		err = kstrtobool(buf, &bool_data);
> > +		data = bool_data;
> > +		break;
> > +	case COUNTER_COMP_FUNCTION:
> > +		err = find_in_string_array(&data, count->functions_list,
> > +					   count->num_functions, buf,
> > +					   counter_function_str);
> > +		break;
> > +	case COUNTER_COMP_SYNAPSE_ACTION:
> > +		err = find_in_string_array(&data, synapse->actions_list,
> > +					   synapse->num_actions, buf,
> > +					   counter_synapse_action_str);
> > +		break;
> > +	case COUNTER_COMP_ENUM:
> > +		idx = __sysfs_match_string(avail->strs, avail->num_items, buf);
> > +		if (idx < 0)
> > +			return idx;
> > +		data = idx;
> > +		break;
> > +	case COUNTER_COMP_COUNT_MODE:
> > +		err = find_in_string_array(&data, avail->enums,
> > +					   avail->num_items, buf,
> > +					   counter_count_mode_str);
> > +		break;
> > +	default:
> > +		err = kstrtou8(buf, 0, &data);
> > +		break;
> > +	}
> > +	if (err)
> 
> This needs to be `if (err < 0)`. There are cases where the functions
> above return positive values. (And to be overly safe, it probably wouldn't
> hurt to use err < 0 everywhere - not just in this function.)

Ack.

William Breathitt Gray
David Lechner Dec. 21, 2020, 3:19 p.m. UTC | #6
On 12/20/20 3:44 PM, William Breathitt Gray wrote:
> On Sun, Dec 13, 2020 at 05:15:14PM -0600, David Lechner wrote:
>> On 11/22/20 2:29 PM, William Breathitt Gray wrote:
>>>
>>> 1. Should standard Counter component data types be defined as u8 or u32?
>>>
>>>      Many standard Counter component types such COUNTER_COMP_SIGNAL_LEVEL
>>>      have standard values defined (e.g. COUNTER_SIGNAL_LEVEL_LOW and
>>>      COUNTER_SIGNAL_LEVEL_HIGH). These values are currently handled by the
>>>      Counter subsystem code as u8 data types.
>>>
>>>      If u32 is used for these values instead, C enum structures could be
>>>      used by driver authors to implicitly cast these values via the driver
>>>      callback parameters.
>>>
>>>      This question is primarily addressed to David Lechner. I'm somewhat
>>>      confused about how this setup would look in device drivers. I've gone
>>>      ahead and refactored the code to support u32 enums, and pushed it to
>>>      a separate branch on my repository called counter_chrdev_v6_u32_enum:
>>>      https://gitlab.com/vilhelmgray/iio/-/tree/counter_chrdev_v6_u32_enum
>>>
>>>      Please check it out and let me know what you think. Is this the
>>>      support you had in mind? I'm curious to see an example of how would
>>>      your driver callback functions would look in this case. If everything
>>>      works out fine, then I'll submit this branch as v7 of this patchset.
>>
>> I haven't had time to look at this in depth, but just superficially looking
>> at it, it is mostly there. The driver callback would just use the enum type
>> in place of u32. For example:
>>
>> static int ti_eqep_function_write(struct counter_device *counter,
>> 				  struct counter_count *count,
>> 				  enum counter_function function)
>>
>> and the COUNTER_FUNCTION_* constants would be defined as:
>>
>> enum counter_function {
>> 	COUNTER_FUNCTION_INCREASE,
>> 	...
>> };
>>
>> instead of using #define macros.
>>
>> One advantage I see to using u8, at least in the user API data structures,
>> is that it increases the number of events that fit in the kfifo buffer by
>> a significant factor.
>>
>> And that is not to say that we couldn't do both: have the user API structs
>> use u8 for enum values and still use u32/strong enum types internally in
>> the callback functions.
> 
> I'm including David Laight because he initially opposed enums in favor
> of fixed size types when we discussed this in an earlier revision:
> https://lkml.org/lkml/2020/5/3/159
> 
> However, there have been significant changes to this patchset so the
> context now is different than those earlier discussions (i.e. we're no
> longer discussing ioctl calls).
> 
> I think reimplementing these constants as enums as described could work.
> If we do so, should the enum constants be given specific values? For
> example:
> 
> enum counter_function {
> 	COUNTER_FUNCTION_INCREASE = 0,
> 	COUNTER_FUNCTION_DECREASE = 1,
> 	...
> };

I would say no on the explicit values since they don't have
any significant meaning.
David Lechner Dec. 21, 2020, 3:26 p.m. UTC | #7
On 12/20/20 4:11 PM, William Breathitt Gray wrote:
> On Sun, Dec 13, 2020 at 05:15:00PM -0600, David Lechner wrote:
>> On 11/22/20 2:29 PM, William Breathitt Gray wrote:
>>
>>>    14 files changed, 1806 insertions(+), 2546 deletions(-)
>>
>> It would be really nice if we could break this down into smaller
>> pieces and start getting it merged. It is really tough to keep
>> reviewing this much code in one patch over and over again.
> 
> Yes, this is a pretty massive patch. I could break this across the
> individual files affected to make it simpler to review, but in the end
> all those patches would need to end up squashed together before merge
> again (for the sake of git bisect), so the effort feels somewhat moot.
> 
> Luckily, I don't think there will be much change in the next revision
> since it's looking like it'll mainly be a few bug fixes; hopefully this
> coming version 7 will be the final revision before merge.
> 
>> Here are some initial findings from testing:
>>
>>
>>> +static void counter_device_release(struct device *dev)
>>> +{
>>> +	struct counter_device *const counter = dev_get_drvdata(dev);
>>> +
>>> +	counter_chrdev_remove(counter);
>>> +	ida_simple_remove(&counter_ida, counter->id);
>>> +}
>>
>>
>> I got the following error after `modprobe -r ti-eqep`:
>>
>> [ 1186.045766] ------------[ cut here ]------------
>> [ 1186.050647] WARNING: CPU: 0 PID: 2625 at lib/refcount.c:28 counter_device_release+0x10/0x24 [counter]
>> [ 1186.059976] refcount_t: underflow; use-after-free.
>> [ 1186.064790] Modules linked in: aes_arm_bs(+) crypto_simd cryptd ccm usb_f_mass_storage usb_f_acm u_serial usb_f_ecm rfcomm usb_f_rndis u_ether libcomposite aes_arm aes_generic cmac bnep wl18xx wlcore mac80211 libarc4 sha256_generic libsha256 sha256_arm cfg80211 ti_am335x_adc kfifo_buf omap_aes_driver omap_crypto omap_sham crypto_engine pm33xx ti_emif_sram hci_uart omap_rng btbcm rng_core ti_eqep(-) counter bluetooth c_can_platform c_can ecdh_generic bmp280_spi ecc can_dev libaes bmp280_i2c bmp280 industrialio omap_mailbox musb_dsps wlcore_sdio musb_hdrc udc_core usbcore wkup_m3_ipc at24 omap_wdt phy_am335x watchdog phy_am335x_control ti_am335x_tscadc phy_generic wkup_m3_rproc usb_common cppi41 rtc_omap leds_gpio led_class cpufreq_dt pwm_tiehrpwm autofs4
>> [ 1186.132376] CPU: 0 PID: 2625 Comm: modprobe Not tainted 5.10.0-rc7bone-counter+ #23
>> [ 1186.140070] Hardware name: Generic AM33XX (Flattened Device Tree)
>> [ 1186.146225] [<c0110d70>] (unwind_backtrace) from [<c010b640>] (show_stack+0x10/0x14)
>> [ 1186.154017] [<c010b640>] (show_stack) from [<c09a0c98>] (dump_stack+0xc4/0xe4)
>> [ 1186.161285] [<c09a0c98>] (dump_stack) from [<c0137ba0>] (__warn+0xd8/0x100)
>> [ 1186.168284] [<c0137ba0>] (__warn) from [<c099c8e4>] (warn_slowpath_fmt+0x94/0xbc)
>> [ 1186.175814] [<c099c8e4>] (warn_slowpath_fmt) from [<bf10b0e8>] (counter_device_release+0x10/0x24 [counter])
>> [ 1186.185632] [<bf10b0e8>] (counter_device_release [counter]) from [<c0667118>] (device_release+0x30/0xa4)
>> [ 1186.195163] [<c0667118>] (device_release) from [<c057f73c>] (kobject_put+0x94/0x104)
>> [ 1186.202944] [<c057f73c>] (kobject_put) from [<c057f73c>] (kobject_put+0x94/0x104)
>> [ 1186.210472] [<c057f73c>] (kobject_put) from [<bf19004c>] (ti_eqep_remove+0x10/0x30 [ti_eqep])
>> [ 1186.219047] [<bf19004c>] (ti_eqep_remove [ti_eqep]) from [<c066f390>] (platform_drv_remove+0x24/0x3c)
>> [ 1186.228313] [<c066f390>] (platform_drv_remove) from [<c066d934>] (device_release_driver_internal+0xfc/0x1d0)
>> [ 1186.238187] [<c066d934>] (device_release_driver_internal) from [<c066da78>] (driver_detach+0x58/0xa8)
>> [ 1186.247456] [<c066da78>] (driver_detach) from [<c066c5ec>] (bus_remove_driver+0x4c/0xa0)
>> [ 1186.255594] [<c066c5ec>] (bus_remove_driver) from [<c01dd150>] (sys_delete_module+0x180/0x264)
>> [ 1186.264250] [<c01dd150>] (sys_delete_module) from [<c0100080>] (ret_fast_syscall+0x0/0x54)
>> [ 1186.272551] Exception stack(0xd247ffa8 to 0xd247fff0)
>> [ 1186.277629] ffa0:                   004fb478 004fb478 004fb4b4 00000800 b3bfcf00 00000000
>> [ 1186.285847] ffc0: 004fb478 004fb478 004fb478 00000081 00000000 be974900 be974a55 004fb478
>> [ 1186.294062] ffe0: 004f8f5c be97352c 004ddd97 b6d11d68
>> [ 1186.299253] ---[ end trace e1c61dea091f1078 ]---
> 
> I noticed that I'm calling counter_chrdev_remove() twice: once in
> counter_unregister(), and again in counter_device_release(). I suspect
> this is what's causing the refcount to underflow. I'll test and verify
> that this is the culprit.
> 
> In fact, I don't think I need to define a counter_device_release()
> callback at all, would I? These cleanup function calls could be moved to
> counter_unregister() instead.

As long as a user program keeps a chrdev open, it holds a reference to
the device, so I think it needs to be the other way around. (Unless it
is impossible to call counter_unregister() before all references have
been released - but I don't think that is the case - not 100% sure.)