diff mbox series

[v7,1/9] firmware: arm_scmi: Add notification protocol-registration

Message ID 20200504163855.54548-2-cristian.marussi@arm.com (mailing list archive)
State New, archived
Headers show
Series SCMI Notifications Core Support | expand

Commit Message

Cristian Marussi May 4, 2020, 4:38 p.m. UTC
Add core SCMI Notifications protocol-registration support: allow protocols
to register their own set of supported events, during their initialization
phase. Notification core can track multiple platform instances by their
handles.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
V4 --> V5
- fixed kernel-doc
- added barriers for registered protocols and events
- using kfifo_alloc and devm_add_action_or_reset
V3 --> V4
- removed scratch ISR buffer, move scratch BH buffer into protocol
  descriptor
- converted registered_protocols and registered_events from hashtables
  into bare fixed-sized arrays
- removed unregister protocols' routines (never called really)
V2 --> V3
- added scmi_notify_instance to track target platform instance
V1 --> V2
- splitted out of V1 patch 04
- moved from IDR maps to real HashTables to store events
- scmi_notifications_initialized is now an atomic_t
- reviewed protocol registration/unregistration to use devres
- fixed:
  drivers/firmware/arm_scmi/notify.c:483:18-23: ERROR:
  	reference preceded by free on line 482

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
---
 drivers/firmware/arm_scmi/Makefile |   2 +-
 drivers/firmware/arm_scmi/common.h |   4 +
 drivers/firmware/arm_scmi/notify.c | 444 +++++++++++++++++++++++++++++
 drivers/firmware/arm_scmi/notify.h |  56 ++++
 include/linux/scmi_protocol.h      |   3 +
 5 files changed, 508 insertions(+), 1 deletion(-)
 create mode 100644 drivers/firmware/arm_scmi/notify.c
 create mode 100644 drivers/firmware/arm_scmi/notify.h

Comments

Dave Martin May 6, 2020, 3:25 p.m. UTC | #1
On Mon, May 04, 2020 at 05:38:47PM +0100, Cristian Marussi wrote:
> Add core SCMI Notifications protocol-registration support: allow protocols
> to register their own set of supported events, during their initialization
> phase. Notification core can track multiple platform instances by their
> handles.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> ---
> V4 --> V5
> - fixed kernel-doc
> - added barriers for registered protocols and events
> - using kfifo_alloc and devm_add_action_or_reset
> V3 --> V4
> - removed scratch ISR buffer, move scratch BH buffer into protocol
>   descriptor
> - converted registered_protocols and registered_events from hashtables
>   into bare fixed-sized arrays
> - removed unregister protocols' routines (never called really)
> V2 --> V3
> - added scmi_notify_instance to track target platform instance
> V1 --> V2
> - splitted out of V1 patch 04
> - moved from IDR maps to real HashTables to store events
> - scmi_notifications_initialized is now an atomic_t
> - reviewed protocol registration/unregistration to use devres
> - fixed:
>   drivers/firmware/arm_scmi/notify.c:483:18-23: ERROR:
>   	reference preceded by free on line 482
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Reported-by: Julia Lawall <julia.lawall@lip6.fr>
> ---
>  drivers/firmware/arm_scmi/Makefile |   2 +-
>  drivers/firmware/arm_scmi/common.h |   4 +
>  drivers/firmware/arm_scmi/notify.c | 444 +++++++++++++++++++++++++++++
>  drivers/firmware/arm_scmi/notify.h |  56 ++++
>  include/linux/scmi_protocol.h      |   3 +
>  5 files changed, 508 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/firmware/arm_scmi/notify.c
>  create mode 100644 drivers/firmware/arm_scmi/notify.h

[...]

> diff --git a/drivers/firmware/arm_scmi/notify.c b/drivers/firmware/arm_scmi/notify.c

[...]

> +int scmi_register_protocol_events(const struct scmi_handle *handle,
> +				  u8 proto_id, size_t queue_sz,
> +				  const struct scmi_protocol_event_ops *ops,
> +				  const struct scmi_event *evt, int num_events,
> +				  int num_sources)
> +{
> +	int i;
> +	size_t payld_sz = 0;
> +	struct scmi_registered_protocol_events_desc *pd;
> +	struct scmi_notify_instance *ni = handle->notify_priv;
> +
> +	if (!ops || !evt || proto_id >= SCMI_MAX_PROTO)
> +		return -EINVAL;
> +
> +	/* Ensure atomic value is updated */
> +	smp_mb__before_atomic();
> +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> +		return -EAGAIN;

The atomics/barriers don't look quite right to me here.

I'd have expected:

scmi_register_protocol_events()
{
	if (atomic_read(&ni->initialized))
		return -EAGAIN;
	smp_mb_after_atomic();

	/* ... */
}

to pair with:

scmi_notification_init()
{
	/* ... */

	smp_mb__before_atomic();
	atomic_set(&ni->enabled, 1);
}


...however, do we need to allow these two functions to race with each
other at all?  (I haven't tried to understand the wider context here,
so if there really is no way to avoid initialisation racing with use I
guess we may have to do something like this.  We don't want callers
to dumbly spin on this function though.)


In other patches in the series, calls to scmi_register_protocol_events()
seem to be assuming there is no race: the return value is not checked.
Possibly a bug?


I'm not sure about scmi_notification_exit() (see below).

> +
> +	/* Attach to the notification main devres group */
> +	if (!devres_open_group(ni->handle->dev, ni->gid, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_events; i++)
> +		payld_sz = max_t(size_t, payld_sz, evt[i].max_payld_sz);
> +	pd = scmi_allocate_registered_protocol_desc(ni, proto_id, queue_sz,
> +				    sizeof(struct scmi_event_header) + payld_sz,
> +						    num_events, ops);
> +	if (IS_ERR(pd))
> +		goto err;
> +
> +	for (i = 0; i < num_events; i++, evt++) {
> +		struct scmi_registered_event *r_evt;
> +
> +		r_evt = devm_kzalloc(ni->handle->dev, sizeof(*r_evt),
> +				     GFP_KERNEL);
> +		if (!r_evt)
> +			goto err;
> +		r_evt->proto = pd;
> +		r_evt->evt = evt;
> +
> +		r_evt->sources = devm_kcalloc(ni->handle->dev, num_sources,
> +					      sizeof(refcount_t), GFP_KERNEL);
> +		if (!r_evt->sources)
> +			goto err;
> +		r_evt->num_sources = num_sources;
> +		mutex_init(&r_evt->sources_mtx);
> +
> +		r_evt->report = devm_kzalloc(ni->handle->dev,
> +					     evt->max_report_sz, GFP_KERNEL);
> +		if (!r_evt->report)
> +			goto err;
> +
> +		pd->registered_events[i] = r_evt;
> +		/* Ensure events are updated */
> +		smp_wmb();
> +		pr_info("SCMI Notifications: registered event - %X\n",
> +			MAKE_ALL_SRCS_KEY(r_evt->proto->id, r_evt->evt->id));
> +	}
> +
> +	/* Register protocol and events...it will never be removed */
> +	ni->registered_protocols[proto_id] = pd;
> +	/* Ensure protocols are updated */
> +	smp_wmb();
> +
> +	devres_close_group(ni->handle->dev, ni->gid);
> +
> +	return 0;
> +
> +err:
> +	pr_warn("SCMI Notifications - Proto:%X - Registration Failed !\n",
> +		proto_id);
> +	/* A failing protocol registration does not trigger full failure */
> +	devres_close_group(ni->handle->dev, ni->gid);
> +
> +	return -ENOMEM;
> +}
> +
> +/**
> + * scmi_notification_init()  - Initializes Notification Core Support
> + * @handle: The handle identifying the platform instance to initialize
> + *
> + * This function lays out all the basic resources needed by the notification
> + * core instance identified by the provided handle: once done, all of the
> + * SCMI Protocols can register their events with the core during their own
> + * initializations.
> + *
> + * Note that failing to initialize the core notifications support does not
> + * cause the whole SCMI Protocols stack to fail its initialization.
> + *
> + * SCMI Notification Initialization happens in 2 steps:
> + * * initialization: basic common allocations (this function) -> @initialized
> + * * registration: protocols asynchronously come into life and registers their
> + *		   own supported list of events with the core; this causes
> + *		   further per-protocol allocations
> + *
> + * Any user's callback registration attempt, referring a still not registered
> + * event, will be registered as pending and finalized later (if possible)
> + * by scmi_protocols_late_init() work.
> + * This allows for lazy initialization of SCMI Protocols due to late (or
> + * missing) SCMI drivers' modules loading.
> + *
> + * Return: 0 on Success
> + */
> +int scmi_notification_init(struct scmi_handle *handle)
> +{
> +	void *gid;
> +	struct scmi_notify_instance *ni;
> +
> +	gid = devres_open_group(handle->dev, NULL, GFP_KERNEL);
> +	if (!gid)
> +		return -ENOMEM;
> +
> +	ni = devm_kzalloc(handle->dev, sizeof(*ni), GFP_KERNEL);
> +	if (!ni)
> +		goto err;
> +
> +	ni->gid = gid;
> +	ni->handle = handle;
> +
> +	ni->registered_protocols = devm_kcalloc(handle->dev, SCMI_MAX_PROTO,
> +						sizeof(char *), GFP_KERNEL);
> +	if (!ni->registered_protocols)
> +		goto err;
> +
> +	handle->notify_priv = ni;
> +
> +	atomic_set(&ni->initialized, 1);
> +	atomic_set(&ni->enabled, 1);
> +	/* Ensure atomic values are updated */
> +	smp_mb__after_atomic();
> +
> +	pr_info("SCMI Notifications Core Initialized.\n");
> +
> +	devres_close_group(handle->dev, ni->gid);
> +
> +	return 0;
> +
> +err:
> +	pr_warn("SCMI Notifications - Initialization Failed.\n");
> +	devres_release_group(handle->dev, NULL);
> +	return -ENOMEM;
> +}
> +
> +/**
> + * scmi_notification_exit()  - Shutdown and clean Notification core
> + * @handle: The handle identifying the platform instance to shutdown
> + */
> +void scmi_notification_exit(struct scmi_handle *handle)
> +{
> +	struct scmi_notify_instance *ni = handle->notify_priv;
> +
> +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> +		return;
> +
> +	atomic_set(&ni->enabled, 0);
> +	/* Ensure atomic values are updated */
> +	smp_mb__after_atomic();

If users can race with this, we're dead: the atomic by itself doesn't
ensure that handle is not in use once we arrive here.  Should this
be a refcount instead?

If users can't race with this, we probably don't protection here.


I may be misunderstanding what this code is doing...

Cheers
---Dave
Cristian Marussi May 11, 2020, 10:04 p.m. UTC | #2
Hi Dave

thanks for the review first of all.

On Wed, May 06, 2020 at 04:25:50PM +0100, Dave Martin wrote:
> On Mon, May 04, 2020 at 05:38:47PM +0100, Cristian Marussi wrote:
> > Add core SCMI Notifications protocol-registration support: allow protocols
> > to register their own set of supported events, during their initialization
> > phase. Notification core can track multiple platform instances by their
> > handles.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > ---
> > V4 --> V5
> > - fixed kernel-doc
> > - added barriers for registered protocols and events
> > - using kfifo_alloc and devm_add_action_or_reset
> > V3 --> V4
> > - removed scratch ISR buffer, move scratch BH buffer into protocol
> >   descriptor
> > - converted registered_protocols and registered_events from hashtables
> >   into bare fixed-sized arrays
> > - removed unregister protocols' routines (never called really)
> > V2 --> V3
> > - added scmi_notify_instance to track target platform instance
> > V1 --> V2
> > - splitted out of V1 patch 04
> > - moved from IDR maps to real HashTables to store events
> > - scmi_notifications_initialized is now an atomic_t
> > - reviewed protocol registration/unregistration to use devres
> > - fixed:
> >   drivers/firmware/arm_scmi/notify.c:483:18-23: ERROR:
> >   	reference preceded by free on line 482
> > 
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Reported-by: Julia Lawall <julia.lawall@lip6.fr>
> > ---
> >  drivers/firmware/arm_scmi/Makefile |   2 +-
> >  drivers/firmware/arm_scmi/common.h |   4 +
> >  drivers/firmware/arm_scmi/notify.c | 444 +++++++++++++++++++++++++++++
> >  drivers/firmware/arm_scmi/notify.h |  56 ++++
> >  include/linux/scmi_protocol.h      |   3 +
> >  5 files changed, 508 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/firmware/arm_scmi/notify.c
> >  create mode 100644 drivers/firmware/arm_scmi/notify.h
> 
> [...]
> 
> > diff --git a/drivers/firmware/arm_scmi/notify.c b/drivers/firmware/arm_scmi/notify.c
> 
> [...]
> 
> > +int scmi_register_protocol_events(const struct scmi_handle *handle,
> > +				  u8 proto_id, size_t queue_sz,
> > +				  const struct scmi_protocol_event_ops *ops,
> > +				  const struct scmi_event *evt, int num_events,
> > +				  int num_sources)
> > +{
> > +	int i;
> > +	size_t payld_sz = 0;
> > +	struct scmi_registered_protocol_events_desc *pd;
> > +	struct scmi_notify_instance *ni = handle->notify_priv;
> > +
> > +	if (!ops || !evt || proto_id >= SCMI_MAX_PROTO)
> > +		return -EINVAL;
> > +
> > +	/* Ensure atomic value is updated */
> > +	smp_mb__before_atomic();
> > +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> > +		return -EAGAIN;
> 
> The atomics/barriers don't look quite right to me here.
> 
> I'd have expected:
> 
> scmi_register_protocol_events()
> {
> 	if (atomic_read(&ni->initialized))
> 		return -EAGAIN;
> 	smp_mb_after_atomic();
> 
> 	/* ... */
> }
> 
> to pair with:
> 
> scmi_notification_init()
> {
> 	/* ... */
> 
> 	smp_mb__before_atomic();
> 	atomic_set(&ni->enabled, 1);
> }
> 
> 
> ...however, do we need to allow these two functions to race with each
> other at all?  (I haven't tried to understand the wider context here,
> so if there really is no way to avoid initialisation racing with use I
> guess we may have to do something like this.  We don't want callers
> to dumbly spin on this function though.)
> 
> 
> In other patches in the series, calls to scmi_register_protocol_events()
> seem to be assuming there is no race: the return value is not checked.
> Possibly a bug?
> 

I think you are right in these regards, there's no need of an atomic here
for 'initialized' and using -EAGAIN on !initialized as error code in
scmi_register_protocol_events() is wrong too in this context.

The aim is to detect when general SCMI notification core initialization has
failed as a whole and in that case inhibit any further SCMI protocols events'
registration attempt during general protocol init (since init has failed and
no related data has been allocated and readied).
No attempt should be made to re-register on failure because the failure to
init the notif stuff is permanent when happens (not solvable with deferred
re-probing) and there's no race in checking this condition
(more on this down below)

So I think I'll just drop the 'initialized' field as a whole and simply use
the value of handle->notify_priv (ni) to detect if initialization was
successfull or not, like:

scmi_register_protocol_events()
{
	...
	if (unlikely(!ni))  		// ni is NULL when init has failed
		return -ENOMEM;
}

(..plus barriers where needed)

and could probably check the ret value too in the caller to printout some
debug info in case of other-than-ENOMEM errors. (due anyway to a broken
implementation not to runtime errors)

As said there's no race to initialized (or ni itself) in fact because the
sequence in SCMI general init is roughly as follows:

a. SCMI bus is created and registered
b. SCMI protocol driver (a platform driver) is probed and an SCMI handle is
   initialized for each distinct SCMI plaform (if any) defined in the DT,
   and then:
  -> scmi_notification_init()
  -> scmi_create_protocol_devices:
     DT-defined and fw-supported SCMI protocol devices are created (this does
     NOT cause automatically protocol initialization though...)
	--> scmi_set_handle()
c. An SCMI driver using protoX is finally loaded/initialized (say scmi-cpufreq)
   and the related protocol device is probed, and, if this was the first device
   to be probe relying on protoX, protocol X itself (say perf.c) is finally
   initialized:
  __probe():
  -> if !handle	return -EPROBEDEFER
  -> ...
  -> X.scmi_protocol_init()
    ...
    -> scmi_register_protocol_events(X..)

b. and c. could indeed happen out of order and lead to a deferred reprobe of c.
once b. has finally completed, BUT the reprobe is triggered anyway before any
protocol init and event registration can start, so when ni == NULL
in scmi_register_protocol(), the initialization has been effectively attempted
and has failed (-ENOMEM) and so makes no sense to retry further.

> 
> I'm not sure about scmi_notification_exit() (see below).
> 
> > +
> > +	/* Attach to the notification main devres group */
> > +	if (!devres_open_group(ni->handle->dev, ni->gid, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < num_events; i++)
> > +		payld_sz = max_t(size_t, payld_sz, evt[i].max_payld_sz);
> > +	pd = scmi_allocate_registered_protocol_desc(ni, proto_id, queue_sz,
> > +				    sizeof(struct scmi_event_header) + payld_sz,
> > +						    num_events, ops);
> > +	if (IS_ERR(pd))
> > +		goto err;
> > +
> > +	for (i = 0; i < num_events; i++, evt++) {
> > +		struct scmi_registered_event *r_evt;
> > +
> > +		r_evt = devm_kzalloc(ni->handle->dev, sizeof(*r_evt),
> > +				     GFP_KERNEL);
> > +		if (!r_evt)
> > +			goto err;
> > +		r_evt->proto = pd;
> > +		r_evt->evt = evt;
> > +
> > +		r_evt->sources = devm_kcalloc(ni->handle->dev, num_sources,
> > +					      sizeof(refcount_t), GFP_KERNEL);
> > +		if (!r_evt->sources)
> > +			goto err;
> > +		r_evt->num_sources = num_sources;
> > +		mutex_init(&r_evt->sources_mtx);
> > +
> > +		r_evt->report = devm_kzalloc(ni->handle->dev,
> > +					     evt->max_report_sz, GFP_KERNEL);
> > +		if (!r_evt->report)
> > +			goto err;
> > +
> > +		pd->registered_events[i] = r_evt;
> > +		/* Ensure events are updated */
> > +		smp_wmb();
> > +		pr_info("SCMI Notifications: registered event - %X\n",
> > +			MAKE_ALL_SRCS_KEY(r_evt->proto->id, r_evt->evt->id));
> > +	}
> > +
> > +	/* Register protocol and events...it will never be removed */
> > +	ni->registered_protocols[proto_id] = pd;
> > +	/* Ensure protocols are updated */
> > +	smp_wmb();
> > +
> > +	devres_close_group(ni->handle->dev, ni->gid);
> > +
> > +	return 0;
> > +
> > +err:
> > +	pr_warn("SCMI Notifications - Proto:%X - Registration Failed !\n",
> > +		proto_id);
> > +	/* A failing protocol registration does not trigger full failure */
> > +	devres_close_group(ni->handle->dev, ni->gid);
> > +
> > +	return -ENOMEM;
> > +}
> > +
> > +/**
> > + * scmi_notification_init()  - Initializes Notification Core Support
> > + * @handle: The handle identifying the platform instance to initialize
> > + *
> > + * This function lays out all the basic resources needed by the notification
> > + * core instance identified by the provided handle: once done, all of the
> > + * SCMI Protocols can register their events with the core during their own
> > + * initializations.
> > + *
> > + * Note that failing to initialize the core notifications support does not
> > + * cause the whole SCMI Protocols stack to fail its initialization.
> > + *
> > + * SCMI Notification Initialization happens in 2 steps:
> > + * * initialization: basic common allocations (this function) -> @initialized
> > + * * registration: protocols asynchronously come into life and registers their
> > + *		   own supported list of events with the core; this causes
> > + *		   further per-protocol allocations
> > + *
> > + * Any user's callback registration attempt, referring a still not registered
> > + * event, will be registered as pending and finalized later (if possible)
> > + * by scmi_protocols_late_init() work.
> > + * This allows for lazy initialization of SCMI Protocols due to late (or
> > + * missing) SCMI drivers' modules loading.
> > + *
> > + * Return: 0 on Success
> > + */
> > +int scmi_notification_init(struct scmi_handle *handle)
> > +{
> > +	void *gid;
> > +	struct scmi_notify_instance *ni;
> > +
> > +	gid = devres_open_group(handle->dev, NULL, GFP_KERNEL);
> > +	if (!gid)
> > +		return -ENOMEM;
> > +
> > +	ni = devm_kzalloc(handle->dev, sizeof(*ni), GFP_KERNEL);
> > +	if (!ni)
> > +		goto err;
> > +
> > +	ni->gid = gid;
> > +	ni->handle = handle;
> > +
> > +	ni->registered_protocols = devm_kcalloc(handle->dev, SCMI_MAX_PROTO,
> > +						sizeof(char *), GFP_KERNEL);
> > +	if (!ni->registered_protocols)
> > +		goto err;
> > +
> > +	handle->notify_priv = ni;
> > +
> > +	atomic_set(&ni->initialized, 1);
> > +	atomic_set(&ni->enabled, 1);
> > +	/* Ensure atomic values are updated */
> > +	smp_mb__after_atomic();
> > +
> > +	pr_info("SCMI Notifications Core Initialized.\n");
> > +
> > +	devres_close_group(handle->dev, ni->gid);
> > +
> > +	return 0;
> > +
> > +err:
> > +	pr_warn("SCMI Notifications - Initialization Failed.\n");
> > +	devres_release_group(handle->dev, NULL);
> > +	return -ENOMEM;
> > +}
> > +
> > +/**
> > + * scmi_notification_exit()  - Shutdown and clean Notification core
> > + * @handle: The handle identifying the platform instance to shutdown
> > + */
> > +void scmi_notification_exit(struct scmi_handle *handle)
> > +{
> > +	struct scmi_notify_instance *ni = handle->notify_priv;
> > +
> > +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> > +		return;
> > +
> > +	atomic_set(&ni->enabled, 0);
> > +	/* Ensure atomic values are updated */
> > +	smp_mb__after_atomic();
> 
> If users can race with this, we're dead: the atomic by itself doesn't
> ensure that handle is not in use once we arrive here.  Should this
> be a refcount instead?
> 
> If users can't race with this, we probably don't protection here.
> 
> 
> I may be misunderstanding what this code is doing...
> 

First of all the enabled flag does not probably belong to this commit properly;
here is initialized but it is really fully used only in subsequent patches
(...so makes apparently little sense here... my bad...)

Anyway, in general SCMI protocols (beside notifications stuff) are initialized
as depicted above, BUT they are never deinitialized explicitly (there's no
equivalent scmi_protocol_deinit()) and also proto devices are never destroyed:
so there's no scmi_protocol_deregister_events() neither in this series, because
it would have been tricky to properly invoke it and would have not been consistent
with the original SCMI design.

On the other side since in protocol driver _remove() some general protos resources
are in fact freed anyway, for consistency I decided to free the devm notification
resources allocated with the above init() in scmi_notification_exit(): this should
happen only at system shutdown in fact when notification are no more of a concern.

So given there's no explicit deregister I had to ensure somehow that the wanna-be-freed
notif devm resources were safe to be released.

In this context the 'enabled' atomic flag is set to 0 @_exit to stop the dispatch of the
events (possibly still coming from the fw) from the ISR into the kfifo queues: once such
pkts flow is stopped I destroy_sync() (in a subsequent patch @_exit too) all the workqueues
fetching the events from the kfifos: this way I can be sure that all the notif resources
are no more used at all when I free all of them with devm_release() at the end.

All of this is an additional precaution against buggy fw not stopping sending events
even when asked to do so (by drivers when deregistering notif callbacks in their shutdown)

Give the above scenario on shutdown (which I never observed to tell the truth), and the fact
I'm freeing all devm res (including ni) at shutdown, it's now apparent ALSO that I cannot use
'enabled' to keep stopped the flow in a safe way after its enclosing struct ni has been freed !

So I'll remove the 'enable' atomic_t too and rely equally on the bare !ni check to determine
if the notification are enabled and should be dispatched. So that in 

scmi_notify_init() {

	if (unlikely(!ni))
		return 0;     /// Stop notification dispatching from ISR
}

So ni and !ni will signify (initialized && enabled) or not.

Not really a short explanation...sorry

Thanks

Cristian
Cristian Marussi May 12, 2020, 5 p.m. UTC | #3
On Mon, May 11, 2020 at 11:04:03PM +0100, Cristian Marussi wrote:
> Hi Dave
> 
> thanks for the review first of all.
> 
[snip]

> > I'm not sure about scmi_notification_exit() (see below).
[snip]
> > > +/**
> > > + * scmi_notification_exit()  - Shutdown and clean Notification core
> > > + * @handle: The handle identifying the platform instance to shutdown
> > > + */
> > > +void scmi_notification_exit(struct scmi_handle *handle)
> > > +{
> > > +	struct scmi_notify_instance *ni = handle->notify_priv;
> > > +
> > > +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> > > +		return;
> > > +
> > > +	atomic_set(&ni->enabled, 0);
> > > +	/* Ensure atomic values are updated */
> > > +	smp_mb__after_atomic();
> > 
> > If users can race with this, we're dead: the atomic by itself doesn't
> > ensure that handle is not in use once we arrive here.  Should this
> > be a refcount instead?
> > 
> > If users can't race with this, we probably don't protection here.
> > 
> > 
> > I may be misunderstanding what this code is doing...
> > 
> 
> First of all the enabled flag does not probably belong to this commit properly;
> here is initialized but it is really fully used only in subsequent patches
> (...so makes apparently little sense here... my bad...)
> 
> Anyway, in general SCMI protocols (beside notifications stuff) are initialized
> as depicted above, BUT they are never deinitialized explicitly (there's no
> equivalent scmi_protocol_deinit()) and also proto devices are never destroyed:
> so there's no scmi_protocol_deregister_events() neither in this series, because
> it would have been tricky to properly invoke it and would have not been consistent
> with the original SCMI design.
> 
> On the other side since in protocol driver _remove() some general protos resources
> are in fact freed anyway, for consistency I decided to free the devm notification
> resources allocated with the above init() in scmi_notification_exit(): this should
> happen only at system shutdown in fact when notification are no more of a concern.
> 
> So given there's no explicit deregister I had to ensure somehow that the wanna-be-freed
> notif devm resources were safe to be released.
> 
> In this context the 'enabled' atomic flag is set to 0 @_exit to stop the dispatch of the
> events (possibly still coming from the fw) from the ISR into the kfifo queues: once such
> pkts flow is stopped I destroy_sync() (in a subsequent patch @_exit too) all the workqueues
> fetching the events from the kfifos: this way I can be sure that all the notif resources
> are no more used at all when I free all of them with devm_release() at the end.
> 
> All of this is an additional precaution against buggy fw not stopping sending events
> even when asked to do so (by drivers when deregistering notif callbacks in their shutdown)
> 
> Give the above scenario on shutdown (which I never observed to tell the truth), and the fact
> I'm freeing all devm res (including ni) at shutdown, it's now apparent ALSO that I cannot use
> 'enabled' to keep stopped the flow in a safe way after its enclosing struct ni has been freed !
> 
> So I'll remove the 'enable' atomic_t too and rely equally on the bare !ni check to determine
> if the notification are enabled and should be dispatched. So that in 
> 

...replying to my early self here (o_O)....I'd add that I've tested the above changes (removing
initialized and enabled) triggering this _exit path by brutally unbinding the platform protocol
driver and I can see the notifications flow stop and the queues emptied as expected without
tragedy...the SCMI stack in general is not so happy though at that point, since it is not even
supposed to be unloaded ever in fact...I wonder if this limit condition(unbind of a core SCMI
driver which is not even modularizable in Kconfig) makes sense to be tested at all...
(if not for testing this specific code path...)

Cheers

Cristian
Sudeep Holla May 12, 2020, 5:19 p.m. UTC | #4
On Tue, May 12, 2020 at 06:00:20PM +0100, Cristian Marussi wrote:
> On Mon, May 11, 2020 at 11:04:03PM +0100, Cristian Marussi wrote:
> > Hi Dave
> >
> > thanks for the review first of all.
> >
> [snip]
>
> > > I'm not sure about scmi_notification_exit() (see below).
> [snip]
> > > > +/**
> > > > + * scmi_notification_exit()  - Shutdown and clean Notification core
> > > > + * @handle: The handle identifying the platform instance to shutdown
> > > > + */
> > > > +void scmi_notification_exit(struct scmi_handle *handle)
> > > > +{
> > > > +	struct scmi_notify_instance *ni = handle->notify_priv;
> > > > +
> > > > +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> > > > +		return;
> > > > +
> > > > +	atomic_set(&ni->enabled, 0);
> > > > +	/* Ensure atomic values are updated */
> > > > +	smp_mb__after_atomic();
> > >
> > > If users can race with this, we're dead: the atomic by itself doesn't
> > > ensure that handle is not in use once we arrive here.  Should this
> > > be a refcount instead?
> > >
> > > If users can't race with this, we probably don't protection here.
> > >
> > >
> > > I may be misunderstanding what this code is doing...
> > >
> >
> > First of all the enabled flag does not probably belong to this commit properly;
> > here is initialized but it is really fully used only in subsequent patches
> > (...so makes apparently little sense here... my bad...)
> >
> > Anyway, in general SCMI protocols (beside notifications stuff) are initialized
> > as depicted above, BUT they are never deinitialized explicitly (there's no
> > equivalent scmi_protocol_deinit()) and also proto devices are never destroyed:
> > so there's no scmi_protocol_deregister_events() neither in this series, because
> > it would have been tricky to properly invoke it and would have not been consistent
> > with the original SCMI design.
> >
> > On the other side since in protocol driver _remove() some general protos resources
> > are in fact freed anyway, for consistency I decided to free the devm notification
> > resources allocated with the above init() in scmi_notification_exit(): this should
> > happen only at system shutdown in fact when notification are no more of a concern.
> >
> > So given there's no explicit deregister I had to ensure somehow that the wanna-be-freed
> > notif devm resources were safe to be released.
> >
> > In this context the 'enabled' atomic flag is set to 0 @_exit to stop the dispatch of the
> > events (possibly still coming from the fw) from the ISR into the kfifo queues: once such
> > pkts flow is stopped I destroy_sync() (in a subsequent patch @_exit too) all the workqueues
> > fetching the events from the kfifos: this way I can be sure that all the notif resources
> > are no more used at all when I free all of them with devm_release() at the end.
> >
> > All of this is an additional precaution against buggy fw not stopping sending events
> > even when asked to do so (by drivers when deregistering notif callbacks in their shutdown)
> >
> > Give the above scenario on shutdown (which I never observed to tell the truth), and the fact
> > I'm freeing all devm res (including ni) at shutdown, it's now apparent ALSO that I cannot use
> > 'enabled' to keep stopped the flow in a safe way after its enclosing struct ni has been freed !
> >
> > So I'll remove the 'enable' atomic_t too and rely equally on the bare !ni check to determine
> > if the notification are enabled and should be dispatched. So that in
> >
>
> ...replying to my early self here (o_O)....I'd add that I've tested the above changes (removing
> initialized and enabled) triggering this _exit path by brutally unbinding the platform protocol
> driver and I can see the notifications flow stop and the queues emptied as expected without
> tragedy...the SCMI stack in general is not so happy though at that point, since it is not even
> supposed to be unloaded ever in fact...I wonder if this limit condition(unbind of a core SCMI
> driver which is not even modularizable in Kconfig) makes sense to be tested at all...
> (if not for testing this specific code path...)
>

We may need this eventually, I just kept initial implementation simple.
The scmi_drivers should be module and loading/unloading should be stable
and must work today.

Looking at the driver again, I am wondering why haven't I added
scmi_device_destroy in scmi_remove. We should be able to add that.

Lastly we can see how to make protocol registration and unregistration
as a module.

--
Regards,
Sudeep
Dave Martin May 13, 2020, 4:46 p.m. UTC | #5
On Mon, May 11, 2020 at 11:04:03PM +0100, Cristian Marussi wrote:
> Hi Dave
> 
> thanks for the review first of all.
> 
> On Wed, May 06, 2020 at 04:25:50PM +0100, Dave Martin wrote:
> > On Mon, May 04, 2020 at 05:38:47PM +0100, Cristian Marussi wrote:
> > > Add core SCMI Notifications protocol-registration support: allow protocols
> > > to register their own set of supported events, during their initialization
> > > phase. Notification core can track multiple platform instances by their
> > > handles.
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > > ---
> > > V4 --> V5
> > > - fixed kernel-doc
> > > - added barriers for registered protocols and events
> > > - using kfifo_alloc and devm_add_action_or_reset
> > > V3 --> V4
> > > - removed scratch ISR buffer, move scratch BH buffer into protocol
> > >   descriptor
> > > - converted registered_protocols and registered_events from hashtables
> > >   into bare fixed-sized arrays
> > > - removed unregister protocols' routines (never called really)
> > > V2 --> V3
> > > - added scmi_notify_instance to track target platform instance
> > > V1 --> V2
> > > - splitted out of V1 patch 04
> > > - moved from IDR maps to real HashTables to store events
> > > - scmi_notifications_initialized is now an atomic_t
> > > - reviewed protocol registration/unregistration to use devres
> > > - fixed:
> > >   drivers/firmware/arm_scmi/notify.c:483:18-23: ERROR:
> > >   	reference preceded by free on line 482
> > > 
> > > Reported-by: kbuild test robot <lkp@intel.com>
> > > Reported-by: Julia Lawall <julia.lawall@lip6.fr>
> > > ---
> > >  drivers/firmware/arm_scmi/Makefile |   2 +-
> > >  drivers/firmware/arm_scmi/common.h |   4 +
> > >  drivers/firmware/arm_scmi/notify.c | 444 +++++++++++++++++++++++++++++
> > >  drivers/firmware/arm_scmi/notify.h |  56 ++++
> > >  include/linux/scmi_protocol.h      |   3 +
> > >  5 files changed, 508 insertions(+), 1 deletion(-)
> > >  create mode 100644 drivers/firmware/arm_scmi/notify.c
> > >  create mode 100644 drivers/firmware/arm_scmi/notify.h
> > 
> > [...]
> > 
> > > diff --git a/drivers/firmware/arm_scmi/notify.c b/drivers/firmware/arm_scmi/notify.c
> > 
> > [...]
> > 
> > > +int scmi_register_protocol_events(const struct scmi_handle *handle,
> > > +				  u8 proto_id, size_t queue_sz,
> > > +				  const struct scmi_protocol_event_ops *ops,
> > > +				  const struct scmi_event *evt, int num_events,
> > > +				  int num_sources)
> > > +{
> > > +	int i;
> > > +	size_t payld_sz = 0;
> > > +	struct scmi_registered_protocol_events_desc *pd;
> > > +	struct scmi_notify_instance *ni = handle->notify_priv;
> > > +
> > > +	if (!ops || !evt || proto_id >= SCMI_MAX_PROTO)
> > > +		return -EINVAL;
> > > +
> > > +	/* Ensure atomic value is updated */
> > > +	smp_mb__before_atomic();
> > > +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> > > +		return -EAGAIN;
> > 
> > The atomics/barriers don't look quite right to me here.
> > 
> > I'd have expected:
> > 
> > scmi_register_protocol_events()
> > {
> > 	if (atomic_read(&ni->initialized))
> > 		return -EAGAIN;
> > 	smp_mb_after_atomic();
> > 
> > 	/* ... */
> > }
> > 
> > to pair with:
> > 
> > scmi_notification_init()
> > {
> > 	/* ... */
> > 
> > 	smp_mb__before_atomic();
> > 	atomic_set(&ni->enabled, 1);
> > }
> > 
> > 
> > ...however, do we need to allow these two functions to race with each
> > other at all?  (I haven't tried to understand the wider context here,
> > so if there really is no way to avoid initialisation racing with use I
> > guess we may have to do something like this.  We don't want callers
> > to dumbly spin on this function though.)
> > 
> > 
> > In other patches in the series, calls to scmi_register_protocol_events()
> > seem to be assuming there is no race: the return value is not checked.
> > Possibly a bug?
> > 
> 
> I think you are right in these regards, there's no need of an atomic here
> for 'initialized' and using -EAGAIN on !initialized as error code in
> scmi_register_protocol_events() is wrong too in this context.
> 
> The aim is to detect when general SCMI notification core initialization has
> failed as a whole and in that case inhibit any further SCMI protocols events'
> registration attempt during general protocol init (since init has failed and
> no related data has been allocated and readied).
> No attempt should be made to re-register on failure because the failure to
> init the notif stuff is permanent when happens (not solvable with deferred
> re-probing) and there's no race in checking this condition
> (more on this down below)
> 
> So I think I'll just drop the 'initialized' field as a whole and simply use
> the value of handle->notify_priv (ni) to detect if initialization was
> successfull or not, like:
> 
> scmi_register_protocol_events()
> {
> 	...
> 	if (unlikely(!ni))  		// ni is NULL when init has failed
> 		return -ENOMEM;
> }
> 
> (..plus barriers where needed)
> 
> and could probably check the ret value too in the caller to printout some
> debug info in case of other-than-ENOMEM errors. (due anyway to a broken
> implementation not to runtime errors)
> 
> As said there's no race to initialized (or ni itself) in fact because the
> sequence in SCMI general init is roughly as follows:
> 
> a. SCMI bus is created and registered
> b. SCMI protocol driver (a platform driver) is probed and an SCMI handle is
>    initialized for each distinct SCMI plaform (if any) defined in the DT,
>    and then:
>   -> scmi_notification_init()
>   -> scmi_create_protocol_devices:
>      DT-defined and fw-supported SCMI protocol devices are created (this does
>      NOT cause automatically protocol initialization though...)
> 	--> scmi_set_handle()
> c. An SCMI driver using protoX is finally loaded/initialized (say scmi-cpufreq)
>    and the related protocol device is probed, and, if this was the first device
>    to be probe relying on protoX, protocol X itself (say perf.c) is finally
>    initialized:
>   __probe():
>   -> if !handle	return -EPROBEDEFER
>   -> ...
>   -> X.scmi_protocol_init()
>     ...
>     -> scmi_register_protocol_events(X..)
> 
> b. and c. could indeed happen out of order and lead to a deferred reprobe of c.
> once b. has finally completed, BUT the reprobe is triggered anyway before any
> protocol init and event registration can start, so when ni == NULL
> in scmi_register_protocol(), the initialization has been effectively attempted
> and has failed (-ENOMEM) and so makes no sense to retry further.
> 
> > 
> > I'm not sure about scmi_notification_exit() (see below).
> > 
> > > +
> > > +	/* Attach to the notification main devres group */
> > > +	if (!devres_open_group(ni->handle->dev, ni->gid, GFP_KERNEL))
> > > +		return -ENOMEM;
> > > +
> > > +	for (i = 0; i < num_events; i++)
> > > +		payld_sz = max_t(size_t, payld_sz, evt[i].max_payld_sz);
> > > +	pd = scmi_allocate_registered_protocol_desc(ni, proto_id, queue_sz,
> > > +				    sizeof(struct scmi_event_header) + payld_sz,
> > > +						    num_events, ops);
> > > +	if (IS_ERR(pd))
> > > +		goto err;
> > > +
> > > +	for (i = 0; i < num_events; i++, evt++) {
> > > +		struct scmi_registered_event *r_evt;
> > > +
> > > +		r_evt = devm_kzalloc(ni->handle->dev, sizeof(*r_evt),
> > > +				     GFP_KERNEL);
> > > +		if (!r_evt)
> > > +			goto err;
> > > +		r_evt->proto = pd;
> > > +		r_evt->evt = evt;
> > > +
> > > +		r_evt->sources = devm_kcalloc(ni->handle->dev, num_sources,
> > > +					      sizeof(refcount_t), GFP_KERNEL);
> > > +		if (!r_evt->sources)
> > > +			goto err;
> > > +		r_evt->num_sources = num_sources;
> > > +		mutex_init(&r_evt->sources_mtx);
> > > +
> > > +		r_evt->report = devm_kzalloc(ni->handle->dev,
> > > +					     evt->max_report_sz, GFP_KERNEL);
> > > +		if (!r_evt->report)
> > > +			goto err;
> > > +
> > > +		pd->registered_events[i] = r_evt;
> > > +		/* Ensure events are updated */
> > > +		smp_wmb();
> > > +		pr_info("SCMI Notifications: registered event - %X\n",
> > > +			MAKE_ALL_SRCS_KEY(r_evt->proto->id, r_evt->evt->id));
> > > +	}
> > > +
> > > +	/* Register protocol and events...it will never be removed */
> > > +	ni->registered_protocols[proto_id] = pd;
> > > +	/* Ensure protocols are updated */
> > > +	smp_wmb();
> > > +
> > > +	devres_close_group(ni->handle->dev, ni->gid);
> > > +
> > > +	return 0;
> > > +
> > > +err:
> > > +	pr_warn("SCMI Notifications - Proto:%X - Registration Failed !\n",
> > > +		proto_id);
> > > +	/* A failing protocol registration does not trigger full failure */
> > > +	devres_close_group(ni->handle->dev, ni->gid);
> > > +
> > > +	return -ENOMEM;
> > > +}
> > > +
> > > +/**
> > > + * scmi_notification_init()  - Initializes Notification Core Support
> > > + * @handle: The handle identifying the platform instance to initialize
> > > + *
> > > + * This function lays out all the basic resources needed by the notification
> > > + * core instance identified by the provided handle: once done, all of the
> > > + * SCMI Protocols can register their events with the core during their own
> > > + * initializations.
> > > + *
> > > + * Note that failing to initialize the core notifications support does not
> > > + * cause the whole SCMI Protocols stack to fail its initialization.
> > > + *
> > > + * SCMI Notification Initialization happens in 2 steps:
> > > + * * initialization: basic common allocations (this function) -> @initialized
> > > + * * registration: protocols asynchronously come into life and registers their
> > > + *		   own supported list of events with the core; this causes
> > > + *		   further per-protocol allocations
> > > + *
> > > + * Any user's callback registration attempt, referring a still not registered
> > > + * event, will be registered as pending and finalized later (if possible)
> > > + * by scmi_protocols_late_init() work.
> > > + * This allows for lazy initialization of SCMI Protocols due to late (or
> > > + * missing) SCMI drivers' modules loading.
> > > + *
> > > + * Return: 0 on Success
> > > + */
> > > +int scmi_notification_init(struct scmi_handle *handle)
> > > +{
> > > +	void *gid;
> > > +	struct scmi_notify_instance *ni;
> > > +
> > > +	gid = devres_open_group(handle->dev, NULL, GFP_KERNEL);
> > > +	if (!gid)
> > > +		return -ENOMEM;
> > > +
> > > +	ni = devm_kzalloc(handle->dev, sizeof(*ni), GFP_KERNEL);
> > > +	if (!ni)
> > > +		goto err;
> > > +
> > > +	ni->gid = gid;
> > > +	ni->handle = handle;
> > > +
> > > +	ni->registered_protocols = devm_kcalloc(handle->dev, SCMI_MAX_PROTO,
> > > +						sizeof(char *), GFP_KERNEL);
> > > +	if (!ni->registered_protocols)
> > > +		goto err;
> > > +
> > > +	handle->notify_priv = ni;
> > > +
> > > +	atomic_set(&ni->initialized, 1);
> > > +	atomic_set(&ni->enabled, 1);
> > > +	/* Ensure atomic values are updated */
> > > +	smp_mb__after_atomic();
> > > +
> > > +	pr_info("SCMI Notifications Core Initialized.\n");
> > > +
> > > +	devres_close_group(handle->dev, ni->gid);
> > > +
> > > +	return 0;
> > > +
> > > +err:
> > > +	pr_warn("SCMI Notifications - Initialization Failed.\n");
> > > +	devres_release_group(handle->dev, NULL);
> > > +	return -ENOMEM;
> > > +}
> > > +
> > > +/**
> > > + * scmi_notification_exit()  - Shutdown and clean Notification core
> > > + * @handle: The handle identifying the platform instance to shutdown
> > > + */
> > > +void scmi_notification_exit(struct scmi_handle *handle)
> > > +{
> > > +	struct scmi_notify_instance *ni = handle->notify_priv;
> > > +
> > > +	if (unlikely(!ni || !atomic_read(&ni->initialized)))
> > > +		return;
> > > +
> > > +	atomic_set(&ni->enabled, 0);
> > > +	/* Ensure atomic values are updated */
> > > +	smp_mb__after_atomic();
> > 
> > If users can race with this, we're dead: the atomic by itself doesn't
> > ensure that handle is not in use once we arrive here.  Should this
> > be a refcount instead?
> > 
> > If users can't race with this, we probably don't protection here.
> > 
> > 
> > I may be misunderstanding what this code is doing...
> > 
> 
> First of all the enabled flag does not probably belong to this commit properly;
> here is initialized but it is really fully used only in subsequent patches
> (...so makes apparently little sense here... my bad...)
> 
> Anyway, in general SCMI protocols (beside notifications stuff) are initialized
> as depicted above, BUT they are never deinitialized explicitly (there's no
> equivalent scmi_protocol_deinit()) and also proto devices are never destroyed:
> so there's no scmi_protocol_deregister_events() neither in this series, because
> it would have been tricky to properly invoke it and would have not been consistent
> with the original SCMI design.
> 
> On the other side since in protocol driver _remove() some general protos resources
> are in fact freed anyway, for consistency I decided to free the devm notification
> resources allocated with the above init() in scmi_notification_exit(): this should
> happen only at system shutdown in fact when notification are no more of a concern.
> 
> So given there's no explicit deregister I had to ensure somehow that the wanna-be-freed
> notif devm resources were safe to be released.
> 
> In this context the 'enabled' atomic flag is set to 0 @_exit to stop the dispatch of the
> events (possibly still coming from the fw) from the ISR into the kfifo queues: once such
> pkts flow is stopped I destroy_sync() (in a subsequent patch @_exit too) all the workqueues
> fetching the events from the kfifos: this way I can be sure that all the notif resources
> are no more used at all when I free all of them with devm_release() at the end.
> 
> All of this is an additional precaution against buggy fw not stopping sending events
> even when asked to do so (by drivers when deregistering notif callbacks in their shutdown)
> 
> Give the above scenario on shutdown (which I never observed to tell the truth), and the fact
> I'm freeing all devm res (including ni) at shutdown, it's now apparent ALSO that I cannot use
> 'enabled' to keep stopped the flow in a safe way after its enclosing struct ni has been freed !
> 
> So I'll remove the 'enable' atomic_t too and rely equally on the bare !ni check to determine
> if the notification are enabled and should be dispatched. So that in 
> 
> scmi_notify_init() {
> 
> 	if (unlikely(!ni))
> 		return 0;     /// Stop notification dispatching from ISR
> }
> 
> So ni and !ni will signify (initialized && enabled) or not.

Can this still race?

If teardown is impossible or useless, perhaps it would make sense not to
have the code for it (?)

Do we expect SCMI related drivers to be buildable as modules?

> Not really a short explanation...sorry

Sounds like it's not simple :)

What you've said sounds plausible -- I haven't found much time to think
about it this week I'm afraid!

Cheers
---Dave
Cristian Marussi May 13, 2020, 6:56 p.m. UTC | #6
On Wed, May 13, 2020 at 05:46:13PM +0100, Dave Martin wrote:
> On Mon, May 11, 2020 at 11:04:03PM +0100, Cristian Marussi wrote:
> > Hi Dave
> > 
> > thanks for the review first of all.
> > 
[snip]

> > 
> > First of all the enabled flag does not probably belong to this commit properly;
> > here is initialized but it is really fully used only in subsequent patches
> > (...so makes apparently little sense here... my bad...)
> > 
> > Anyway, in general SCMI protocols (beside notifications stuff) are initialized
> > as depicted above, BUT they are never deinitialized explicitly (there's no
> > equivalent scmi_protocol_deinit()) and also proto devices are never destroyed:
> > so there's no scmi_protocol_deregister_events() neither in this series, because
> > it would have been tricky to properly invoke it and would have not been consistent
> > with the original SCMI design.
> > 
> > On the other side since in protocol driver _remove() some general protos resources
> > are in fact freed anyway, for consistency I decided to free the devm notification
> > resources allocated with the above init() in scmi_notification_exit(): this should
> > happen only at system shutdown in fact when notification are no more of a concern.
> > 
> > So given there's no explicit deregister I had to ensure somehow that the wanna-be-freed
> > notif devm resources were safe to be released.
> > 
> > In this context the 'enabled' atomic flag is set to 0 @_exit to stop the dispatch of the
> > events (possibly still coming from the fw) from the ISR into the kfifo queues: once such
> > pkts flow is stopped I destroy_sync() (in a subsequent patch @_exit too) all the workqueues
> > fetching the events from the kfifos: this way I can be sure that all the notif resources
> > are no more used at all when I free all of them with devm_release() at the end.
> > 
> > All of this is an additional precaution against buggy fw not stopping sending events
> > even when asked to do so (by drivers when deregistering notif callbacks in their shutdown)
> > 
> > Give the above scenario on shutdown (which I never observed to tell the truth), and the fact
> > I'm freeing all devm res (including ni) at shutdown, it's now apparent ALSO that I cannot use
> > 'enabled' to keep stopped the flow in a safe way after its enclosing struct ni has been freed !
> > 
> > So I'll remove the 'enable' atomic_t too and rely equally on the bare !ni check to determine
> > if the notification are enabled and should be dispatched. So that in 
> > 
> > scmi_notify_init() {
> > 
> > 	if (unlikely(!ni))
> > 		return 0;     /// Stop notification dispatching from ISR
> > }
> > 
> > So ni and !ni will signify (initialized && enabled) or not.
> 
> Can this still race?

It should not race now with a few barriers...V8 is not posted still (and it's a
bit different from the above broken snippet of mine :D)

> 
> If teardown is impossible or useless, perhaps it would make sense not to
> have the code for it (?)
> 

Well yes in fact I was doubtful from the start if this rarely used code was
needed at all...I added just for consistency since some cleanup happens
elsewhere in SCMI...but...

> Do we expect SCMI related drivers to be buildable as modules?
> 

... talking with Sudeep these days around the content of this thread, it seems that
we should be able anyway to modularize cleanly most of the SCMI core due to the
Android GKI thing....so I'm going to review the whole SCMI core and notification
shutdown/cleanup process soon....still I'm not convinced if I'll do anything different
here for notif at that point...as of now I fixed as above the current series foro v8
following your feedback then we'll see if I'll change it more drastically....

> > Not really a short explanation...sorry
> 
> Sounds like it's not simple :)
> 
> What you've said sounds plausible -- I haven't found much time to think
> about it this week I'm afraid!
>

Thanks you for the feedback !

Cheers,

Cristian
 
> Cheers
> ---Dave
diff mbox series

Patch

diff --git a/drivers/firmware/arm_scmi/Makefile b/drivers/firmware/arm_scmi/Makefile
index 11b238f81923..d55612362d65 100644
--- a/drivers/firmware/arm_scmi/Makefile
+++ b/drivers/firmware/arm_scmi/Makefile
@@ -1,7 +1,7 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y	= scmi-bus.o scmi-driver.o scmi-protocols.o scmi-transport.o
 scmi-bus-y = bus.o
-scmi-driver-y = driver.o
+scmi-driver-y = driver.o notify.o
 scmi-transport-y = shmem.o
 scmi-transport-$(CONFIG_MAILBOX) += mailbox.o
 scmi-transport-$(CONFIG_HAVE_ARM_SMCCC) += smc.o
diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h
index 31fe5a22a011..c113e578cc6c 100644
--- a/drivers/firmware/arm_scmi/common.h
+++ b/drivers/firmware/arm_scmi/common.h
@@ -6,6 +6,8 @@ 
  *
  * Copyright (C) 2018 ARM Ltd.
  */
+#ifndef _SCMI_COMMON_H
+#define _SCMI_COMMON_H
 
 #include <linux/bitfield.h>
 #include <linux/completion.h>
@@ -235,3 +237,5 @@  void shmem_fetch_notification(struct scmi_shared_mem __iomem *shmem,
 void shmem_clear_channel(struct scmi_shared_mem __iomem *shmem);
 bool shmem_poll_done(struct scmi_shared_mem __iomem *shmem,
 		     struct scmi_xfer *xfer);
+
+#endif /* _SCMI_COMMON_H */
diff --git a/drivers/firmware/arm_scmi/notify.c b/drivers/firmware/arm_scmi/notify.c
new file mode 100644
index 000000000000..15097bf2b52a
--- /dev/null
+++ b/drivers/firmware/arm_scmi/notify.c
@@ -0,0 +1,444 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * System Control and Management Interface (SCMI) Notification support
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ */
+/**
+ * DOC: Theory of operation
+ *
+ * SCMI Protocol specification allows the platform to signal events to
+ * interested agents via notification messages: this is an implementation
+ * of the dispatch and delivery of such notifications to the interested users
+ * inside the Linux kernel.
+ *
+ * An SCMI Notification core instance is initialized for each active platform
+ * instance identified by the means of the usual &struct scmi_handle.
+ *
+ * Each SCMI Protocol implementation, during its initialization, registers with
+ * this core its set of supported events using scmi_register_protocol_events():
+ * all the needed descriptors are stored in the &struct registered_protocols and
+ * &struct registered_events arrays.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/kfifo.h>
+#include <linux/mutex.h>
+#include <linux/refcount.h>
+#include <linux/scmi_protocol.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "notify.h"
+
+#define	SCMI_MAX_PROTO			256
+#define	SCMI_ALL_SRC_IDS		0xffffUL
+/*
+ * Builds an unsigned 32bit key from the given input tuple to be used
+ * as a key in hashtables.
+ */
+#define MAKE_HASH_KEY(p, e, s)			\
+	((u32)(((p) << 24) | ((e) << 16) | ((s) & SCMI_ALL_SRC_IDS)))
+
+#define MAKE_ALL_SRCS_KEY(p, e)			\
+	MAKE_HASH_KEY((p), (e), SCMI_ALL_SRC_IDS)
+
+struct scmi_registered_protocol_events_desc;
+
+/**
+ * struct scmi_notify_instance  - Represents an instance of the notification
+ * core
+ * @gid: GroupID used for devres
+ * @handle: A reference to the platform instance
+ * @initialized: A flag that indicates if the core resources have been allocated
+ *		 and protocols are allowed to register their supported events
+ * @enabled: A flag to indicate events can be enabled and start flowing
+ * @registered_protocols: A statically allocated array containing pointers to
+ *			  all the registered protocol-level specific information
+ *			  related to events' handling
+ *
+ * Each platform instance, represented by a handle, has its own instance of
+ * the notification subsystem represented by this structure.
+ */
+struct scmi_notify_instance {
+	void						*gid;
+	struct scmi_handle				*handle;
+	atomic_t					initialized;
+	atomic_t					enabled;
+	struct scmi_registered_protocol_events_desc	**registered_protocols;
+};
+
+/**
+ * struct events_queue  - Describes a queue and its associated worker
+ * @sz: Size in bytes of the related kfifo
+ * @kfifo: A dedicated Kernel kfifo descriptor
+ *
+ * Each protocol has its own dedicated events_queue descriptor.
+ */
+struct events_queue {
+	size_t				sz;
+	struct kfifo			kfifo;
+};
+
+/**
+ * struct scmi_event_header  - A utility header
+ * @timestamp: The timestamp, in nanoseconds (boottime), which was associated
+ *	       to this event as soon as it entered the SCMI RX ISR
+ * @evt_id: Event ID (corresponds to the Event MsgID for this Protocol)
+ * @payld_sz: Effective size of the embedded message payload which follows
+ * @payld: A reference to the embedded event payload
+ *
+ * This header is prepended to each received event message payload before
+ * queueing it on the related &struct events_queue.
+ */
+struct scmi_event_header {
+	u64	timestamp;
+	u8	evt_id;
+	size_t	payld_sz;
+	u8	payld[];
+} __packed;
+
+struct scmi_registered_event;
+
+/**
+ * struct scmi_registered_protocol_events_desc  - Protocol Specific information
+ * @id: Protocol ID
+ * @ops: Protocol specific and event-related operations
+ * @equeue: The embedded per-protocol events_queue
+ * @ni: A reference to the initialized instance descriptor
+ * @eh: A reference to pre-allocated buffer to be used as a scratch area by the
+ *	deferred worker when fetching data from the kfifo
+ * @eh_sz: Size of the pre-allocated buffer @eh
+ * @in_flight: A reference to an in flight &struct scmi_registered_event
+ * @num_events: Number of events in @registered_events
+ * @registered_events: A dynamically allocated array holding all the registered
+ *		       events' descriptors, whose fixed-size is determined at
+ *		       compile time.
+ *
+ * All protocols that register at least one event have their protocol-specific
+ * information stored here, together with the embedded allocated events_queue.
+ * These descriptors are stored in the @registered_protocols array at protocol
+ * registration time.
+ *
+ * Once these descriptors are successfully registered, they are NEVER again
+ * removed or modified since protocols do not unregister ever, so that, once
+ * we safely grab a NON-NULL reference from the array we can keep it and use it.
+ */
+struct scmi_registered_protocol_events_desc {
+	u8					id;
+	const struct scmi_protocol_event_ops	*ops;
+	struct events_queue			equeue;
+	struct scmi_notify_instance		*ni;
+	struct scmi_event_header		*eh;
+	size_t					eh_sz;
+	void					*in_flight;
+	int					num_events;
+	struct scmi_registered_event		**registered_events;
+};
+
+/**
+ * struct scmi_registered_event  - Event Specific Information
+ * @proto: A reference to the associated protocol descriptor
+ * @evt: A reference to the associated event descriptor (as provided at
+ *       registration time)
+ * @report: A pre-allocated buffer used by the deferred worker to fill a
+ *	    customized event report
+ * @num_sources: The number of possible sources for this event as stated at
+ *		 events' registration time
+ * @sources: A reference to a dynamically allocated array used to refcount the
+ *	     events' enable requests for all the existing sources
+ * @sources_mtx: A mutex to serialize the access to @sources
+ *
+ * All registered events are represented by one of these structures that are
+ * stored in the @registered_events array at protocol registration time.
+ *
+ * Once these descriptors are successfully registered, they are NEVER again
+ * removed or modified since protocols do not unregister ever, so that once we
+ * safely grab a NON-NULL reference from the table we can keep it and use it.
+ */
+struct scmi_registered_event {
+	struct scmi_registered_protocol_events_desc	*proto;
+	const struct scmi_event				*evt;
+	void						*report;
+	u32						num_sources;
+	refcount_t					*sources;
+	struct mutex					sources_mtx;
+};
+
+/**
+ * scmi_kfifo_free()  - Devres action helper to free the kfifo
+ * @kfifo: The kfifo to free
+ */
+static void scmi_kfifo_free(void *kfifo)
+{
+	kfifo_free((struct kfifo *)kfifo);
+}
+
+/**
+ * scmi_initialize_events_queue()  - Allocate/Initialize a kfifo buffer
+ * @ni: A reference to the notification instance to use
+ * @equeue: The events_queue to initialize
+ * @sz: Size of the kfifo buffer to allocate
+ *
+ * Allocate a buffer for the kfifo and initialize it.
+ *
+ * Return: 0 on Success
+ */
+static int scmi_initialize_events_queue(struct scmi_notify_instance *ni,
+					struct events_queue *equeue, size_t sz)
+{
+	if (kfifo_alloc(&equeue->kfifo, sz, GFP_KERNEL))
+		return -ENOMEM;
+	/* Size could have been roundup to power-of-two */
+	equeue->sz = kfifo_size(&equeue->kfifo);
+
+	return devm_add_action_or_reset(ni->handle->dev, scmi_kfifo_free,
+					&equeue->kfifo);
+}
+
+/**
+ * scmi_allocate_registered_protocol_desc()  - Allocate a registered protocol
+ * events' descriptor
+ * @ni: A reference to the &struct scmi_notify_instance notification instance
+ *	to use
+ * @proto_id: Protocol ID
+ * @queue_sz: Size of the associated queue to allocate
+ * @eh_sz: Size of the event header scratch area to pre-allocate
+ * @num_events: Number of events to support (size of @registered_events)
+ * @ops: Pointer to a struct holding references to protocol specific helpers
+ *	 needed during events handling
+ *
+ * It is supposed to be called only once for each protocol at protocol
+ * initialization time, so it warns if the requested protocol is found already
+ * registered.
+ *
+ * Return: The allocated and registered descriptor on Success
+ */
+static struct scmi_registered_protocol_events_desc *
+scmi_allocate_registered_protocol_desc(struct scmi_notify_instance *ni,
+				       u8 proto_id, size_t queue_sz,
+				       size_t eh_sz, int num_events,
+				const struct scmi_protocol_event_ops *ops)
+{
+	int ret;
+	struct scmi_registered_protocol_events_desc *pd;
+
+	/* Ensure protocols are up to date */
+	smp_rmb();
+	if (ni->registered_protocols[proto_id]) {
+		WARN_ON(1);
+		return ERR_PTR(-EINVAL);
+	}
+
+	pd = devm_kzalloc(ni->handle->dev, sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+	pd->id = proto_id;
+	pd->ops = ops;
+	pd->ni = ni;
+
+	ret = scmi_initialize_events_queue(ni, &pd->equeue, queue_sz);
+	if (ret)
+		return ERR_PTR(ret);
+
+	pd->eh = devm_kzalloc(ni->handle->dev, eh_sz, GFP_KERNEL);
+	if (!pd->eh)
+		return ERR_PTR(-ENOMEM);
+	pd->eh_sz = eh_sz;
+
+	pd->registered_events = devm_kcalloc(ni->handle->dev, num_events,
+					     sizeof(char *), GFP_KERNEL);
+	if (!pd->registered_events)
+		return ERR_PTR(-ENOMEM);
+	pd->num_events = num_events;
+
+	return pd;
+}
+
+/**
+ * scmi_register_protocol_events()  - Register Protocol Events with the core
+ * @handle: The handle identifying the platform instance against which the
+ *	    the protocol's events are registered
+ * @proto_id: Protocol ID
+ * @queue_sz: Size in bytes of the associated queue to be allocated
+ * @ops: Protocol specific event-related operations
+ * @evt: Event descriptor array
+ * @num_events: Number of events in @evt array
+ * @num_sources: Number of possible sources for this protocol on this
+ *		 platform.
+ *
+ * Used by SCMI Protocols initialization code to register with the notification
+ * core the list of supported events and their descriptors: takes care to
+ * pre-allocate and store all needed descriptors, scratch buffers and event
+ * queues.
+ *
+ * Return: 0 on Success
+ */
+int scmi_register_protocol_events(const struct scmi_handle *handle,
+				  u8 proto_id, size_t queue_sz,
+				  const struct scmi_protocol_event_ops *ops,
+				  const struct scmi_event *evt, int num_events,
+				  int num_sources)
+{
+	int i;
+	size_t payld_sz = 0;
+	struct scmi_registered_protocol_events_desc *pd;
+	struct scmi_notify_instance *ni = handle->notify_priv;
+
+	if (!ops || !evt || proto_id >= SCMI_MAX_PROTO)
+		return -EINVAL;
+
+	/* Ensure atomic value is updated */
+	smp_mb__before_atomic();
+	if (unlikely(!ni || !atomic_read(&ni->initialized)))
+		return -EAGAIN;
+
+	/* Attach to the notification main devres group */
+	if (!devres_open_group(ni->handle->dev, ni->gid, GFP_KERNEL))
+		return -ENOMEM;
+
+	for (i = 0; i < num_events; i++)
+		payld_sz = max_t(size_t, payld_sz, evt[i].max_payld_sz);
+	pd = scmi_allocate_registered_protocol_desc(ni, proto_id, queue_sz,
+				    sizeof(struct scmi_event_header) + payld_sz,
+						    num_events, ops);
+	if (IS_ERR(pd))
+		goto err;
+
+	for (i = 0; i < num_events; i++, evt++) {
+		struct scmi_registered_event *r_evt;
+
+		r_evt = devm_kzalloc(ni->handle->dev, sizeof(*r_evt),
+				     GFP_KERNEL);
+		if (!r_evt)
+			goto err;
+		r_evt->proto = pd;
+		r_evt->evt = evt;
+
+		r_evt->sources = devm_kcalloc(ni->handle->dev, num_sources,
+					      sizeof(refcount_t), GFP_KERNEL);
+		if (!r_evt->sources)
+			goto err;
+		r_evt->num_sources = num_sources;
+		mutex_init(&r_evt->sources_mtx);
+
+		r_evt->report = devm_kzalloc(ni->handle->dev,
+					     evt->max_report_sz, GFP_KERNEL);
+		if (!r_evt->report)
+			goto err;
+
+		pd->registered_events[i] = r_evt;
+		/* Ensure events are updated */
+		smp_wmb();
+		pr_info("SCMI Notifications: registered event - %X\n",
+			MAKE_ALL_SRCS_KEY(r_evt->proto->id, r_evt->evt->id));
+	}
+
+	/* Register protocol and events...it will never be removed */
+	ni->registered_protocols[proto_id] = pd;
+	/* Ensure protocols are updated */
+	smp_wmb();
+
+	devres_close_group(ni->handle->dev, ni->gid);
+
+	return 0;
+
+err:
+	pr_warn("SCMI Notifications - Proto:%X - Registration Failed !\n",
+		proto_id);
+	/* A failing protocol registration does not trigger full failure */
+	devres_close_group(ni->handle->dev, ni->gid);
+
+	return -ENOMEM;
+}
+
+/**
+ * scmi_notification_init()  - Initializes Notification Core Support
+ * @handle: The handle identifying the platform instance to initialize
+ *
+ * This function lays out all the basic resources needed by the notification
+ * core instance identified by the provided handle: once done, all of the
+ * SCMI Protocols can register their events with the core during their own
+ * initializations.
+ *
+ * Note that failing to initialize the core notifications support does not
+ * cause the whole SCMI Protocols stack to fail its initialization.
+ *
+ * SCMI Notification Initialization happens in 2 steps:
+ * * initialization: basic common allocations (this function) -> @initialized
+ * * registration: protocols asynchronously come into life and registers their
+ *		   own supported list of events with the core; this causes
+ *		   further per-protocol allocations
+ *
+ * Any user's callback registration attempt, referring a still not registered
+ * event, will be registered as pending and finalized later (if possible)
+ * by scmi_protocols_late_init() work.
+ * This allows for lazy initialization of SCMI Protocols due to late (or
+ * missing) SCMI drivers' modules loading.
+ *
+ * Return: 0 on Success
+ */
+int scmi_notification_init(struct scmi_handle *handle)
+{
+	void *gid;
+	struct scmi_notify_instance *ni;
+
+	gid = devres_open_group(handle->dev, NULL, GFP_KERNEL);
+	if (!gid)
+		return -ENOMEM;
+
+	ni = devm_kzalloc(handle->dev, sizeof(*ni), GFP_KERNEL);
+	if (!ni)
+		goto err;
+
+	ni->gid = gid;
+	ni->handle = handle;
+
+	ni->registered_protocols = devm_kcalloc(handle->dev, SCMI_MAX_PROTO,
+						sizeof(char *), GFP_KERNEL);
+	if (!ni->registered_protocols)
+		goto err;
+
+	handle->notify_priv = ni;
+
+	atomic_set(&ni->initialized, 1);
+	atomic_set(&ni->enabled, 1);
+	/* Ensure atomic values are updated */
+	smp_mb__after_atomic();
+
+	pr_info("SCMI Notifications Core Initialized.\n");
+
+	devres_close_group(handle->dev, ni->gid);
+
+	return 0;
+
+err:
+	pr_warn("SCMI Notifications - Initialization Failed.\n");
+	devres_release_group(handle->dev, NULL);
+	return -ENOMEM;
+}
+
+/**
+ * scmi_notification_exit()  - Shutdown and clean Notification core
+ * @handle: The handle identifying the platform instance to shutdown
+ */
+void scmi_notification_exit(struct scmi_handle *handle)
+{
+	struct scmi_notify_instance *ni = handle->notify_priv;
+
+	if (unlikely(!ni || !atomic_read(&ni->initialized)))
+		return;
+
+	atomic_set(&ni->enabled, 0);
+	/* Ensure atomic values are updated */
+	smp_mb__after_atomic();
+
+	devres_release_group(ni->handle->dev, ni->gid);
+}
diff --git a/drivers/firmware/arm_scmi/notify.h b/drivers/firmware/arm_scmi/notify.h
new file mode 100644
index 000000000000..54094aaf812a
--- /dev/null
+++ b/drivers/firmware/arm_scmi/notify.h
@@ -0,0 +1,56 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * System Control and Management Interface (SCMI) Message Protocol
+ * notification header file containing some definitions, structures
+ * and function prototypes related to SCMI Notification handling.
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#ifndef _SCMI_NOTIFY_H
+#define _SCMI_NOTIFY_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+
+/**
+ * struct scmi_event  - Describes an event to be supported
+ * @id: Event ID
+ * @max_payld_sz: Max possible size for the payload of a notif msg of this kind
+ * @max_report_sz: Max possible size for the report of a notif msg of this kind
+ *
+ * Each SCMI protocol, during its initialization phase, can describe the events
+ * it wishes to support in a few struct scmi_event and pass them to the core
+ * using scmi_register_protocol_events().
+ */
+struct scmi_event {
+	u8	id;
+	size_t	max_payld_sz;
+	size_t	max_report_sz;
+};
+
+/**
+ * struct scmi_protocol_event_ops  - Protocol helpers called by the notification
+ *				     core.
+ * @set_notify_enabled: Enable/disable the required evt_id/src_id notifications
+ *			using the proper custom protocol commands.
+ *			Return true if at least one the required src_id
+ *			has been successfully enabled/disabled
+ *
+ * Context: Helpers described in &struct scmi_protocol_event_ops are called
+ *	    only in process context.
+ */
+struct scmi_protocol_event_ops {
+	bool (*set_notify_enabled)(const struct scmi_handle *handle,
+				   u8 evt_id, u32 src_id, bool enabled);
+};
+
+int scmi_notification_init(struct scmi_handle *handle);
+void scmi_notification_exit(struct scmi_handle *handle);
+
+int scmi_register_protocol_events(const struct scmi_handle *handle,
+				  u8 proto_id, size_t queue_sz,
+				  const struct scmi_protocol_event_ops *ops,
+				  const struct scmi_event *evt, int num_events,
+				  int num_sources);
+
+#endif /* _SCMI_NOTIFY_H */
diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h
index ce2f5c28b2df..0679f10ab05e 100644
--- a/include/linux/scmi_protocol.h
+++ b/include/linux/scmi_protocol.h
@@ -231,6 +231,8 @@  struct scmi_reset_ops {
  *	protocol(for internal use only)
  * @reset_priv: pointer to private data structure specific to reset
  *	protocol(for internal use only)
+ * @notify_priv: pointer to private data structure specific to notifications
+ *	(for internal use only)
  */
 struct scmi_handle {
 	struct device *dev;
@@ -246,6 +248,7 @@  struct scmi_handle {
 	void *power_priv;
 	void *sensor_priv;
 	void *reset_priv;
+	void *notify_priv;
 };
 
 enum scmi_std_protocol {