mbox series

[0/9] Add support for Microsoft Surface System Aggregator Module

Message ID 20201115192143.21571-1-luzmaximilian@gmail.com (mailing list archive)
Headers show
Series Add support for Microsoft Surface System Aggregator Module | expand

Message

Maximilian Luz Nov. 15, 2020, 7:21 p.m. UTC
Hello,

  N.B.: the following text is mostly a repeat of cover letter from the
  previous RFC for the uninitiated, which can be found at

  https://lore.kernel.org/linux-serial/20200923151511.3842150-1-luzmaximilian@gmail.com/

  See "Changes" below for an overview of differences between the RFC and
  this patchset. I hope I have addressed all comments from that in this
  version, thank you again for those.

The Surface System Aggregator Module (we'll refer to it as Surface
Aggregator or SAM below) is an embedded controller (EC) found on various
Microsoft Surface devices. Specifically, all 4th and later generation
Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
exception of the Surface Go series and the Surface Duo. Notably, it
seems like this EC can also be found on the ARM-based Surface Pro X [1].

Functionality provided by this EC depends on the Surface model and can
(roughly) be broken down by their generations: Starting with 5th
generation devices (Surface Pro 2017/5, Surface Book 2, Surface Laptop
1, and later), the EC provides battery and thermal readings, as well as
access to the real-time clock. On 5th and 6th generations, these
features, specifically, are provided via the ACPI interface of the EC,
referred to as Surface ACPI Notify (SAN), i.e. they act as standard ACPI
devices of that type, but require a driver translating requests written
to an ACPI operation region to requests to the EC. On 7th generation
devices, the ACPI interface is (largely) gone, and has been replaced
with custom battery and thermal drivers, directly querying the EC.

Additionally, HID keyboard and touchpad input for Surface models with
these devices built in can be handled via the EC: On the Surface Laptops
1 and 2, this includes only the keyboard, while on the Surface Laptop 3
and Book 3, this includes both touchpad and keyboard. In this case,
actual input is provided as HID data and the EC connection acts as HID
transport, thus requiring a special transport driver for those devices
to work.

Further, all known devices (5th and later generations) also support
changing of performance/cooling modes, which can influence cooling
capabilities of the device (e.g. prefer silent operation over
performance), and may influence power limits (e.g. of the discrete GPU
found on Surface Books).

While this constitutes all major functionality, some more device
specific functionality is also handled by the EC. For example, on the
Surface Books, the EC handles detaching of the clipboard (i.e. the upper
part with screen and CPU) from the base (the lower part with keyboard
and optional discrete GPU) and can influence its behavior (i.e. it
provides an interface via which detachment can be requested, aborted, or
confirmed). It can also be used to detect if there has been a base
attached to the clipboard, and if so what type.

This patch-series adds the basis for supporting this EC and the features
provided by it, by, first, implementing a communication driver providing
a fundamental API for client drivers, handling specific aspects of the
EC. Additionally, it builds on top of that to provide a dedicated bus
and device type to better manage EC clients (and break it down pseudo-
device-wise), especially in the case when these client devices are not
described in ACPI, i.e. cannot be discovered by conventional means.
Furthermore, it provides support for debugging and prototyping via an
optional user-space interface, and, lastly, also support for the
aforementioned ACPI interface, allowing ACPI to communicate with the EC
directly.

This series only addresses 5th and later generation Surface models as
the communication interface has changed substantially from 4th to 5th
generation, and the 4th generation interface has not been reverse-
engineered yet. Specifically, the underlying transport has been changed
from HID feature and input-/output-reports to serial communication via
an UART and a custom protocol. Support for 4th generation devices may be
added in the future, but as currently not much is known about 4th
generation SAM, it yet remains to be seen if this can happen as addition
to this subsystem, or if it will be easier to implement this as separate
platform driver. Especially as the 4th generation EC does not seem to
provide much of the functionality found on 5th and later generations
(e.g. no battery status reporting, no thermal reporting, ..., we assume
it's just clipboard detachment on the Surface Book 1 and performance
mode setting).

In more detail, this series adds a driver for the Surface Serial Hub
(SSH), the 5th- and later-generation communication channel to the EC, a
pseudo-device and driver exposing a misc-device that can be used to
communicate with the EC from user-space, intended for debugging,
testing, and prototyping, as well as a driver for the Surface ACPI
Notify (SAN) device, i.e. the interface between ACPI and EC. Some more
details on those can be found on the individual commit messages.

This series, apart from the SAN and user-space drivers, does not add any
client drivers. This will be handled via future patches once the core
has been accepted (and the other client drivers have been cleaned up a
bit).

On the top level, EC communication via the SSH driver can be broken down
into requests (sent from host to EC), corresponding responses (sent from
EC to host, associated with and triggered by a request), and events
(sent from EC to host without association to a request). The SSH driver
manages all communication (i.e. matches responses to requests, enables
and disables events, and manages event handlers/notifiers installed by
client drivers). On the lower levels, SSH communication is packet-based,
and described in more detail in the documentation added in this series
(specifically ssh.rst).

This set of modules and drivers has been developed out of tree at [2]
and used/tested in the kernel we provide at [3] pretty much since its
beginnings. It has been developed by reverse-engineering the SSH
protocol, mostly through the ACPI interface, communication dumps
obtained from listening in on Windows, and deduction. So things may be
wrong. There have been some attempts at reverse-engineering existing
drivers, which also gave a bit of insight for development, however, I
haven't gotten very far on that front beyond some more higher-level
concepts and detecting a couple of new EC commands/confirming the
functionality of already known commands.

Driver and module names have been chosen to align with Windows driver
names, some field, variable, and concept names have been chosen to align
with ACPI code (or at least with what I think some of the more cryptic
names could mean and make sense in the respective context, e.g. IID ->
Instance ID, TC -> Target Category).

This patch-set can also be found at the following repository and
reference, if you prefer to look at a kernel tree instead of these
emails:

  https://github.com/linux-surface/kernel tags/s/surface-aggregator/v1

Thanks,
Max

[1]: The Surface Pro X is, however, currently considered unsupported due
     to a lack of test candidates and, as it seems, general lack of
     Linux support on other parts. AFAIK there is an issue preventing
     serial devices from being registered, on which the core driver in
     this series is build on, thus there is no way to even test that at
     this point. I'd be happy to work out any issues regarding SAM on
     the Pro X at some point in the future, provided someone can/wants
     to actually test it.

[2]: https://github.com/linux-surface/surface-aggregator-module
[3]: https://github.com/linux-surface/linux-surface


Note: This patch depends on

  [PATCH v4] platform/surface: Create a platform subdirectory for
             Microsoft Surface devices

which can be found at

  https://lore.kernel.org/platform-driver-x86/20201009141128.683254-1-luzmaximilian@gmail.com/

and is currently in platform-drivers-x86/for-next.


Changes from the previous RFC (overview):
 - move to platform/surface
 - add copyright lines
 - change SPDX identifier to GPL-2.0+ (was GPL-2.0-or-later)
 - change user-space interface from debugfs to misc-device
 - address issues in user-space interface
 - fix typos in commit messages and documentation
 - fix some bugs, address other issues

Changes regarding specific patches (and more details) can be found on
the individual patch.


Maximilian Luz (9):
  platform/surface: Add Surface Aggregator subsystem
  platform/surface: aggregator: Add control packet allocation caching
  platform/surface: aggregator: Add event item allocation caching
  platform/surface: aggregator: Add trace points
  platform/surface: aggregator: Add error injection capabilities
  platform/surface: aggregator: Add dedicated bus and device type
  docs: driver-api: Add Surface Aggregator subsystem documentation
  platform/surface: Add Surface Aggregator user-space interface
  platform/surface: Add Surface ACPI Notify driver

 Documentation/driver-api/index.rst            |    1 +
 .../surface_aggregator/client-api.rst         |   38 +
 .../driver-api/surface_aggregator/client.rst  |  394 +++
 .../surface_aggregator/clients/cdev.rst       |   85 +
 .../surface_aggregator/clients/index.rst      |   21 +
 .../surface_aggregator/clients/san.rst        |   44 +
 .../driver-api/surface_aggregator/index.rst   |   21 +
 .../surface_aggregator/internal-api.rst       |   67 +
 .../surface_aggregator/internal.rst           |   50 +
 .../surface_aggregator/overview.rst           |   76 +
 .../driver-api/surface_aggregator/ssh.rst     |  343 +++
 MAINTAINERS                                   |   13 +
 drivers/platform/surface/Kconfig              |   39 +
 drivers/platform/surface/Makefile             |    3 +
 drivers/platform/surface/aggregator/Kconfig   |   65 +
 drivers/platform/surface/aggregator/Makefile  |   17 +
 drivers/platform/surface/aggregator/bus.c     |  424 +++
 drivers/platform/surface/aggregator/bus.h     |   27 +
 .../platform/surface/aggregator/controller.c  | 2557 +++++++++++++++++
 .../platform/surface/aggregator/controller.h  |  288 ++
 drivers/platform/surface/aggregator/core.c    |  831 ++++++
 .../platform/surface/aggregator/ssh_msgb.h    |  201 ++
 .../surface/aggregator/ssh_packet_layer.c     | 2009 +++++++++++++
 .../surface/aggregator/ssh_packet_layer.h     |  175 ++
 .../platform/surface/aggregator/ssh_parser.c  |  229 ++
 .../platform/surface/aggregator/ssh_parser.h  |  157 +
 .../surface/aggregator/ssh_request_layer.c    | 1254 ++++++++
 .../surface/aggregator/ssh_request_layer.h    |  142 +
 drivers/platform/surface/aggregator/trace.h   |  625 ++++
 .../platform/surface/surface_acpi_notify.c    |  884 ++++++
 .../surface/surface_aggregator_cdev.c         |  299 ++
 include/linux/mod_devicetable.h               |   18 +
 include/linux/surface_acpi_notify.h           |   39 +
 include/linux/surface_aggregator/controller.h |  832 ++++++
 include/linux/surface_aggregator/device.h     |  430 +++
 include/linux/surface_aggregator/serial_hub.h |  659 +++++
 include/uapi/linux/surface_aggregator/cdev.h  |   58 +
 scripts/mod/devicetable-offsets.c             |    8 +
 scripts/mod/file2alias.c                      |   23 +
 39 files changed, 13446 insertions(+)
 create mode 100644 Documentation/driver-api/surface_aggregator/client-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/client.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/cdev.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/san.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/overview.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst
 create mode 100644 drivers/platform/surface/aggregator/Kconfig
 create mode 100644 drivers/platform/surface/aggregator/Makefile
 create mode 100644 drivers/platform/surface/aggregator/bus.c
 create mode 100644 drivers/platform/surface/aggregator/bus.h
 create mode 100644 drivers/platform/surface/aggregator/controller.c
 create mode 100644 drivers/platform/surface/aggregator/controller.h
 create mode 100644 drivers/platform/surface/aggregator/core.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_msgb.h
 create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.h
 create mode 100644 drivers/platform/surface/aggregator/ssh_parser.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_parser.h
 create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.h
 create mode 100644 drivers/platform/surface/aggregator/trace.h
 create mode 100644 drivers/platform/surface/surface_acpi_notify.c
 create mode 100644 drivers/platform/surface/surface_aggregator_cdev.c
 create mode 100644 include/linux/surface_acpi_notify.h
 create mode 100644 include/linux/surface_aggregator/controller.h
 create mode 100644 include/linux/surface_aggregator/device.h
 create mode 100644 include/linux/surface_aggregator/serial_hub.h
 create mode 100644 include/uapi/linux/surface_aggregator/cdev.h

Comments

Andy Shevchenko Nov. 16, 2020, 1:33 p.m. UTC | #1
On Sun, Nov 15, 2020 at 9:25 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> Add Surface System Aggregator Module core and Surface Serial Hub driver,
> required for the embedded controller found on Microsoft Surface devices.
>
> The Surface System Aggregator Module (SSAM, SAM or Surface Aggregator)
> is an embedded controller (EC) found on 4th and later generation
> Microsoft Surface devices, with the exception of the Surface Go series.
> This EC provides various functionality, depending on the device in
> question. This can include battery status and thermal reporting (5th and
> later generations), but also HID keyboard (6th+) and touchpad input
> (7th+) on Surface Laptop and Surface Book 3 series devices.
>
> This patch provides the basic necessities for communication with the SAM
> EC on 5th and later generation devices. On these devices, the EC
> provides an interface that acts as serial device, called the Surface
> Serial Hub (SSH). 4th generation devices, on which the EC interface is
> provided via an HID-over-I2C device, are not supported by this patch.
>
> Specifically, this patch adds a driver for the SSH device (device HID
> MSHW0084 in ACPI), as well as a controller structure and associated API.
> This represents the functional core of the Surface Aggregator kernel
> subsystem, introduced with this patch, and will be expanded upon in
> subsequent commits.
>
> The SSH driver acts as the main attachment point for this subsystem and
> sets-up and manages the controller structure. The controller in turn
> provides a basic communication interface, allowing to send requests from
> host to EC and receiving the corresponding responses, as well as
> managing and receiving events, sent from EC to host. It is structured
> into multiple layers, with the top layer presenting the API used by
> other kernel drivers and the lower layers modeled after the serial
> protocol used for communication.
>
> Said other drivers are then responsible for providing the (Surface model
> specific) functionality accessible through the EC (e.g. battery status
> reporting, thermal information, ...) via said controller structure and
> API, and will be added in future commits.

...

> +menuconfig SURFACE_AGGREGATOR
> +       tristate "Microsoft Surface System Aggregator Module Subsystem and Drivers"
> +       depends on SERIAL_DEV_BUS
> +       depends on ACPI
> +       select CRC_CCITT
> +       help
> +         The Surface System Aggregator Module (Surface SAM or SSAM) is an
> +         embedded controller (EC) found on 5th- and later-generation Microsoft
> +         Surface devices (i.e. Surface Pro 5, Surface Book 2, Surface Laptop,
> +         and newer, with exception of Surface Go series devices).
> +
> +         Depending on the device in question, this EC provides varying
> +         functionality, including:
> +         - EC access from ACPI via Surface ACPI Notify (5th- and 6th-generation)
> +         - battery status information (all devices)
> +         - thermal sensor access (all devices)
> +         - performance mode / cooling mode control (all devices)
> +         - clipboard detachment system control (Surface Book 2 and 3)
> +         - HID / keyboard input (Surface Laptops, Surface Book 3)
> +
> +         This option controls whether the Surface SAM subsystem core will be
> +         built. This includes a driver for the Surface Serial Hub (SSH), which
> +         is the device responsible for the communication with the EC, and a
> +         basic kernel interface exposing the EC functionality to other client
> +         drivers, i.e. allowing them to make requests to the EC and receive
> +         events from it. Selecting this option alone will not provide any
> +         client drivers and therefore no functionality beyond the in-kernel
> +         interface. Said functionality is the responsibility of the respective
> +         client drivers.
> +
> +         Note: While 4th-generation Surface devices also make use of a SAM EC,
> +         due to a difference in the communication interface of the controller,
> +         only 5th and later generations are currently supported. Specifically,
> +         devices using SAM-over-SSH are supported, whereas devices using
> +         SAM-over-HID, which is used on the 4th generation, are currently not
> +         supported.

From this help text I didn't get if it will be a module or what if I
chose the above?

...

> +/* -- Safe counters. -------------------------------------------------------- */

Why can't XArray be used here?

...

> +
> +

One blank line is enough

...

> +static bool ssam_event_matches_notifier(
> +               const struct ssam_event_notifier *notif,
> +               const struct ssam_event *event)

Perhaps

static bool
ssam_event_matches_notifier(const struct ssam_event_notifier *n,
               const struct ssam_event *event)

(or even switch to 100 limit, also note notif ->n — no need to repeat same word)

...

> +       nb = rcu_dereference_raw(nh->head);
> +       while (nb) {
> +               nf = container_of(nb, struct ssam_event_notifier, base);
> +               next_nb = rcu_dereference_raw(nb->next);
> +
> +               if (ssam_event_matches_notifier(nf, event)) {
> +                       ret = (ret & SSAM_NOTIF_STATE_MASK) | nb->fn(nf, event);
> +                       if (ret & SSAM_NOTIF_STOP)
> +                               break;

The returned value is a bitmask?!
What are you returning at the end?

> +               }
> +
> +               nb = next_nb;
> +       }

...

> +static int __ssam_nfblk_insert(struct ssam_nf_head *nh, struct ssam_notifier_block *nb)
> +{
> +       struct ssam_notifier_block **link = &nh->head;
> +
> +       while ((*link) != NULL) {
> +               if (unlikely((*link) == nb)) {
> +                       WARN(1, "double register detected");
> +                       return -EINVAL;
> +               }
> +
> +               if (nb->priority > (*link)->priority)
> +                       break;
> +
> +               link = &((*link)->next);
> +       }
> +
> +       nb->next = *link;
> +       rcu_assign_pointer(*link, nb);
> +
> +       return 0;
> +}

If you need RCU (which is also the Q per se), why not use RCU list?

...

> +       while ((*link) != NULL) {

Redundant parentheses, redundant ' != NULL' part.

> +               if ((*link) == nb)
> +                       return link;
> +
> +               link = &((*link)->next);
> +       }

...

> + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event

In above you mistyped 'which' as 'whic' or so, and here reference.
Perhaps go thru spell checker?

...

> + * registered, or ``ERR_PTR(-ENOMEM)`` if the entry could not be allocated.

Better to spell out "error pointer" instead of cryptic ERR_PTR().

...

> + * Executa registered callbacks in order of their priority until either no
> + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP

returns

> + * bit set. Note that this bit is set automatically when converting non.zero
> + * error values via ssam_notifier_from_errno() to notifier values.

...

> +               for (i = i - 1; i >= 0; i--)

while (i--)

> +                       ssam_nf_head_destroy(&nf->head[i]);

...

> +       // limit number of processed events to avoid livelocking
> +       for (i = 0; i < 10; i++) {

Magic number! Also, this will be better to read in a form of

unsigned int iterations = 10;

do {
...
} while (--iterations);

> +               item = ssam_event_queue_pop(queue);
> +               if (item == NULL)
> +                       return;
> +
> +               ssam_nf_call(nf, dev, item->rqid, &item->event);
> +               kfree(item);
> +       }

...

> +static const guid_t SSAM_SSH_DSM_GUID = GUID_INIT(0xd5e383e1, 0xd892, 0x4a76,
> +               0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5);

Can you use usual pattern for these UIDs, like
static const guid_t SSAM_SSH_DSM_GUID =
        GUID_INIT(0xd5e383e1, 0xd892, 0x4a76,
                 0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5);
?

Also put a comment how this UID will look like in a string representation.

...

> +       if (!acpi_has_method(handle, "_DSM"))
> +               return 0;

Hmm... What's the expectation?

> +       obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID,
> +                                     SSAM_SSH_DSM_REVISION, 0, NULL,
> +                                     ACPI_TYPE_BUFFER);
> +       if (!obj)
> +               return -EFAULT;

EFAULT?! Perhaps you can simply return 0 here, no?

> +       for (i = 0; i < obj->buffer.length && i < 8; i++)
> +               mask |= (((u64)obj->buffer.pointer[i]) << (i * 8));

Don't we have some helpers for this? At least I remember similar code
went to one of PDx86 drivers like intel-vbtn or so.

> +       if (mask & 0x01)

BIT(0) ?

> +               *funcs = mask;

...

> +       caps->ssh_power_profile = (u32)-1;
> +       caps->screen_on_sleep_idle_timeout = (u32)-1;
> +       caps->screen_off_sleep_idle_timeout = (u32)-1;
> +       caps->d3_closes_handle = false;
> +       caps->ssh_buffer_size = (u32)-1;

Use proper types and their limits (limits.h missed?).

...

> +       // initialize request and packet transport layers

Inconsistent style of comments.

...

> + * In the course of this shutdown procedure, all currently registered
> + * notifiers will be unregistered. It is, however, strongly recommended to not
> + * rely on this behavior, and instead the party registring the notifier should

registering

> + * unregister it before the controller gets shut down, e.g. via the SSAM bus
> + * which guarantees client devices to be removed before a shutdown.

> + * Note that events may still be pending after this call, but due to the
> + * notifiers being unregistered, the will be dropped when the controller is

the?!

> + * subsequently being destroyed via ssam_controller_destroy().

...

> + * Ensures that all resources associated with the controller get freed. This
> + * function should only be called after the controller has been stopped via
> + * ssam_controller_shutdown(). In general, this function should not be called
> + * directly. The only valid place to call this function direclty is during

directly

> + * initialization, before the controller has been fully initialized and passed
> + * to other processes. This function is called automatically when the
> + * reference count of the controller reaches zero.

...

> + * ssam_request_sync_free() - Free a synchronous request.
> + * @rqst: The request to free.

to be freed?

...

> + * Allocates a synchronous request struct on the stack, fully initializes it
> + * using the provided buffer as message data buffer, submits it, and then
> + * waits for its completion before returning its staus. The

status

> + * SSH_COMMAND_MESSAGE_LENGTH() macro can be used to compute the required
> + * message buffer size.

...

> + * This is a wrapper for the raw SAM request to enable an event, thus it does
> + * not handle referecnce counting for enable/disable of events. If an event
> + * has already been enabled, the EC will ignore this request.

Grammar and English language style somehow feels not okay.

...

> +       u8 buf[1] = { 0x00 };

Can't be simply buf ?

...

> + * This function will only send the display-off notification command if
> + * display noticications are supported by the EC. Currently all known devices
> + * support these notification.

Spell check!

...

> + * This function will only send the display-on notification command if display
> + * noticications are supported by the EC. Currently all known devices support
> + * these notification.

Ditto.

...

> +               ssam_err(ctrl, "unexpected response from D0-exit notification:"
> +                        " 0x%02x\n", response);

Don't split string literals. Had you run a checkpatch?
Please do and use strict mode.


...

Overall impression of this submission:
 - it's quite huge as for a single patch
 - it feels like quite an overengineered solution: READ_ONCE, RCU,
list + spin_lock, RB tree of notifiers, my head!
 - where is the architectural document of all these?
Maximilian Luz Nov. 16, 2020, 5:03 p.m. UTC | #2
On 11/16/20 2:33 PM, Andy Shevchenko wrote:
> On Sun, Nov 15, 2020 at 9:25 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> Add Surface System Aggregator Module core and Surface Serial Hub driver,
>> required for the embedded controller found on Microsoft Surface devices.
>>
>> The Surface System Aggregator Module (SSAM, SAM or Surface Aggregator)
>> is an embedded controller (EC) found on 4th and later generation
>> Microsoft Surface devices, with the exception of the Surface Go series.
>> This EC provides various functionality, depending on the device in
>> question. This can include battery status and thermal reporting (5th and
>> later generations), but also HID keyboard (6th+) and touchpad input
>> (7th+) on Surface Laptop and Surface Book 3 series devices.
>>
>> This patch provides the basic necessities for communication with the SAM
>> EC on 5th and later generation devices. On these devices, the EC
>> provides an interface that acts as serial device, called the Surface
>> Serial Hub (SSH). 4th generation devices, on which the EC interface is
>> provided via an HID-over-I2C device, are not supported by this patch.
>>
>> Specifically, this patch adds a driver for the SSH device (device HID
>> MSHW0084 in ACPI), as well as a controller structure and associated API.
>> This represents the functional core of the Surface Aggregator kernel
>> subsystem, introduced with this patch, and will be expanded upon in
>> subsequent commits.
>>
>> The SSH driver acts as the main attachment point for this subsystem and
>> sets-up and manages the controller structure. The controller in turn
>> provides a basic communication interface, allowing to send requests from
>> host to EC and receiving the corresponding responses, as well as
>> managing and receiving events, sent from EC to host. It is structured
>> into multiple layers, with the top layer presenting the API used by
>> other kernel drivers and the lower layers modeled after the serial
>> protocol used for communication.
>>
>> Said other drivers are then responsible for providing the (Surface model
>> specific) functionality accessible through the EC (e.g. battery status
>> reporting, thermal information, ...) via said controller structure and
>> API, and will be added in future commits.
> 
> ...
> 
>> +menuconfig SURFACE_AGGREGATOR
>> +       tristate "Microsoft Surface System Aggregator Module Subsystem and Drivers"
>> +       depends on SERIAL_DEV_BUS
>> +       depends on ACPI
>> +       select CRC_CCITT
>> +       help
>> +         The Surface System Aggregator Module (Surface SAM or SSAM) is an
>> +         embedded controller (EC) found on 5th- and later-generation Microsoft
>> +         Surface devices (i.e. Surface Pro 5, Surface Book 2, Surface Laptop,
>> +         and newer, with exception of Surface Go series devices).
>> +
>> +         Depending on the device in question, this EC provides varying
>> +         functionality, including:
>> +         - EC access from ACPI via Surface ACPI Notify (5th- and 6th-generation)
>> +         - battery status information (all devices)
>> +         - thermal sensor access (all devices)
>> +         - performance mode / cooling mode control (all devices)
>> +         - clipboard detachment system control (Surface Book 2 and 3)
>> +         - HID / keyboard input (Surface Laptops, Surface Book 3)
>> +
>> +         This option controls whether the Surface SAM subsystem core will be
>> +         built. This includes a driver for the Surface Serial Hub (SSH), which
>> +         is the device responsible for the communication with the EC, and a
>> +         basic kernel interface exposing the EC functionality to other client
>> +         drivers, i.e. allowing them to make requests to the EC and receive
>> +         events from it. Selecting this option alone will not provide any
>> +         client drivers and therefore no functionality beyond the in-kernel
>> +         interface. Said functionality is the responsibility of the respective
>> +         client drivers.
>> +
>> +         Note: While 4th-generation Surface devices also make use of a SAM EC,
>> +         due to a difference in the communication interface of the controller,
>> +         only 5th and later generations are currently supported. Specifically,
>> +         devices using SAM-over-SSH are supported, whereas devices using
>> +         SAM-over-HID, which is used on the 4th generation, are currently not
>> +         supported.
> 
>  From this help text I didn't get if it will be a module or what if I
> chose the above?

I thought this would be implied by the tristate options. So should I
simply add something like

     Choose m if you want to build the SAM subsystem core and SSH driver
     as module, y if you want to build it into the kernel and n if you
     don't want it at all.

> ...
> 
>> +/* -- Safe counters. -------------------------------------------------------- */
> 
> Why can't XArray be used here?

These are basically packet and request IDs (that are sent to the EC and
received from it). XArray is a type of collection. I'm not sure how this
would work.

What happens is: For each new packet/request we get a new ID by
incrementing the counter (with wrap-around). This ID is used, for
example, to match ACK-packets to the corresponding data packet. The EC
will also use this to detect re-transmission, e.g. if we send two
packets with the same ID in sequence, the EC will detect the second
packet as re-transmission of the first one (which is the reason why
simple IDA won't work).

Furthermore, both the Windows driver and the EC itself (for packets sent
by it) use wrapping counters as well, so it seems to be the safest way
to do this without possibly triggering re-transmission detection by
accident.

> ...
> 
>> +
>> +
> 
> One blank line is enough
> 

Can I keep two blank lines for the section separator comments?

> ...
> 
>> +static bool ssam_event_matches_notifier(
>> +               const struct ssam_event_notifier *notif,
>> +               const struct ssam_event *event)
> 
> Perhaps
> 
> static bool
> ssam_event_matches_notifier(const struct ssam_event_notifier *n,
>                 const struct ssam_event *event)
> 
> (or even switch to 100 limit, also note notif ->n — no need to repeat same word)

Ack.

> ...
> 
>> +       nb = rcu_dereference_raw(nh->head);
>> +       while (nb) {
>> +               nf = container_of(nb, struct ssam_event_notifier, base);
>> +               next_nb = rcu_dereference_raw(nb->next);
>> +
>> +               if (ssam_event_matches_notifier(nf, event)) {
>> +                       ret = (ret & SSAM_NOTIF_STATE_MASK) | nb->fn(nf, event);
>> +                       if (ret & SSAM_NOTIF_STOP)
>> +                               break;
> 
> The returned value is a bitmask?!

In part, yes. I've referred to this as the "notifier status value" in
the (function) documentation. This is basically

     | bit 0       | bit 1    | bit 2 to 31         |
     +-------------+----------+---------------------+
     | handled-bit | stop-bit | error-value or zero |

The ssam_notifier_to_errno() function can be used to convert an error
value to this notifier status value. This is the same as the return
value of the notifier function and is described a bit better in the
documentation of struct ssam_notifier_block.

> What are you returning at the end?

The notifier status value as described above. This can be tested for the
SSAM_NOTIF_HANDLED and SSAM_NOTIF_STOP bits. The error value can, as
mentioned in the documentation, be extracted via
ssam_notifier_to_errno().

All in all, this is based on notifier_call_chain() and SRCU notifiers
(linux/notifier.h, kernel/notifier.c) in general, with the key
difference that it includes a bit to indicate if the notification has
been handled (or detect if it goes unhandled).

> 
>> +               }
>> +
>> +               nb = next_nb;
>> +       }
> 
> ...
> 
>> +static int __ssam_nfblk_insert(struct ssam_nf_head *nh, struct ssam_notifier_block *nb)
>> +{
>> +       struct ssam_notifier_block **link = &nh->head;
>> +
>> +       while ((*link) != NULL) {
>> +               if (unlikely((*link) == nb)) {
>> +                       WARN(1, "double register detected");
>> +                       return -EINVAL;
>> +               }
>> +
>> +               if (nb->priority > (*link)->priority)
>> +                       break;
>> +
>> +               link = &((*link)->next);
>> +       }
>> +
>> +       nb->next = *link;
>> +       rcu_assign_pointer(*link, nb);
>> +
>> +       return 0;
>> +}
> 
> If you need RCU (which is also the Q per se), why not use RCU list?

This part has been mostly copied and adapted from
notifier_chain_register(), which uses the exact same mechanism. I can
change it to use RCU list primitives if that's preferred.

> ...
> 
>> +       while ((*link) != NULL) {
> 
> Redundant parentheses, redundant ' != NULL' part.

Ack.

> 
>> +               if ((*link) == nb)
>> +                       return link;
>> +
>> +               link = &((*link)->next);
>> +       }
> 
> ...
> 
>> + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event
> 
> In above you mistyped 'which' as 'whic' or so, and here reference.
> Perhaps go thru spell checker?

Ack. I'll do that.

> 
> ...
> 
>> + * registered, or ``ERR_PTR(-ENOMEM)`` if the entry could not be allocated.
> 
> Better to spell out "error pointer" instead of cryptic ERR_PTR().

Ack.

> 
> ...
> 
>> + * Executa registered callbacks in order of their priority until either no
>> + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP
> 
> returns

Ack.

> 
>> + * bit set. Note that this bit is set automatically when converting non.zero
>> + * error values via ssam_notifier_from_errno() to notifier values.
> 
> ...
> 
>> +               for (i = i - 1; i >= 0; i--)
> 
> while (i--)

Ack.

> 
>> +                       ssam_nf_head_destroy(&nf->head[i]);
> 
> ...
> 
>> +       // limit number of processed events to avoid livelocking
>> +       for (i = 0; i < 10; i++) {
> 
> Magic number! Also, this will be better to read in a form of
> 
> unsigned int iterations = 10;
> 
> do {
> ...
> } while (--iterations);

Okay.

> 
>> +               item = ssam_event_queue_pop(queue);
>> +               if (item == NULL)
>> +                       return;
>> +
>> +               ssam_nf_call(nf, dev, item->rqid, &item->event);
>> +               kfree(item);
>> +       }
> 
> ...
> 
>> +static const guid_t SSAM_SSH_DSM_GUID = GUID_INIT(0xd5e383e1, 0xd892, 0x4a76,
>> +               0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5);
> 
> Can you use usual pattern for these UIDs, like
> static const guid_t SSAM_SSH_DSM_GUID =
>          GUID_INIT(0xd5e383e1, 0xd892, 0x4a76,
>                   0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5);
> ?
> 
> Also put a comment how this UID will look like in a string representation.

Will change that.

> 
> ...
> 
>> +       if (!acpi_has_method(handle, "_DSM"))
>> +               return 0;
> 
> Hmm... What's the expectation?

Depends on the device. Old devices don't have that _DSM (thus fallback
to defaults), newer devices do have it.

> 
>> +       obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID,
>> +                                     SSAM_SSH_DSM_REVISION, 0, NULL,
>> +                                     ACPI_TYPE_BUFFER);
>> +       if (!obj)
>> +               return -EFAULT;
> 
> EFAULT?! Perhaps you can simply return 0 here, no?

I'd prefer returning an error (_DSM index 0 should always be present and
return a buffer), but I agree that EFAULT is probably the wrong one.

> 
>> +       for (i = 0; i < obj->buffer.length && i < 8; i++)
>> +               mask |= (((u64)obj->buffer.pointer[i]) << (i * 8));
> 
> Don't we have some helpers for this? At least I remember similar code
> went to one of PDx86 drivers like intel-vbtn or so.

There is acpi_check_dsm(), which checks if a single function is
supported, but as far as I can tell no helper that returns a bitmask of
all supported functions. I can change it to use acpi_check_dsm() in
ssam_dsm_load_u32() though.

> 
>> +       if (mask & 0x01)
> 
> BIT(0) ?

BIT(0) looks better, ack.

> 
>> +               *funcs = mask;
> 
> ...
> 
>> +       caps->ssh_power_profile = (u32)-1;
>> +       caps->screen_on_sleep_idle_timeout = (u32)-1;
>> +       caps->screen_off_sleep_idle_timeout = (u32)-1;
>> +       caps->d3_closes_handle = false;
>> +       caps->ssh_buffer_size = (u32)-1;
> 
> Use proper types and their limits (limits.h missed?).

Okay.

> 
> ...
> 
>> +       // initialize request and packet transport layers
> 
> Inconsistent style of comments.

Will change that.

> 
> ...
> 
>> + * In the course of this shutdown procedure, all currently registered
>> + * notifiers will be unregistered. It is, however, strongly recommended to not
>> + * rely on this behavior, and instead the party registring the notifier should
> 
> registering

Ack.

> 
>> + * unregister it before the controller gets shut down, e.g. via the SSAM bus
>> + * which guarantees client devices to be removed before a shutdown.
> 
>> + * Note that events may still be pending after this call, but due to the
>> + * notifiers being unregistered, the will be dropped when the controller is
> 
> the?!

Should be "the events".

> 
>> + * subsequently being destroyed via ssam_controller_destroy().
> 
> ...
> 
>> + * Ensures that all resources associated with the controller get freed. This
>> + * function should only be called after the controller has been stopped via
>> + * ssam_controller_shutdown(). In general, this function should not be called
>> + * directly. The only valid place to call this function direclty is during
> 
> directly

Ack.

> 
>> + * initialization, before the controller has been fully initialized and passed
>> + * to other processes. This function is called automatically when the
>> + * reference count of the controller reaches zero.
> 
> ...
> 
>> + * ssam_request_sync_free() - Free a synchronous request.
>> + * @rqst: The request to free.
> 
> to be freed?

I think both should be grammatically correct, but I'll change it.

> 
> ...
> 
>> + * Allocates a synchronous request struct on the stack, fully initializes it
>> + * using the provided buffer as message data buffer, submits it, and then
>> + * waits for its completion before returning its staus. The
> 
> status

Ack.

> 
>> + * SSH_COMMAND_MESSAGE_LENGTH() macro can be used to compute the required
>> + * message buffer size.
> 
> ...
> 
>> + * This is a wrapper for the raw SAM request to enable an event, thus it does
>> + * not handle referecnce counting for enable/disable of events. If an event
>> + * has already been enabled, the EC will ignore this request.
> 
> Grammar and English language style somehow feels not okay.

I'll try to reword that. I agree that this feels a bit off.

> 
> ...
> 
>> +       u8 buf[1] = { 0x00 };
> 
> Can't be simply buf ?

It can be, I'll change that

> 
> ...
> 
>> + * This function will only send the display-off notification command if
>> + * display noticications are supported by the EC. Currently all known devices
>> + * support these notification.
> 
> Spell check!

Will do that.

> 
> ...
> 
>> + * This function will only send the display-on notification command if display
>> + * noticications are supported by the EC. Currently all known devices support
>> + * these notification.
> 
> Ditto.

Ack.

> 
> ...
> 
>> +               ssam_err(ctrl, "unexpected response from D0-exit notification:"
>> +                        " 0x%02x\n", response);
> 
> Don't split string literals. Had you run a checkpatch?
> Please do and use strict mode.

I did run checkpatch, however, I decided to split strings as some of
them are a bit long. When doing that, I only split at places where I
personally wouldn't expect a grep to succeed for, e.g. before or after
format specifiers. I though that this is okay as the main reason for
"don't split string literals" is searchability.

> 
> 
> ...
> 
> Overall impression of this submission:
>   - it's quite huge as for a single patch

I tried to split it into parts, but this is the smallest I've been able
to get it while also keeping it a functional unit.

>   - it feels like quite an overengineered solution: READ_ONCE, RCU,
> list + spin_lock, RB tree of notifiers, my head!

Sorry about any headaches. My main goal was stability and reliability,
while keeping things somewhat performant and generic.

The "generic" part mostly because this whole thing is based on
reverse-engineering, and I believe that makes it easier to correct for
things that we got wrong, e.g. in what we think the protocol looks like.

Packet state is tracked via atomic flags (which is also laid out in a
comment), and is the simplest solution I found to track that. I found
that a simple enum based state will complicate things as the driver has
to support and properly handle re-submission of packets.

READ_ONCE and WRITE_ONCE are used to ensure proper access to state that
can be changed outside of the queue/pending locks (or under any one of
them). In general, I have tried to document all critical access of such
state with an explanation of why it is safe to do so.

RCU is only used for notifier callbacks, which is based on the SRCU
notifiers from linux/notifier.h and kernel/notifier.c, but adapted to
detect unhandled events.

While it is somewhat interwoven, the (RCU-based) notifier chain stores
the notifiers (one chain per target category) and the RB-tree stores the
number of times an event has been enabled. Essentially, those should be
seen as two separate parts (see e.g. documentation of struct ssam_nf,
which brings those two together).

So the RB-tree is not for notifiers, but for tracking which events are
enabled. Basically reference-counting for enabled events, as the EC
itself only allows turning them on/off. It's necessary to do that as
there can be multiple clients listening to the same event. It's also
required to do this with some sort of map, as (in context of
enabling/disabling) the events are uniquely identified by both their
registry and ID. This combination is a bit too complex to do that with a
simple array, so a RB-tree looked like the best solution.

Note that we store notifiers per target category only, so for that we
can use a simple array.

>   - where is the architectural document of all these?

What specifically do you want to see documented? I've documented the
basic communication and principles in another patch of this series, but
kept implementation details somewhat vague as I think those are better
covered by struct and function documentation (which won't become
outdated as easily as a separate documentation).

Thank you for your comments.

Regards,
Max
Maximilian Luz Nov. 17, 2020, 6:05 a.m. UTC | #3
On 11/16/20 6:03 PM, Maximilian Luz wrote:
> On 11/16/20 2:33 PM, Andy Shevchenko wrote:
>> On Sun, Nov 15, 2020 at 9:25 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:

[...]

> READ_ONCE and WRITE_ONCE are used to ensure proper access to state that
> can be changed outside of the queue/pending locks (or under any one of
> them). In general, I have tried to document all critical access of such
> state with an explanation of why it is safe to do so.

I've looked at this again and noticed that I can guard the packet
timestamp by the pending lock and the packet priority by the queue lock
(after first submission). This makes reasoning about access to them
significantly easier and removes the need for WRITE_ONCE / READ_ONCE.

After that, READ_ONCE is used

- to access the controller state for smoke-testing to (hopefully) detect
   invalid request function usage (note that all other access to this
   state happens under the controller state lock)

- for the "safe counters", where access to the shared value is, after
   initialization, restricted to the increment function

- to update the timeout reaper, where access to the shared value
   (rtx_timeout.expires) is, after initialization, restricted to its
   modification function (ssh_ptl_timeout_reaper_mod() /
   ssh_rtl_timeout_reaper_mod()) and the timer function

- to access the request timestamp, which is, after initialization, only
   set once in the lifetime of a request (all other access is read-only)

- to access the 'ptl' reference of the packet, which, after
   initialization, is only set once, either at packet or request
   submission (all other access is read-only). Note due to this,
   READ_ONCE is only required for functions that can run concurrently
   with ssh_ptl_submit() and ssh_rtl_submit(), i.e. ssh_ptl_cancel() and
   ssh_rtl_cancel().

- to access request state outside of bit-ops when canceling

I'd argue that all of these cases can be checked and verified with a
reasonable amount of effort. Cancellation (last two points) is probably
the most complex one. Unfortunately, I don't see any way to simplify
that part without disallowing cancellation to run concurrently to
submission, which is something I'd like to support as this makes
implementing asynchronous requests in the future easier.

Regards,
Max
Barnabás Pőcze Nov. 18, 2020, 12:28 a.m. UTC | #4
Hi

I have attached some thoughts and comments inline.


2020. november 15., vasárnap 20:21 keltezéssel, Maximilian Luz írta:

> [...]
> +/* -- Event notifier/callbacks. --------------------------------------------- */
> +/*
> + * The notifier system is based on linux/notifier.h, specifically the SRCU
> + * implementation. The difference to that is, that some bits of the notifier
> + * call return value can be tracked across multiple calls. This is done so that
> + * handling of events can be tracked and a warning can be issued in case an
> + * event goes unhandled. The idea of that waring is that it should help discover
                                                ^
"warning"


> + * and identify new/currently unimplemented features.
> + */
> +
> +
> +/**
> + * ssam_event_matches_notifier() - Test if an event matches a notifier;
                                                                         ^
Shouldn't it be a period?


> + * @notif: The event notifier to test against.
> + * @event: The event to test.
> + *
> + * Return: Returns %true iff the given event matches the given notifier
> + * according to the rules set in the notifier's event mask, %false otherwise.
> + */

> [...]

> +static int __ssam_nfblk_remove(struct ssam_nf_head *nh,
> +			       struct ssam_notifier_block *nb)
> +{
> +	struct ssam_notifier_block **link;
> +
> +	link = __ssam_nfblk_find_link(nh, nb);
> +	if (!link)
> +		return -ENOENT;

I find it odd that here you return ENOENT, but in `__ssam_nfblk_insert()`
EINVAL is returned instead of EEXIST. I believe either both should be EINVAL,
or EEXIST+ENOENT.


> +
> +	__ssam_nfblk_erase(link);

I'm wondering if it's necessary to create a new function which contains just
a single line.


> +	return 0;
> +}

> [...]

> +/**
> + * ssam_nf_head_destroy() - Deinitialize the given notifier head.
> + * @nh: The notifier head to deinitialize.
> + */
> +static void ssam_nf_head_destroy(struct ssam_nf_head *nh)
> +{
> +	cleanup_srcu_struct(&nh->srcu);
> +}

I'm also wondering if there's any reason why these static one-liner functions are
not explicitly marked inline.


> [...]

> +/**
> + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event
> + * activations.
> + * @node:     The node of this entry in the rb-tree.
> + * @key:      The key of the event.
> + * @refcount: The reference-count of the event.
> + * @flags:    The flags used when enabling the event.
> + */
> +struct ssam_nf_refcount_entry {
> +	struct rb_node node;
> +	struct ssam_nf_refcount_key key;
> +	int refcount;

Is there any reason why a signed type is used for reference counting?


> +	u8 flags;
> +};
> +
> +
> +/**
> + * ssam_nf_refcount_inc() - Increment reference-/activation-count of the given
> + * event.
> + * @nf:  The notifier system reference.
> + * @reg: The registry used to enable/disable the event.
> + * @id:  The event ID.
> + *
> + * Increments the reference-/activation-count associated with the specified
> + * event type/ID, allocating a new entry for this event ID if necessary. A
> + * newly allocated entry will have a refcount of one.

Shouldn't it be noted that nf->lock(?) must(?) be held when calling?


> [...]

> +/**
> + * ssam_nf_call() - Call notification callbacks for the provided event.
> + * @nf:    The notifier system
> + * @dev:   The associated device, only used for logging.
> + * @rqid:  The request ID of the event.
> + * @event: The event provided to the callbacks.
> + *
> + * Executa registered callbacks in order of their priority until either no
            ^
"execute"


> + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP
> + * bit set. Note that this bit is set automatically when converting non.zero
                                                                          ^
maybe "non-zero"?


> + * error values via ssam_notifier_from_errno() to notifier values.
> + *
> + * Also note that any callback that could handle an event should return a value
> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go
> + * unhandled/ignored. In case no registered callback could handle an event,
> + * this function will emit a warning.
> + *
> + * In case a callback failed, this function will emit an error message.
> + */
> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid,
> +			 struct ssam_event *event)
> +{
> +	struct ssam_nf_head *nf_head;
> +	int status, nf_ret;
> +
> +	if (!ssh_rqid_is_event(rqid)) {
> +		dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid);

A small note, "%#04x" would insert the "0x" prefix.


> +		return;
> +	}

> [...]

> +/**
> + * ssam_event_item_alloc() - Allocate an event item with the given payload size.
> + * @len:   The event payload length.
> + * @flags: The flags used for allocation.
> + *
> + * Allocate an event item with the given payload size. Sets the item
> + * operations and payload length values. The item free callback (``ops.free``)
> + * should not be overwritten after this call.
> + *
> + * Return: Returns the newly allocated event item.
> + */
> +static struct ssam_event_item *ssam_event_item_alloc(size_t len, gfp_t flags)

The `flags` argument is seemingly ignored.


> +{
> +	struct ssam_event_item *item;
> +
> +	item = kzalloc(sizeof(*item) + len, GFP_KERNEL);

I believe `struct_size(item, event.data, len)` could be utilized here.


> +	if (!item)
> +		return NULL;
> +
> +	item->event.length = len;
> +	return item;
> +}

> [...]

> +static void ssam_event_queue_work_fn(struct work_struct *work)
> +{
> +	struct ssam_event_queue *queue;
> +	struct ssam_event_item *item;
> +	struct ssam_nf *nf;
> +	struct device *dev;
> +	int i;
> +
> +	queue = container_of(work, struct ssam_event_queue, work);
> +	nf = &queue->cplt->event.notif;
> +	dev = queue->cplt->dev;
> +
> +	// limit number of processed events to avoid livelocking
> +	for (i = 0; i < 10; i++) {
> +		item = ssam_event_queue_pop(queue);
> +		if (item == NULL)

I believe `!item` is preferred.


> +			return;
> +
> +		ssam_nf_call(nf, dev, item->rqid, &item->event);
> +		kfree(item);
> +	}
> +
> +	if (!ssam_event_queue_is_empty(queue))
> +		ssam_cplt_submit(queue->cplt, &queue->work);
> +}

> [...]

> +static void ssam_handle_event(struct ssh_rtl *rtl,
> +			      const struct ssh_command *cmd,
> +			      const struct ssam_span *data)
> +{
> +	struct ssam_controller *ctrl = to_ssam_controller(rtl, rtl);
> +	struct ssam_event_item *item;
> +
> +	item = ssam_event_item_alloc(data->len, GFP_KERNEL);
> +	if (!item)
> +		return;
> +
> +	item->rqid = get_unaligned_le16(&cmd->rqid);
> +	item->event.target_category = cmd->tc;
> +	item->event.target_id = cmd->tid_in;
> +	item->event.command_id = cmd->cid;
> +	item->event.instance_id = cmd->iid;
> +	memcpy(&item->event.data[0], data->ptr, data->len);
> +
> +	WARN_ON(ssam_cplt_submit_event(&ctrl->cplt, item));

I believe that if submission fails, `item` is leaked.


> +}
> +
> +static const struct ssh_rtl_ops ssam_rtl_ops = {
> +	.handle_event = ssam_handle_event,
> +};
> +
> +
> +static bool ssam_notifier_empty(struct ssam_controller *ctrl);

I think it'd be better to name it `ssam_notifier_is_empty()` to be consistent
with the rest of the patch.


> [...]

> +static int ssam_controller_caps_load_from_acpi(
> +		acpi_handle handle, struct ssam_controller_caps *caps)
> +{
> +	u32 d3_closes_handle = false;

Assinging a boolean like this to a `u32` looks very odd to me.


> +	u64 funcs;
> +	int status;
> +
> +	// set defaults
> +	caps->ssh_power_profile = (u32)-1;
> +	caps->screen_on_sleep_idle_timeout = (u32)-1;
> +	caps->screen_off_sleep_idle_timeout = (u32)-1;
> +	caps->d3_closes_handle = false;
> +	caps->ssh_buffer_size = (u32)-1;

> [...]

> +
> +/**
> + * ssam_controller_start() - Start the receiver and transmitter threads of the
> + * controller.
> + * @ctrl: The controller.
> + *
> + * Note: When this function is called, the controller shouldbe properly hooked
                                                               ^
space


> + * up to the serdev core via &struct serdev_device_ops. Please refert to
                                                                       ^
"refer"


> [...]

> +void ssam_controller_shutdown(struct ssam_controller *ctrl)
> +{
> +	enum ssam_controller_state s = ctrl->state;
> +	int status;
> +
> +	if (s == SSAM_CONTROLLER_UNINITIALIZED || s == SSAM_CONTROLLER_STOPPED)
> +		return;
> +
> +	// try to flush pending events and requests while everything still works
> +	status = ssh_rtl_flush(&ctrl->rtl, msecs_to_jiffies(5000));

Wouldn't it be better to name that 5000?


> +	if (status) {
> +		ssam_err(ctrl, "failed to flush request transport layer: %d\n",
> +			 status);
> +	}

> [...]

> +
> +/**
> + * ssam_controller_destroy() - Destroy the controller and free its resources.
> + * @ctrl: The controller.
> + *
> + * Ensures that all resources associated with the controller get freed. This
> + * function should only be called after the controller has been stopped via
> + * ssam_controller_shutdown(). In general, this function should not be called
> + * directly. The only valid place to call this function direclty is during
                                                                ^
"directly"


> + * initialization, before the controller has been fully initialized and passed
> + * to other processes. This function is called automatically when the
> + * reference count of the controller reaches zero.
> + *
> + * Must be called from an exclusive context with regards to the controller
> + * state.
> + */

> [...]

> +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags,
> +			    struct ssam_request_sync **rqst,
> +			    struct ssam_span *buffer)
> +{
> +	size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len);
> +
> +	*rqst = kzalloc(sizeof(**rqst) + msglen, flags);
> +	if (!*rqst)
> +		return -ENOMEM;
> +
> +	buffer->ptr = (u8 *)(*rqst + 1);
> +	buffer->len = msglen;
> +
> +	return 0;
> +}

I think there is a bit of incosistency: sometimes you use ** pointer + return int,
sometimes you return a pointer with potentially embedded errno. I think it would
be better if you stuck with one or the other.


> [...]

> +static int ssam_ssh_event_disable(struct ssam_controller *ctrl,
> +				  struct ssam_event_registry reg,
> +				  struct ssam_event_id id, u8 flags)
> +{
> [...]
> +	rqst.command_id = reg.cid_disable;

If I see it correctly, this line is the only significant difference between this one
and the previous function. Is there any reason they're not combined?


> [...]

> +/**
> + * ssam_notifier_disable_registered() - Disable events for all registered
> + * notifiers.
> + * @ctrl: The controller for which to disable the notifiers/events.
> + *
> + * Disables events for all currently registered notifiers. In case of an error
> + * (EC command failing), all previously disabled events will be restored and
> + * the error code returned.
> + *
> + * This function is intended to disable all events prior to hibenration entry.
                                                                   ^
"hibernation"


> [...]

> +void ssam_irq_disarm_wakeup(struct ssam_controller *ctrl)
> +{
> +	int status;
> +
> +	if (ctrl->irq.wakeup_enabled) {
> +		status = disable_irq_wake(ctrl->irq.num);
> +		if (status)
> +			ssam_err(ctrl, "failed to disable wake IRQ: %d\n", status);
> +
> +		ctrl->irq.wakeup_enabled = false;

It's set to false even if `disable_irq_wake()` fails?


> +	}
> +	disable_irq(ctrl->irq.num);
> +}

> [...]

> +static int ssam_try_set_controller(struct ssam_controller *ctrl)
> +{
> +	int status = 0;
> +
> +	spin_lock(&__ssam_controller_lock);
> +	if (!__ssam_controller)
> +		__ssam_controller = ctrl;
> +	else
> +		status = -EBUSY;

I feel like EBUSY might not be the best errno here.


> +	spin_unlock(&__ssam_controller_lock);
> +
> +	return status;
> +}

> [...]

> +int ssam_client_link(struct ssam_controller *c, struct device *client)
> +{
> +	const u32 flags = DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_CONSUMER;
> +	struct device_link *link;
> +	struct device *ctrldev;
> +
> +	ssam_controller_statelock(c);
> +
> +	if (c->state != SSAM_CONTROLLER_STARTED) {
> +		ssam_controller_stateunlock(c);
> +		return -ENXIO;
> +	}
> +
> +	ctrldev = ssam_controller_device(c);
> +	if (!ctrldev) {
> +		ssam_controller_stateunlock(c);
> +		return -ENXIO;
> +	}
> +

I'm not sure if ENXIO is the best errno in the last two returns;


> +	link = device_link_add(client, ctrldev, flags);
> +	if (!link) {
> +		ssam_controller_stateunlock(c);
> +		return -ENOMEM;
> +	}
> +
> +	/*
> +	 * Return -ENXIO if supplier driver is on its way to be removed. In this
> +	 * case, the controller won't be around for much longer and the device
> +	 * link is not going to save us any more, as unbinding is already in
> +	 * progress.
> +	 */
> +	if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) {
> +		ssam_controller_stateunlock(c);
> +		return -ENXIO;
> +	}
> +
> +	ssam_controller_stateunlock(c);
> +	return 0;
> +}

> [...]

> +int ssam_client_bind(struct device *client, struct ssam_controller **ctrl)
> +{
> +	struct ssam_controller *c;
> +	int status;
> +
> +	c = ssam_get_controller();
> +	if (!c)
> +		return -ENXIO;

To me, ENODEV seems like a better fit here.


> +
> +	status = ssam_client_link(c, client);

> [...]


> +static ssize_t firmware_version_show(struct device *dev,
> +				     struct device_attribute *attr, char *buf)
> +{
> +	struct ssam_controller *ctrl = dev_get_drvdata(dev);
> +	u32 version, a, b, c;
> +	int status;
> +
> +	status = ssam_get_firmware_version(ctrl, &version);
> +	if (status < 0)
> +		return status;
> +
> +	a = (version >> 24) & 0xff;
> +	b = ((version >> 8) & 0xffff);
> +	c = version & 0xff;
> +
> +	return snprintf(buf, PAGE_SIZE - 1, "%u.%u.%u\n", a, b, c);

`snprintf()` takes care of the null byte, so simply `PAGE_SIZE` would've been
sufficient. But that doesn't matter much since you should use `sysfs_emit()`.


> +}
> +static DEVICE_ATTR_RO(firmware_version);
> +
> +static struct attribute *ssam_sam_attrs[] = {
> +	&dev_attr_firmware_version.attr,
> +	NULL,
            ^
I believe it is preferred to omit the comma after the terminating entry.


> [...]

> +/**
> + * msgb_push_cmd() - Push a SSH command frame with payload to the buffer.
> + * @msgb: The message buffer.
> + * @seq:  The sequence ID (SEQ) of the frame/packet.
> + * @rqid: The request ID (RQID) of the request contained in the frame.
> + * @rqst: The request to wrap in the frame.
> + */
> +static inline void msgb_push_cmd(struct msgbuf *msgb, u8 seq, u16 rqid,
> +				 const struct ssam_request *rqst)
> +{
> +	struct ssh_command *cmd;
> +	const u8 *cmd_begin;
> +	const u8 type = SSH_FRAME_TYPE_DATA_SEQ;
> +
> +	// SYN
> +	msgb_push_syn(msgb);
> +
> +	// command frame + crc
> +	msgb_push_frame(msgb, type, sizeof(*cmd) + rqst->length, seq);
> +
> +	// frame payload: command struct + payload
> +	if (WARN_ON(msgb->ptr + sizeof(*cmd) > msgb->end))
> +		return;
> +
> +	cmd_begin = msgb->ptr;
> +	cmd = (struct ssh_command *)msgb->ptr;

I believe this violates strict aliasing.


> [...]

> + * Note that the packet completion callback is, in case of success and for a
> + * sequenced packet, guaranteed to run on the receiver thread, thus providing
> + * a way to reliably identify responses to the packet. The packet completion
> + * callback is only run once and it does not indicate that the packet has
> + * fully left the system (for this, one should rely on the release method,
> + * triggered when the reference count of the packet reaches zero). In case of
> + * re-submission (and with somewhat unlikely timing), it may be possible that
> + * the packet is being re-transmitted while the completion callback runs.
> + * Completion will occur both on success and internal error, as well as when
> + * the packet is canceled.

If I understand it correctly, it is possible that submission of a packet fails
for the first time, but it's scheduled for resubmission, and this retransmission
happens at the same time when the complete() callback is called. If that's the
case, then the callback is called with an error condition, no? Thus it is possible
that a packet is successfully submitted (for the second, third, etc. time), but the
complete() callback receives notification about failure? Or am I missing something?


> [...]

> +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n)
> +{
> +	int used;
> +
> +	if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state))
> +		return -ESHUTDOWN;
> +
> +	used = kfifo_in(&ptl->rx.fifo, buf, n);

Isn't it possible that `n` is greater than the free space in the fifo? What
happens then?


> +	if (used)
> +		ssh_ptl_rx_wakeup(ptl);
> +
> +	return used;
> +}

> [...]

> +void ssh_ptl_shutdown(struct ssh_ptl *ptl)
> +{
> [...]
> +	 * Note 2: We can re-use queue_node (or pending_node) if we mark the
> +	 * packet as locked an then remove it from the queue (or pending set
> +	 * respecitvely). Marking the packet as locked avoids re-queueing
> +	 * (which should already be prevented by having stopped the treads...)
> +	 * and not setting QUEUED_BIT (or PENDING_BIT) prevents removal from a
> +	 * new list via other threads (e.g. canellation).
                                               ^
"cancellation"


> +	 *
> +	 * Note 3: There may be overlap between complete_p and complete_q.
> +	 * This is handled via test_and_set_bit() on the "completed" flag
> +	 * (also handles cancellation).
> +	 */

> [...]

> +#define __ssam_prcond(func, p, fmt, ...)		\
> +	do {						\
> +		if ((p))				\

I believe `if (p)` is sufficient.


> +			func((p), fmt, ##__VA_ARGS__);	\
> +	} while (0)

> [...]

> +int sshp_parse_frame(const struct device *dev, const struct ssam_span *source,
> +		     struct ssh_frame **frame, struct ssam_span *payload,
> +		     size_t maxlen)
> +{
> [...]
> +	// ensure packet does not exceed maximum length
> +	sp.len = get_unaligned_le16(&((struct ssh_frame *)sf.ptr)->len);
> +	if (unlikely(sp.len + SSH_MESSAGE_LENGTH(0) > maxlen)) {
> +		dev_warn(dev, "rx: parser: frame too large: %u bytes\n",

I believe `%hu` would be more appropriate.


> +			 ((struct ssh_frame *)sf.ptr)->len);

Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.)


> +		return -EMSGSIZE;
> +	}
> [...]
> +	*frame = (struct ssh_frame *)sf.ptr;

This also violates strict aliasing.


> +	*payload = sp;
> +
> +	dev_dbg(dev, "rx: parser: valid frame found (type: 0x%02x, len: %u)\n",
> +		(*frame)->type, (*frame)->len);
> +
> +	return 0;
> +}

> [...]

> +int sshp_parse_command(const struct device *dev, const struct ssam_span *source,
> +		       struct ssh_command **command,
> +		       struct ssam_span *command_data)
> +{
> +	// check for minimum length
> +	if (unlikely(source->len < sizeof(struct ssh_command))) {
> +		*command = NULL;
> +		command_data->ptr = NULL;
> +		command_data->len = 0;
> +
> +		dev_err(dev, "rx: parser: command payload is too short\n");
> +		return -ENOMSG;
> +	}
> +
> +	*command = (struct ssh_command *)source->ptr;

I'm quite sure this also violates strict aliasing.


> +	command_data->ptr = source->ptr + sizeof(struct ssh_command);
> +	command_data->len = source->len - sizeof(struct ssh_command);
> +
> +	dev_dbg(dev, "rx: parser: valid command found (tc: 0x%02x, cid: 0x%02x)\n",
> +		(*command)->tc, (*command)->cid);
> +
> +	return 0;
> +}

> [...]

> +#define SSH_MSGOFFSET_FRAME(field) \
> +	(sizeof(u16) + offsetof(struct ssh_frame, field))
> +
> +/**
> + * SSH_MSGOFFSET_COMMAND() - Compute offset in SSH message to specified field
> + * in command.
> + * @field: The field for which the offset should be computed.
> + *
> + * Return: Returns the offset of the specified &struct ssh_command field in
> + * the raw SSH message data.
> + */
> +#define SSH_MSGOFFSET_COMMAND(field) \
> +	(2ull * sizeof(u16) + sizeof(struct ssh_frame) \
> +		+ offsetof(struct ssh_command, field))

I believe it should be noted (here and for `SSH_MSGOFFSET_FRAME()`) why the
`sizeof(u16)`s are necessary.


> +
> +/*
> + * SSH_MSG_SYN - SSH message synchronization (SYN) bytes as u16.
> + */
> +#define SSH_MSG_SYN		((u16)0x55aa)

> [...]

> +enum ssam_ssh_tc {
> +	/* Known SSH/EC target categories. */
> +				// category 0x00 is invalid for EC use
> +	SSAM_SSH_TC_SAM = 0x01,	// generic system functionality, real-time clock
> +	SSAM_SSH_TC_BAT = 0x02,	// battery/power subsystem
> +	SSAM_SSH_TC_TMP = 0x03,	// thermal subsystem
> +	SSAM_SSH_TC_PMC = 0x04,
> +	SSAM_SSH_TC_FAN = 0x05,
> +	SSAM_SSH_TC_PoM = 0x06,
> +	SSAM_SSH_TC_DBG = 0x07,
> +	SSAM_SSH_TC_KBD = 0x08,	// legacy keyboard (Laptop 1/2)
> +	SSAM_SSH_TC_FWU = 0x09,
> +	SSAM_SSH_TC_UNI = 0x0a,
> +	SSAM_SSH_TC_LPC = 0x0b,
> +	SSAM_SSH_TC_TCL = 0x0c,
> +	SSAM_SSH_TC_SFL = 0x0d,
> +	SSAM_SSH_TC_KIP = 0x0e,
> +	SSAM_SSH_TC_EXT = 0x0f,
> +	SSAM_SSH_TC_BLD = 0x10,
> +	SSAM_SSH_TC_BAS = 0x11,	// detachment system (Surface Book 2/3)
> +	SSAM_SSH_TC_SEN = 0x12,
> +	SSAM_SSH_TC_SRQ = 0x13,
> +	SSAM_SSH_TC_MCU = 0x14,
> +	SSAM_SSH_TC_HID = 0x15,	// generic HID input subsystem
> +	SSAM_SSH_TC_TCH = 0x16,
> +	SSAM_SSH_TC_BKL = 0x17,
> +	SSAM_SSH_TC_TAM = 0x18,
> +	SSAM_SSH_TC_ACC = 0x19,
> +	SSAM_SSH_TC_UFI = 0x1a,
> +	SSAM_SSH_TC_USC = 0x1b,
> +	SSAM_SSH_TC_PEN = 0x1c,
> +	SSAM_SSH_TC_VID = 0x1d,
> +	SSAM_SSH_TC_AUD = 0x1e,
> +	SSAM_SSH_TC_SMC = 0x1f,
> +	SSAM_SSH_TC_KPD = 0x20,
> +	SSAM_SSH_TC_REG = 0x21,
> +};

Is it known what these abbreviations stand for? Maybe I missed them?


> [...]

> +/**
> + * struct ssh_request_ops - Callback operations for a SSH request.
> + * @release:  Function called when the request's reference count reaches zero.
> + *            This callback must be relied upon to ensure that the request has
> + *            left the transport systems (both, packet an request systems).
> + * @complete: Function called when the request is completed, either with
> + *            success or failure. The command data for the request response
> + *            is provided via the &struct ssh_command parameter (``cmd``),
> + *            the command payload of the request response via the &struct
> + *            ssh_span parameter (``data``).
> + *
> + *            If the request does not have any response or has not been
> + *            completed with success, both ``cmd`` and ``data`` parameters will
> + *            be NULL. If the request response does not have any command
> + *            payload, the ``data`` span will be an empty (zero-length) span.
> + *
> + *            In case of failure, the reason for the failure is indicated by
> + *            the value of the provided status code argument (``status``). This
> + *            value will be zero in case of success.

I believe it should be noted if the `status` argument is a regular errno,
or something different.


> + *
> + *            Note that a call to this callback does not guarantee that the
> + *            request is not in use by the transport systems any more.
> + */
> +struct ssh_request_ops {
> +	void (*release)(struct ssh_request *rqst);
> +	void (*complete)(struct ssh_request *rqst,
> +			 const struct ssh_command *cmd,
> +			 const struct ssam_span *data, int status);
> +};
> [...]


I have to agree, this is quite a sizable patch, although I think it's well-commented,
which helps reading the code by a lot, however, in some places I feel like it's a bit
over-engineered (or maybe I just cannot fully appreciate the subject at hand at the moment),
nonetheless, I applaud your efforts, I can only imagine the hours that must have gone into it.


Regards,
Barnabás Pőcze
Maximilian Luz Nov. 18, 2020, 11:06 p.m. UTC | #5
On 11/18/20 1:28 AM, Barnabás Pőcze wrote:
> Hi
> 
> I have attached some thoughts and comments inline.

Thanks!

> 2020. november 15., vasárnap 20:21 keltezéssel, Maximilian Luz írta:
> 
>> [...]
>> +/* -- Event notifier/callbacks. --------------------------------------------- */
>> +/*
>> + * The notifier system is based on linux/notifier.h, specifically the SRCU
>> + * implementation. The difference to that is, that some bits of the notifier
>> + * call return value can be tracked across multiple calls. This is done so that
>> + * handling of events can be tracked and a warning can be issued in case an
>> + * event goes unhandled. The idea of that waring is that it should help discover
>                                                  ^
> "warning"

Thanks, I'll run a spell-checker over the comments and (hopefully) fix all typos.

>> + * and identify new/currently unimplemented features.
>> + */
>> +
>> +
>> +/**
>> + * ssam_event_matches_notifier() - Test if an event matches a notifier;
>                                                                           ^
> Shouldn't it be a period?

Yes, it should.

>> + * @notif: The event notifier to test against.
>> + * @event: The event to test.
>> + *
>> + * Return: Returns %true iff the given event matches the given notifier
>> + * according to the rules set in the notifier's event mask, %false otherwise.
>> + */
> 
>> [...]
> 
>> +static int __ssam_nfblk_remove(struct ssam_nf_head *nh,
>> +			       struct ssam_notifier_block *nb)
>> +{
>> +	struct ssam_notifier_block **link;
>> +
>> +	link = __ssam_nfblk_find_link(nh, nb);
>> +	if (!link)
>> +		return -ENOENT;
> 
> I find it odd that here you return ENOENT, but in `__ssam_nfblk_insert()`
> EINVAL is returned instead of EEXIST. I believe either both should be EINVAL,
> or EEXIST+ENOENT.

Yes, EEXIST is better for insert.

>> +
>> +	__ssam_nfblk_erase(link);
> 
> I'm wondering if it's necessary to create a new function which contains just
> a single line.

I prefer to create these sorts of abstractions as this, in my opinion,
keeps things contained and makes things easier to reason about. In this
case, it also allows for documentation.

>> +	return 0;
>> +}
> 
>> [...]
> 
>> +/**
>> + * ssam_nf_head_destroy() - Deinitialize the given notifier head.
>> + * @nh: The notifier head to deinitialize.
>> + */
>> +static void ssam_nf_head_destroy(struct ssam_nf_head *nh)
>> +{
>> +	cleanup_srcu_struct(&nh->srcu);
>> +}
> 
> I'm also wondering if there's any reason why these static one-liner functions are
> not explicitly marked inline.

Isn't inline more of a linkage keyword at this point? I fully expect any
compiler to inline things like this automatically at the first hint of
an optimization flag.

>> [...]
> 
>> +/**
>> + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event
>> + * activations.
>> + * @node:     The node of this entry in the rb-tree.
>> + * @key:      The key of the event.
>> + * @refcount: The reference-count of the event.
>> + * @flags:    The flags used when enabling the event.
>> + */
>> +struct ssam_nf_refcount_entry {
>> +	struct rb_node node;
>> +	struct ssam_nf_refcount_key key;
>> +	int refcount;
> 
> Is there any reason why a signed type is used for reference counting?

Can help to spot overflows when debugging, not that that should happen.
We also don't need values larger than INT_MAX. We shouldn't even get
remotely close to it.

>> +	u8 flags;
>> +};
>> +
>> +
>> +/**
>> + * ssam_nf_refcount_inc() - Increment reference-/activation-count of the given
>> + * event.
>> + * @nf:  The notifier system reference.
>> + * @reg: The registry used to enable/disable the event.
>> + * @id:  The event ID.
>> + *
>> + * Increments the reference-/activation-count associated with the specified
>> + * event type/ID, allocating a new entry for this event ID if necessary. A
>> + * newly allocated entry will have a refcount of one.
> 
> Shouldn't it be noted that nf->lock(?) must(?) be held when calling?

All functions taking 'struct ssam_nf *' as first parameter must run
synchronized with regards to each other, except for ssam_nf_call().
Same for all access to struct ssam_nf directly. Whether that is
guaranteed by nf->lock or other means (de-/initialization) doesn't
really matter. Only higher-level functions may run concurrently.

On the higher level, the lock must also be held to ensure that the EC
requests are sent at the right time (which is its main purpose).

Arguably that's not well documented here, I'll try to fix that.

[...]

>> + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP
>> + * bit set. Note that this bit is set automatically when converting non.zero
>                                                                            ^
> maybe "non-zero"?

Right.

>> + * error values via ssam_notifier_from_errno() to notifier values.
>> + *
>> + * Also note that any callback that could handle an event should return a value
>> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go
>> + * unhandled/ignored. In case no registered callback could handle an event,
>> + * this function will emit a warning.
>> + *
>> + * In case a callback failed, this function will emit an error message.
>> + */
>> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid,
>> +			 struct ssam_event *event)
>> +{
>> +	struct ssam_nf_head *nf_head;
>> +	int status, nf_ret;
>> +
>> +	if (!ssh_rqid_is_event(rqid)) {
>> +		dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid);
> 
> A small note, "%#04x" would insert the "0x" prefix.

Oh neat, didn't know that. Should be "%#06x" though, right?

>> +		return;
>> +	}
> 
>> [...]
> 
>> +/**
>> + * ssam_event_item_alloc() - Allocate an event item with the given payload size.
>> + * @len:   The event payload length.
>> + * @flags: The flags used for allocation.
>> + *
>> + * Allocate an event item with the given payload size. Sets the item
>> + * operations and payload length values. The item free callback (``ops.free``)
>> + * should not be overwritten after this call.
>> + *
>> + * Return: Returns the newly allocated event item.
>> + */
>> +static struct ssam_event_item *ssam_event_item_alloc(size_t len, gfp_t flags)
> 
> The `flags` argument is seemingly ignored.

Oh, right.

>> +{
>> +	struct ssam_event_item *item;
>> +
>> +	item = kzalloc(sizeof(*item) + len, GFP_KERNEL);
> 
> I believe `struct_size(item, event.data, len)` could be utilized here.

Right, I'll use that.

>> +	if (!item)
>> +		return NULL;
>> +
>> +	item->event.length = len;
>> +	return item;
>> +}
> 
>> [...]
> 
>> +static void ssam_event_queue_work_fn(struct work_struct *work)
>> +{
>> +	struct ssam_event_queue *queue;
>> +	struct ssam_event_item *item;
>> +	struct ssam_nf *nf;
>> +	struct device *dev;
>> +	int i;
>> +
>> +	queue = container_of(work, struct ssam_event_queue, work);
>> +	nf = &queue->cplt->event.notif;
>> +	dev = queue->cplt->dev;
>> +
>> +	// limit number of processed events to avoid livelocking
>> +	for (i = 0; i < 10; i++) {
>> +		item = ssam_event_queue_pop(queue);
>> +		if (item == NULL)
> 
> I believe `!item` is preferred.

Right.

>> +			return;
>> +
>> +		ssam_nf_call(nf, dev, item->rqid, &item->event);
>> +		kfree(item);
>> +	}
>> +
>> +	if (!ssam_event_queue_is_empty(queue))
>> +		ssam_cplt_submit(queue->cplt, &queue->work);
>> +}
> 
>> [...]
> 
>> +static void ssam_handle_event(struct ssh_rtl *rtl,
>> +			      const struct ssh_command *cmd,
>> +			      const struct ssam_span *data)
>> +{
>> +	struct ssam_controller *ctrl = to_ssam_controller(rtl, rtl);
>> +	struct ssam_event_item *item;
>> +
>> +	item = ssam_event_item_alloc(data->len, GFP_KERNEL);
>> +	if (!item)
>> +		return;
>> +
>> +	item->rqid = get_unaligned_le16(&cmd->rqid);
>> +	item->event.target_category = cmd->tc;
>> +	item->event.target_id = cmd->tid_in;
>> +	item->event.command_id = cmd->cid;
>> +	item->event.instance_id = cmd->iid;
>> +	memcpy(&item->event.data[0], data->ptr, data->len);
>> +
>> +	WARN_ON(ssam_cplt_submit_event(&ctrl->cplt, item));
> 
> I believe that if submission fails, `item` is leaked.

Oh, that is correct. Thanks for spotting that!

>> +}
>> +
>> +static const struct ssh_rtl_ops ssam_rtl_ops = {
>> +	.handle_event = ssam_handle_event,
>> +};
>> +
>> +
>> +static bool ssam_notifier_empty(struct ssam_controller *ctrl);
> 
> I think it'd be better to name it `ssam_notifier_is_empty()` to be consistent
> with the rest of the patch.

Okay, I'll change that.

>> [...]
> 
>> +static int ssam_controller_caps_load_from_acpi(
>> +		acpi_handle handle, struct ssam_controller_caps *caps)
>> +{
>> +	u32 d3_closes_handle = false;
> 
> Assinging a boolean like this to a `u32` looks very odd to me.

The value returned by the corresponding DSM call is an integer, but
essentially only contains a bool value (zero vs. one). I thought using
false here explicitly for initialization helps clarify that. That also
makes it consistent with the "caps->d3_closes_handle = false" line
below.

>> +	u64 funcs;
>> +	int status;
>> +
>> +	// set defaults
>> +	caps->ssh_power_profile = (u32)-1;
>> +	caps->screen_on_sleep_idle_timeout = (u32)-1;
>> +	caps->screen_off_sleep_idle_timeout = (u32)-1;
>> +	caps->d3_closes_handle = false;
>> +	caps->ssh_buffer_size = (u32)-1;

[...]

>> +void ssam_controller_shutdown(struct ssam_controller *ctrl)
>> +{
>> +	enum ssam_controller_state s = ctrl->state;
>> +	int status;
>> +
>> +	if (s == SSAM_CONTROLLER_UNINITIALIZED || s == SSAM_CONTROLLER_STOPPED)
>> +		return;
>> +
>> +	// try to flush pending events and requests while everything still works
>> +	status = ssh_rtl_flush(&ctrl->rtl, msecs_to_jiffies(5000));
> 
> Wouldn't it be better to name that 5000?

I'll do that.

>> +	if (status) {
>> +		ssam_err(ctrl, "failed to flush request transport layer: %d\n",
>> +			 status);
>> +	}

[...]

>> +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags,
>> +			    struct ssam_request_sync **rqst,
>> +			    struct ssam_span *buffer)
>> +{
>> +	size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len);
>> +
>> +	*rqst = kzalloc(sizeof(**rqst) + msglen, flags);
>> +	if (!*rqst)
>> +		return -ENOMEM;
>> +
>> +	buffer->ptr = (u8 *)(*rqst + 1);
>> +	buffer->len = msglen;
>> +
>> +	return 0;
>> +}
> 
> I think there is a bit of incosistency: sometimes you use ** pointer + return int,
> sometimes you return a pointer with potentially embedded errno. I think it would
> be better if you stuck with one or the other.

My rule of thumb is: If it's one output parameter, use a pointer as
return value. If it's two or more output parameters (as in this case,
"buffer" is written to), use two pointer arguments and an int as return.

I noticed that ssam_client_bind doesn't follow that scheme so I'll
change that.

>> [...]
> 
>> +static int ssam_ssh_event_disable(struct ssam_controller *ctrl,
>> +				  struct ssam_event_registry reg,
>> +				  struct ssam_event_id id, u8 flags)
>> +{
>> [...]
>> +	rqst.command_id = reg.cid_disable;
> 
> If I see it correctly, this line is the only significant difference between this one
> and the previous function. Is there any reason they're not combined?

Debug and error messages are also different. I'll try to put the common
code into one function though.

[...]

>> +void ssam_irq_disarm_wakeup(struct ssam_controller *ctrl)
>> +{
>> +	int status;
>> +
>> +	if (ctrl->irq.wakeup_enabled) {
>> +		status = disable_irq_wake(ctrl->irq.num);
>> +		if (status)
>> +			ssam_err(ctrl, "failed to disable wake IRQ: %d\n", status);
>> +
>> +		ctrl->irq.wakeup_enabled = false;
> 
> It's set to false even if `disable_irq_wake()` fails?

Yes, on purpose. The irq.wakeup_enabled flag is there to ensure that
disable_irq_wake() gets called only if there has been a successful
enable_irq_wake() call.

This seems to be a common pattern, see e.g. /drivers/mmc/host/sdhci.c

>> +	}
>> +	disable_irq(ctrl->irq.num);
>> +}
> 
>> [...]
> 
>> +static int ssam_try_set_controller(struct ssam_controller *ctrl)
>> +{
>> +	int status = 0;
>> +
>> +	spin_lock(&__ssam_controller_lock);
>> +	if (!__ssam_controller)
>> +		__ssam_controller = ctrl;
>> +	else
>> +		status = -EBUSY;
> 
> I feel like EBUSY might not be the best errno here.

Okay, I'll replace it with EEXIST.

>> +	spin_unlock(&__ssam_controller_lock);
>> +
>> +	return status;
>> +}
> 
>> [...]
> 
>> +int ssam_client_link(struct ssam_controller *c, struct device *client)
>> +{
>> +	const u32 flags = DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_CONSUMER;
>> +	struct device_link *link;
>> +	struct device *ctrldev;
>> +
>> +	ssam_controller_statelock(c);
>> +
>> +	if (c->state != SSAM_CONTROLLER_STARTED) {
>> +		ssam_controller_stateunlock(c);
>> +		return -ENXIO;
>> +	}
>> +
>> +	ctrldev = ssam_controller_device(c);
>> +	if (!ctrldev) {
>> +		ssam_controller_stateunlock(c);
>> +		return -ENXIO;
>> +	}
>> +
> 
> I'm not sure if ENXIO is the best errno in the last two returns;

I guess I can return ENODEV.

>> +	link = device_link_add(client, ctrldev, flags);
>> +	if (!link) {
>> +		ssam_controller_stateunlock(c);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	/*
>> +	 * Return -ENXIO if supplier driver is on its way to be removed. In this
>> +	 * case, the controller won't be around for much longer and the device
>> +	 * link is not going to save us any more, as unbinding is already in
>> +	 * progress.
>> +	 */
>> +	if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) {
>> +		ssam_controller_stateunlock(c);
>> +		return -ENXIO;
>> +	}
>> +
>> +	ssam_controller_stateunlock(c);
>> +	return 0;
>> +}
> 
>> [...]
> 
>> +int ssam_client_bind(struct device *client, struct ssam_controller **ctrl)
>> +{
>> +	struct ssam_controller *c;
>> +	int status;
>> +
>> +	c = ssam_get_controller();
>> +	if (!c)
>> +		return -ENXIO;
> 
> To me, ENODEV seems like a better fit here.

Okay, I'll change it as well as above.

>> +
>> +	status = ssam_client_link(c, client);
> 
>> [...]
> 
> 
>> +static ssize_t firmware_version_show(struct device *dev,
>> +				     struct device_attribute *attr, char *buf)
>> +{
>> +	struct ssam_controller *ctrl = dev_get_drvdata(dev);
>> +	u32 version, a, b, c;
>> +	int status;
>> +
>> +	status = ssam_get_firmware_version(ctrl, &version);
>> +	if (status < 0)
>> +		return status;
>> +
>> +	a = (version >> 24) & 0xff;
>> +	b = ((version >> 8) & 0xffff);
>> +	c = version & 0xff;
>> +
>> +	return snprintf(buf, PAGE_SIZE - 1, "%u.%u.%u\n", a, b, c);
> 
> `snprintf()` takes care of the null byte, so simply `PAGE_SIZE` would've been
> sufficient. But that doesn't matter much since you should use `sysfs_emit()`.

I didn't know about sysfs_emit(), I'll switch to that then. Also you're
right about that "- 1", not sure where I got that from.

>> +}
>> +static DEVICE_ATTR_RO(firmware_version);
>> +
>> +static struct attribute *ssam_sam_attrs[] = {
>> +	&dev_attr_firmware_version.attr,
>> +	NULL,
>              ^
> I believe it is preferred to omit the comma after the terminating entry.

Right.

>> [...]
> 
>> +/**
>> + * msgb_push_cmd() - Push a SSH command frame with payload to the buffer.
>> + * @msgb: The message buffer.
>> + * @seq:  The sequence ID (SEQ) of the frame/packet.
>> + * @rqid: The request ID (RQID) of the request contained in the frame.
>> + * @rqst: The request to wrap in the frame.
>> + */
>> +static inline void msgb_push_cmd(struct msgbuf *msgb, u8 seq, u16 rqid,
>> +				 const struct ssam_request *rqst)
>> +{
>> +	struct ssh_command *cmd;
>> +	const u8 *cmd_begin;
>> +	const u8 type = SSH_FRAME_TYPE_DATA_SEQ;
>> +
>> +	// SYN
>> +	msgb_push_syn(msgb);
>> +
>> +	// command frame + crc
>> +	msgb_push_frame(msgb, type, sizeof(*cmd) + rqst->length, seq);
>> +
>> +	// frame payload: command struct + payload
>> +	if (WARN_ON(msgb->ptr + sizeof(*cmd) > msgb->end))
>> +		return;
>> +
>> +	cmd_begin = msgb->ptr;
>> +	cmd = (struct ssh_command *)msgb->ptr;
> 
> I believe this violates strict aliasing.

It does. The same happens in msgb_push_frame(), too. The kernel is
built with -fno-strict-aliasing though if I remember correctly.

I guess I can try to avoid that in those two functions with
put_unaligned and offsetof to compute the pointers.

>> [...]
> 
>> + * Note that the packet completion callback is, in case of success and for a
>> + * sequenced packet, guaranteed to run on the receiver thread, thus providing
>> + * a way to reliably identify responses to the packet. The packet completion
>> + * callback is only run once and it does not indicate that the packet has
>> + * fully left the system (for this, one should rely on the release method,
>> + * triggered when the reference count of the packet reaches zero). In case of
>> + * re-submission (and with somewhat unlikely timing), it may be possible that
>> + * the packet is being re-transmitted while the completion callback runs.
>> + * Completion will occur both on success and internal error, as well as when
>> + * the packet is canceled.
> 
> If I understand it correctly, it is possible that submission of a packet fails
> for the first time, but it's scheduled for resubmission, and this retransmission
> happens at the same time when the complete() callback is called. If that's the
> case, then the callback is called with an error condition, no? Thus it is possible
> that a packet is successfully submitted (for the second, third, etc. time), but the
> complete() callback receives notification about failure? Or am I missing something?

Not sure I can follow you here. What do you mean by "fails for the first time"?

A couple of notes before I try to explain the error cases:

  - Resubmission only happens on timeout or NAK.

  - Only sequenced packets can be resubmitted (unsequenced are not added
    to the pending set, thus do not have a packet timeout and do not
    react to NAKs).

  - Completion means the packet gets locked, which prevents it from being
    resubmitted and retransmitted, and is removed from all collections.
  
  - Completion cannot be triggered by NAK. In case a NAK is received and
    all tries have been exceeded, the packet still waits for the timeout
    (or ACK, whatever happens first) one last time.

The error cases:

  - serdev: Failure to send via serdev results in immediate completion
    with error on the transmitter thread. Due to a NAK (or a _very_
    unlikely timeout), the packet can be queued for a second time at that
    point, but, as completion runs on the transmitter thread in this
    case, the packet is removed from the queue before the thread goes on
    to get the next packet.

  - Timeout: On resubmission, the timeout of a packet is disabled. It is
    (re-)armed directly before starting transmission on the transmitter
    thread.
    
    What _could_ now happen is that, for some reason, the transmitter
    thread waits the full timeout period between arming the timeout and
    actually sending it (which I'd consider highly unlikely). Note that
    "actually sending it" includes whatever asynchronous stuff the serdev
    core does after serdev_device_write_buf() returns.

    In that case, the packet can be transmitted _after_ it has received a
    timeout. So assuming this is the last try, then the packet can be
    sent after completion with -ETIMEDOUT.

    I believe this is what you meant?

    Again, I consider this highly unlikely as this would require either
    the scheduler to pause our thread for a full timeout period or
    require the serdev core to basically do nothing for a full timeout
    period. I'd argue this is a lot less likely than a dropped ACK packet
    from the EC (which results in the same thing: EC receives data but we
    think it didn't, causing a timeout error).

    Note that putting the timeout (re-)arming step after the last call to
    serdev_device_write_buf() doesn't really help either, as that only
    writes to the transport buffer. If we would want to guarantee that
    this doesn't happen, we'd have to explicitly wait for the hardware to
    finish transmitting, which essentially breaks the whole point of
    buffering.

    Also I prefer to have the timeout cover the transmission part as
    well. In the very least, this simplifies reasoning about it once I've
    restricted it to be (re-)set under the pending lock only (as
    mentioned in the message to Andy).

  - Finally, there is completion by cancellation or shutdown, for which
    the packet can have been (or currently be) transmitted successfully
    but completion returns with -ECANCELLED or -ESHUTDOWN, respectively.
    This is the expected behavior in those cases. Again, the completion
    procedure takes care of locking the packet and removing all
    references from the collections.

Also there's one success case, when we receive an ACK after a timeout
and during the re-transmission. Then the packet is completed with
success, but might still be transmitted once more. The EC should detect
that packet as re-submission via its packet ID and ignore it.

>> [...]
> 
>> +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n)
>> +{
>> +	int used;
>> +
>> +	if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state))
>> +		return -ESHUTDOWN;
>> +
>> +	used = kfifo_in(&ptl->rx.fifo, buf, n);
> 
> Isn't it possible that `n` is greater than the free space in the fifo? What
> happens then?

If I've read the documentation of kfifo_in() right, it takes at most n
elements. If n is greater than the free space, it will only take as much
elements as it has space. The return value of kfifo_in() is the number
of elements actually consumed. We simply return that here (as elements =
bytes) and ultimately let the serdev/tty core take care of handling
that.

>> +	if (used)
>> +		ssh_ptl_rx_wakeup(ptl);
>> +
>> +	return used;
>> +}

[...]

>> +#define __ssam_prcond(func, p, fmt, ...)		\
>> +	do {						\
>> +		if ((p))				\
> 
> I believe `if (p)` is sufficient.

Right.

>> +			func((p), fmt, ##__VA_ARGS__);	\
>> +	} while (0)
> 
>> [...]
> 
>> +int sshp_parse_frame(const struct device *dev, const struct ssam_span *source,
>> +		     struct ssh_frame **frame, struct ssam_span *payload,
>> +		     size_t maxlen)
>> +{
>> [...]
>> +	// ensure packet does not exceed maximum length
>> +	sp.len = get_unaligned_le16(&((struct ssh_frame *)sf.ptr)->len);
>> +	if (unlikely(sp.len + SSH_MESSAGE_LENGTH(0) > maxlen)) {
>> +		dev_warn(dev, "rx: parser: frame too large: %u bytes\n",
> 
> I believe `%hu` would be more appropriate.

Right.

>> +			 ((struct ssh_frame *)sf.ptr)->len);
> 
> Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.)

This should actually be "sp.len + SSH_MESSAGE_LENGTH(0)", and also
format spezifier %zu with that as SSH_MESSAGE_LENGTH returns size_t.

>> +		return -EMSGSIZE;
>> +	}
>> [...]
>> +	*frame = (struct ssh_frame *)sf.ptr;
> 
> This also violates strict aliasing.

Sure, the whole point of this function (and ssh_parse_command below) is
to get pointers to the data structures in the given buffer, i.e. create
aliases of the right types to the right places in the buffer. Not sure
how I'd do that differently, apart from copying.

I think that this would even work properly (in practice) without setting
-fno-strict-aliasing, as all data at this point is effectively
read-only. Furthermore, after the call access is restricted to the
non-aliasing parts respectively (frame and payload), whereas the
underlying buffer they do alias isn't accessed for the rest of that
scope (i.e. until the other references don't exist any more).

Anyway, as far as I know the kernel is built with -fno-strict-aliasing.
Please correct me if I'm wrong or relying on aliasing is frowned upon.

>> +	*payload = sp;
>> +
>> +	dev_dbg(dev, "rx: parser: valid frame found (type: 0x%02x, len: %u)\n",
>> +		(*frame)->type, (*frame)->len);
>> +
>> +	return 0;
>> +}
> 
>> [...]
> 
>> +int sshp_parse_command(const struct device *dev, const struct ssam_span *source,
>> +		       struct ssh_command **command,
>> +		       struct ssam_span *command_data)
>> +{
>> +	// check for minimum length
>> +	if (unlikely(source->len < sizeof(struct ssh_command))) {
>> +		*command = NULL;
>> +		command_data->ptr = NULL;
>> +		command_data->len = 0;
>> +
>> +		dev_err(dev, "rx: parser: command payload is too short\n");
>> +		return -ENOMSG;
>> +	}
>> +
>> +	*command = (struct ssh_command *)source->ptr;
> 
> I'm quite sure this also violates strict aliasing.

It does, see above.

>> +	command_data->ptr = source->ptr + sizeof(struct ssh_command);
>> +	command_data->len = source->len - sizeof(struct ssh_command);
>> +
>> +	dev_dbg(dev, "rx: parser: valid command found (tc: 0x%02x, cid: 0x%02x)\n",
>> +		(*command)->tc, (*command)->cid);
>> +
>> +	return 0;
>> +}
> 
>> [...]
> 
>> +#define SSH_MSGOFFSET_FRAME(field) \
>> +	(sizeof(u16) + offsetof(struct ssh_frame, field))
>> +
>> +/**
>> + * SSH_MSGOFFSET_COMMAND() - Compute offset in SSH message to specified field
>> + * in command.
>> + * @field: The field for which the offset should be computed.
>> + *
>> + * Return: Returns the offset of the specified &struct ssh_command field in
>> + * the raw SSH message data.
>> + */
>> +#define SSH_MSGOFFSET_COMMAND(field) \
>> +	(2ull * sizeof(u16) + sizeof(struct ssh_frame) \
>> +		+ offsetof(struct ssh_command, field))
> 
> I believe it should be noted (here and for `SSH_MSGOFFSET_FRAME()`) why the
> `sizeof(u16)`s are necessary.

Right, I can add an explanation for this. One is for the SYN bytes, the
other one for the frame CRC (between frame and command).

>> +
>> +/*
>> + * SSH_MSG_SYN - SSH message synchronization (SYN) bytes as u16.
>> + */
>> +#define SSH_MSG_SYN		((u16)0x55aa)
> 
>> [...]
> 
>> +enum ssam_ssh_tc {
>> +	/* Known SSH/EC target categories. */
>> +				// category 0x00 is invalid for EC use
>> +	SSAM_SSH_TC_SAM = 0x01,	// generic system functionality, real-time clock
>> +	SSAM_SSH_TC_BAT = 0x02,	// battery/power subsystem
>> +	SSAM_SSH_TC_TMP = 0x03,	// thermal subsystem
>> +	SSAM_SSH_TC_PMC = 0x04,
>> +	SSAM_SSH_TC_FAN = 0x05,
>> +	SSAM_SSH_TC_PoM = 0x06,
>> +	SSAM_SSH_TC_DBG = 0x07,
>> +	SSAM_SSH_TC_KBD = 0x08,	// legacy keyboard (Laptop 1/2)
>> +	SSAM_SSH_TC_FWU = 0x09,
>> +	SSAM_SSH_TC_UNI = 0x0a,
>> +	SSAM_SSH_TC_LPC = 0x0b,
>> +	SSAM_SSH_TC_TCL = 0x0c,
>> +	SSAM_SSH_TC_SFL = 0x0d,
>> +	SSAM_SSH_TC_KIP = 0x0e,
>> +	SSAM_SSH_TC_EXT = 0x0f,
>> +	SSAM_SSH_TC_BLD = 0x10,
>> +	SSAM_SSH_TC_BAS = 0x11,	// detachment system (Surface Book 2/3)
>> +	SSAM_SSH_TC_SEN = 0x12,
>> +	SSAM_SSH_TC_SRQ = 0x13,
>> +	SSAM_SSH_TC_MCU = 0x14,
>> +	SSAM_SSH_TC_HID = 0x15,	// generic HID input subsystem
>> +	SSAM_SSH_TC_TCH = 0x16,
>> +	SSAM_SSH_TC_BKL = 0x17,
>> +	SSAM_SSH_TC_TAM = 0x18,
>> +	SSAM_SSH_TC_ACC = 0x19,
>> +	SSAM_SSH_TC_UFI = 0x1a,
>> +	SSAM_SSH_TC_USC = 0x1b,
>> +	SSAM_SSH_TC_PEN = 0x1c,
>> +	SSAM_SSH_TC_VID = 0x1d,
>> +	SSAM_SSH_TC_AUD = 0x1e,
>> +	SSAM_SSH_TC_SMC = 0x1f,
>> +	SSAM_SSH_TC_KPD = 0x20,
>> +	SSAM_SSH_TC_REG = 0x21,
>> +};
> 
> Is it known what these abbreviations stand for? Maybe I missed them?

The comments state all we really know about these (through observation
and experimentation). The table itself has been extracted from the
Windows driver, but the abbreviations and values are all we're getting
from it.

I could go ahead and guess for a couple of them but for most of them we
haven't really seen any request at all, let alone know what the request
does.

For those that we now, the abbreviations should be fairly easy to guess
based on the comments, e.g. BAT -> battery, TMP -> temperature. I'm not
sure about BAS, which could mean base (as that's the detachment system
allowing to separate the top part of the Surface Books from the base).

REG is currently missing a comment because I'm not entirely sure if it's
just for registring (enabling/disabling) events or if it also has any
other purpose. I'll add a comment for it though, then we have comments
on all that are somewhat known.

>> [...]
> 
>> +/**
>> + * struct ssh_request_ops - Callback operations for a SSH request.
>> + * @release:  Function called when the request's reference count reaches zero.
>> + *            This callback must be relied upon to ensure that the request has
>> + *            left the transport systems (both, packet an request systems).
>> + * @complete: Function called when the request is completed, either with
>> + *            success or failure. The command data for the request response
>> + *            is provided via the &struct ssh_command parameter (``cmd``),
>> + *            the command payload of the request response via the &struct
>> + *            ssh_span parameter (``data``).
>> + *
>> + *            If the request does not have any response or has not been
>> + *            completed with success, both ``cmd`` and ``data`` parameters will
>> + *            be NULL. If the request response does not have any command
>> + *            payload, the ``data`` span will be an empty (zero-length) span.
>> + *
>> + *            In case of failure, the reason for the failure is indicated by
>> + *            the value of the provided status code argument (``status``). This
>> + *            value will be zero in case of success.
> 
> I believe it should be noted if the `status` argument is a regular errno,
> or something different.

Right, I'll add that.

>> + *
>> + *            Note that a call to this callback does not guarantee that the
>> + *            request is not in use by the transport systems any more.
>> + */
>> +struct ssh_request_ops {
>> +	void (*release)(struct ssh_request *rqst);
>> +	void (*complete)(struct ssh_request *rqst,
>> +			 const struct ssh_command *cmd,
>> +			 const struct ssam_span *data, int status);
>> +};
>> [...]
> 
> 
> I have to agree, this is quite a sizable patch, although I think it's well-commented,
> which helps reading the code by a lot, however, in some places I feel like it's a bit
> over-engineered (or maybe I just cannot fully appreciate the subject at hand at the moment),
> nonetheless, I applaud your efforts, I can only imagine the hours that must have gone into it.

I think much of what's probably considered over-engineered is the result
of reworking the core after that had some reliability issues (dropped
packets and timeouts). Specifically, the introduction of dedicated
transmitter and receiver threads combined with the need to properly
handle NAKs has lead to the current asynchronous system.

I have probably gone a bit too far towards generifying that, but I
belive that this should make it easier to adapt for protocol additions
or errors in the future.

Other parts, like the RB-tree for event reference counting seem,
unfortunately, to be necessary. Events are enabled by registry
(specifying which command to use to enable/disable the event) and ID
(which event to enable/disable on the registry). So without knowing
which registry and ID pairs are valid, we'll have to use some sort of
map. Could have been a simple list, but I think an RB-tree makes a bit
more sense here.

Thank you for your review!

Regards,
Max
Barnabás Pőcze Nov. 19, 2020, 3:54 p.m. UTC | #6
Hi


2020. november 19., csütörtök 0:06 keltezéssel, Maximilian Luz <luzmaximilian@gmail.com> írta:

> [...]

> >> +/**
> >> + * ssam_nf_head_destroy() - Deinitialize the given notifier head.
> >> + * @nh: The notifier head to deinitialize.
> >> + */
> >> +static void ssam_nf_head_destroy(struct ssam_nf_head *nh)
> >> +{
> >> +	cleanup_srcu_struct(&nh->srcu);
> >> +}
> >
> > I'm also wondering if there's any reason why these static one-liner functions are
> > not explicitly marked inline.
>
> Isn't inline more of a linkage keyword at this point? I fully expect any
> compiler to inline things like this automatically at the first hint of
> an optimization flag.
>

I also expect the compiler to inline such functions even without the `inline`
specifier. In retrospect I'm not sure why I actually made this comment, possibly
because of my preference, but I believe it's fine either way, so my comment is
moot, sorry.


> [...]

> >> + * error values via ssam_notifier_from_errno() to notifier values.
> >> + *
> >> + * Also note that any callback that could handle an event should return a value
> >> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go
> >> + * unhandled/ignored. In case no registered callback could handle an event,
> >> + * this function will emit a warning.
> >> + *
> >> + * In case a callback failed, this function will emit an error message.
> >> + */
> >> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid,
> >> +			 struct ssam_event *event)
> >> +{
> >> +	struct ssam_nf_head *nf_head;
> >> +	int status, nf_ret;
> >> +
> >> +	if (!ssh_rqid_is_event(rqid)) {
> >> +		dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid);
> >
> > A small note, "%#04x" would insert the "0x" prefix.
>
> Oh neat, didn't know that. Should be "%#06x" though, right?
>

Yes, indeed, sorry, it should be "%#06x".


> [...]

> >> +static int ssam_controller_caps_load_from_acpi(
> >> +		acpi_handle handle, struct ssam_controller_caps *caps)
> >> +{
> >> +	u32 d3_closes_handle = false;
> >
> > Assinging a boolean like this to a `u32` looks very odd to me.
>
> The value returned by the corresponding DSM call is an integer, but
> essentially only contains a bool value (zero vs. one). I thought using
> false here explicitly for initialization helps clarify that. That also
> makes it consistent with the "caps->d3_closes_handle = false" line
> below.
>

Although I still find it pretty odd, your explanation makes sense to me.


> >> +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags,
> >> +			    struct ssam_request_sync **rqst,
> >> +			    struct ssam_span *buffer)
> >> +{
> >> +	size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len);
> >> +
> >> +	*rqst = kzalloc(sizeof(**rqst) + msglen, flags);
> >> +	if (!*rqst)
> >> +		return -ENOMEM;
> >> +
> >> +	buffer->ptr = (u8 *)(*rqst + 1);
> >> +	buffer->len = msglen;
> >> +
> >> +	return 0;
> >> +}
> >
> > I think there is a bit of incosistency: sometimes you use ** pointer + return int,
> > sometimes you return a pointer with potentially embedded errno. I think it would
> > be better if you stuck with one or the other.
>
> My rule of thumb is: If it's one output parameter, use a pointer as
> return value. If it's two or more output parameters (as in this case,
> "buffer" is written to), use two pointer arguments and an int as return.
>
> I noticed that ssam_client_bind doesn't follow that scheme so I'll
> change that.
>

I see, thanks for the explanation, what prompted me to make this comment
was that, as you noted, there were functions with one output parameter that
were not consistent.


> [...]

> >> + * Note that the packet completion callback is, in case of success and for a
> >> + * sequenced packet, guaranteed to run on the receiver thread, thus providing
> >> + * a way to reliably identify responses to the packet. The packet completion
> >> + * callback is only run once and it does not indicate that the packet has
> >> + * fully left the system (for this, one should rely on the release method,
> >> + * triggered when the reference count of the packet reaches zero). In case of
> >> + * re-submission (and with somewhat unlikely timing), it may be possible that
> >> + * the packet is being re-transmitted while the completion callback runs.
> >> + * Completion will occur both on success and internal error, as well as when
> >> + * the packet is canceled.
> >
> > If I understand it correctly, it is possible that submission of a packet fails
> > for the first time, but it's scheduled for resubmission, and this retransmission
> > happens at the same time when the complete() callback is called. If that's the
> > case, then the callback is called with an error condition, no? Thus it is possible
> > that a packet is successfully submitted (for the second, third, etc. time), but the
> > complete() callback receives notification about failure? Or am I missing something?
>
> Not sure I can follow you here. What do you mean by "fails for the first time"?
>
> A couple of notes before I try to explain the error cases:
>
>   - Resubmission only happens on timeout or NAK.
>
>   - Only sequenced packets can be resubmitted (unsequenced are not added
>     to the pending set, thus do not have a packet timeout and do not
>     react to NAKs).
>
>   - Completion means the packet gets locked, which prevents it from being
>     resubmitted and retransmitted, and is removed from all collections.
>
>   - Completion cannot be triggered by NAK. In case a NAK is received and
>     all tries have been exceeded, the packet still waits for the timeout
>     (or ACK, whatever happens first) one last time.
>
> The error cases:
>
>   - serdev: Failure to send via serdev results in immediate completion
>     with error on the transmitter thread. Due to a NAK (or a _very_
>     unlikely timeout), the packet can be queued for a second time at that
>     point, but, as completion runs on the transmitter thread in this
>     case, the packet is removed from the queue before the thread goes on
>     to get the next packet.
>
>   - Timeout: On resubmission, the timeout of a packet is disabled. It is
>     (re-)armed directly before starting transmission on the transmitter
>     thread.
>
>     What _could_ now happen is that, for some reason, the transmitter
>     thread waits the full timeout period between arming the timeout and
>     actually sending it (which I'd consider highly unlikely). Note that
>     "actually sending it" includes whatever asynchronous stuff the serdev
>     core does after serdev_device_write_buf() returns.
>
>     In that case, the packet can be transmitted _after_ it has received a
>     timeout. So assuming this is the last try, then the packet can be
>     sent after completion with -ETIMEDOUT.
>
>     I believe this is what you meant?
>

Yes, that's what I meant, thanks for the clarification and broader explanation
about the things at play here.


>     Again, I consider this highly unlikely as this would require either
>     the scheduler to pause our thread for a full timeout period or
>     require the serdev core to basically do nothing for a full timeout
>     period. I'd argue this is a lot less likely than a dropped ACK packet
>     from the EC (which results in the same thing: EC receives data but we
>     think it didn't, causing a timeout error).
>
>     Note that putting the timeout (re-)arming step after the last call to
>     serdev_device_write_buf() doesn't really help either, as that only
>     writes to the transport buffer. If we would want to guarantee that
>     this doesn't happen, we'd have to explicitly wait for the hardware to
>     finish transmitting, which essentially breaks the whole point of
>     buffering.
>
>     Also I prefer to have the timeout cover the transmission part as
>     well. In the very least, this simplifies reasoning about it once I've
>     restricted it to be (re-)set under the pending lock only (as
>     mentioned in the message to Andy).
>
>   - Finally, there is completion by cancellation or shutdown, for which
>     the packet can have been (or currently be) transmitted successfully
>     but completion returns with -ECANCELLED or -ESHUTDOWN, respectively.
>     This is the expected behavior in those cases. Again, the completion
>     procedure takes care of locking the packet and removing all
>     references from the collections.
>
> Also there's one success case, when we receive an ACK after a timeout
> and during the re-transmission. Then the packet is completed with
> success, but might still be transmitted once more. The EC should detect
> that packet as re-submission via its packet ID and ignore it.
>
> >> [...]
> >
> >> +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n)
> >> +{
> >> +	int used;
> >> +
> >> +	if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state))
> >> +		return -ESHUTDOWN;
> >> +
> >> +	used = kfifo_in(&ptl->rx.fifo, buf, n);
> >
> > Isn't it possible that `n` is greater than the free space in the fifo? What
> > happens then?
>
> If I've read the documentation of kfifo_in() right, it takes at most n
> elements. If n is greater than the free space, it will only take as much
> elements as it has space. The return value of kfifo_in() is the number
> of elements actually consumed. We simply return that here (as elements =
> bytes) and ultimately let the serdev/tty core take care of handling
> that.
>

Ahh, sorry, you're right.



> >> +	if (used)
> >> +		ssh_ptl_rx_wakeup(ptl);
> >> +
> >> +	return used;
> >> +}


> [...]

> >> +			 ((struct ssh_frame *)sf.ptr)->len);
> >
> > Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.)
>
> This should actually be "sp.len + SSH_MESSAGE_LENGTH(0)", and also
> format spezifier %zu with that as SSH_MESSAGE_LENGTH returns size_t.
>

Yes, indeed, sorry, I missed the `+ SSH_MESSAGE_LENGTH(0)` part.


> >> +		return -EMSGSIZE;
> >> +	}
> >> [...]
> >> +	*frame = (struct ssh_frame *)sf.ptr;
> >
> > This also violates strict aliasing.
>
> Sure, the whole point of this function (and ssh_parse_command below) is
> to get pointers to the data structures in the given buffer, i.e. create
> aliases of the right types to the right places in the buffer. Not sure
> how I'd do that differently, apart from copying.
>
> I think that this would even work properly (in practice) without setting
> -fno-strict-aliasing, as all data at this point is effectively
> read-only. Furthermore, after the call access is restricted to the
> non-aliasing parts respectively (frame and payload), whereas the
> underlying buffer they do alias isn't accessed for the rest of that
> scope (i.e. until the other references don't exist any more).
>
> Anyway, as far as I know the kernel is built with -fno-strict-aliasing.
> Please correct me if I'm wrong or relying on aliasing is frowned upon.
>

I don't believe it could cause issue here, I just wanted to note it.


> [...]

> >> +enum ssam_ssh_tc {
> >> +	/* Known SSH/EC target categories. */
> >> +				// category 0x00 is invalid for EC use
> >> +	SSAM_SSH_TC_SAM = 0x01,	// generic system functionality, real-time clock
> >> +	SSAM_SSH_TC_BAT = 0x02,	// battery/power subsystem
> >> +	SSAM_SSH_TC_TMP = 0x03,	// thermal subsystem
> >> +	SSAM_SSH_TC_PMC = 0x04,
> >> +	SSAM_SSH_TC_FAN = 0x05,
> >> +	SSAM_SSH_TC_PoM = 0x06,
> >> +	SSAM_SSH_TC_DBG = 0x07,
> >> +	SSAM_SSH_TC_KBD = 0x08,	// legacy keyboard (Laptop 1/2)
> >> +	SSAM_SSH_TC_FWU = 0x09,
> >> +	SSAM_SSH_TC_UNI = 0x0a,
> >> +	SSAM_SSH_TC_LPC = 0x0b,
> >> +	SSAM_SSH_TC_TCL = 0x0c,
> >> +	SSAM_SSH_TC_SFL = 0x0d,
> >> +	SSAM_SSH_TC_KIP = 0x0e,
> >> +	SSAM_SSH_TC_EXT = 0x0f,
> >> +	SSAM_SSH_TC_BLD = 0x10,
> >> +	SSAM_SSH_TC_BAS = 0x11,	// detachment system (Surface Book 2/3)
> >> +	SSAM_SSH_TC_SEN = 0x12,
> >> +	SSAM_SSH_TC_SRQ = 0x13,
> >> +	SSAM_SSH_TC_MCU = 0x14,
> >> +	SSAM_SSH_TC_HID = 0x15,	// generic HID input subsystem
> >> +	SSAM_SSH_TC_TCH = 0x16,
> >> +	SSAM_SSH_TC_BKL = 0x17,
> >> +	SSAM_SSH_TC_TAM = 0x18,
> >> +	SSAM_SSH_TC_ACC = 0x19,
> >> +	SSAM_SSH_TC_UFI = 0x1a,
> >> +	SSAM_SSH_TC_USC = 0x1b,
> >> +	SSAM_SSH_TC_PEN = 0x1c,
> >> +	SSAM_SSH_TC_VID = 0x1d,
> >> +	SSAM_SSH_TC_AUD = 0x1e,
> >> +	SSAM_SSH_TC_SMC = 0x1f,
> >> +	SSAM_SSH_TC_KPD = 0x20,
> >> +	SSAM_SSH_TC_REG = 0x21,
> >> +};
> >
> > Is it known what these abbreviations stand for? Maybe I missed them?
>
> The comments state all we really know about these (through observation
> and experimentation). The table itself has been extracted from the
> Windows driver, but the abbreviations and values are all we're getting
> from it.

I see, thanks for the clarification. For some reason, I believed the
"Known SSH/EC target categories" comment means that those are known, as in,
it is known what they are for, etc., not just the abbreviation.


> [...]


Regards,
Barnabás Pőcze
Maximilian Luz Nov. 19, 2020, 5:35 p.m. UTC | #7
On 11/19/20 4:54 PM, Barnabás Pőcze wrote:
> Hi

[...]

>>>> +enum ssam_ssh_tc {
>>>> +	/* Known SSH/EC target categories. */
>>>> +				// category 0x00 is invalid for EC use
>>>> +	SSAM_SSH_TC_SAM = 0x01,	// generic system functionality, real-time clock
>>>> +	SSAM_SSH_TC_BAT = 0x02,	// battery/power subsystem
>>>> +	SSAM_SSH_TC_TMP = 0x03,	// thermal subsystem
>>>> +	SSAM_SSH_TC_PMC = 0x04,
>>>> +	SSAM_SSH_TC_FAN = 0x05,
>>>> +	SSAM_SSH_TC_PoM = 0x06,
>>>> +	SSAM_SSH_TC_DBG = 0x07,
>>>> +	SSAM_SSH_TC_KBD = 0x08,	// legacy keyboard (Laptop 1/2)
>>>> +	SSAM_SSH_TC_FWU = 0x09,
>>>> +	SSAM_SSH_TC_UNI = 0x0a,
>>>> +	SSAM_SSH_TC_LPC = 0x0b,
>>>> +	SSAM_SSH_TC_TCL = 0x0c,
>>>> +	SSAM_SSH_TC_SFL = 0x0d,
>>>> +	SSAM_SSH_TC_KIP = 0x0e,
>>>> +	SSAM_SSH_TC_EXT = 0x0f,
>>>> +	SSAM_SSH_TC_BLD = 0x10,
>>>> +	SSAM_SSH_TC_BAS = 0x11,	// detachment system (Surface Book 2/3)
>>>> +	SSAM_SSH_TC_SEN = 0x12,
>>>> +	SSAM_SSH_TC_SRQ = 0x13,
>>>> +	SSAM_SSH_TC_MCU = 0x14,
>>>> +	SSAM_SSH_TC_HID = 0x15,	// generic HID input subsystem
>>>> +	SSAM_SSH_TC_TCH = 0x16,
>>>> +	SSAM_SSH_TC_BKL = 0x17,
>>>> +	SSAM_SSH_TC_TAM = 0x18,
>>>> +	SSAM_SSH_TC_ACC = 0x19,
>>>> +	SSAM_SSH_TC_UFI = 0x1a,
>>>> +	SSAM_SSH_TC_USC = 0x1b,
>>>> +	SSAM_SSH_TC_PEN = 0x1c,
>>>> +	SSAM_SSH_TC_VID = 0x1d,
>>>> +	SSAM_SSH_TC_AUD = 0x1e,
>>>> +	SSAM_SSH_TC_SMC = 0x1f,
>>>> +	SSAM_SSH_TC_KPD = 0x20,
>>>> +	SSAM_SSH_TC_REG = 0x21,
>>>> +};
>>>
>>> Is it known what these abbreviations stand for? Maybe I missed them?
>>
>> The comments state all we really know about these (through observation
>> and experimentation). The table itself has been extracted from the
>> Windows driver, but the abbreviations and values are all we're getting
>> from it.
> 
> I see, thanks for the clarification. For some reason, I believed the
> "Known SSH/EC target categories" comment means that those are known, as in,
> it is known what they are for, etc., not just the abbreviation.

Right, that can be misunderstood, sorry. It's probably best if I add a
comment explaining that in a bit more detail and noting where those
values and abbreviations come from.

Thanks,
Max
Hans de Goede Nov. 24, 2020, 11:59 a.m. UTC | #8
Hi,

On 11/15/20 8:21 PM, Maximilian Luz wrote:
> Hello,
> 
>   N.B.: the following text is mostly a repeat of cover letter from the
>   previous RFC for the uninitiated, which can be found at
> 
>   https://lore.kernel.org/linux-serial/20200923151511.3842150-1-luzmaximilian@gmail.com/
> 
>   See "Changes" below for an overview of differences between the RFC and
>   this patchset. I hope I have addressed all comments from that in this
>   version, thank you again for those.
> 
> The Surface System Aggregator Module (we'll refer to it as Surface
> Aggregator or SAM below) is an embedded controller (EC) found on various
> Microsoft Surface devices. Specifically, all 4th and later generation
> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
> exception of the Surface Go series and the Surface Duo. Notably, it
> seems like this EC can also be found on the ARM-based Surface Pro X [1].

<snip>

> This patch-set can also be found at the following repository and
> reference, if you prefer to look at a kernel tree instead of these
> emails:
> 
>   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v1
> 
> Thanks,
> Max

Thank you for your work on this. It would be great if we can get better
support for the Surface line in the mainline kernel.

Since a lot of people have already commented on this series I think that
you have enough feedback to do a v2 addressing that feedback right? 

For now I'm going to assume that you will do a v2 addressing the
initial round of comments and not review this myself (IOW I'll review
this when v2 is posted).

Let me know if you see things differently.

Regards,

Hans





> [1]: The Surface Pro X is, however, currently considered unsupported due
>      to a lack of test candidates and, as it seems, general lack of
>      Linux support on other parts. AFAIK there is an issue preventing
>      serial devices from being registered, on which the core driver in
>      this series is build on, thus there is no way to even test that at
>      this point. I'd be happy to work out any issues regarding SAM on
>      the Pro X at some point in the future, provided someone can/wants
>      to actually test it.
> 
> [2]: https://github.com/linux-surface/surface-aggregator-module
> [3]: https://github.com/linux-surface/linux-surface
> 
> 
> Note: This patch depends on
> 
>   [PATCH v4] platform/surface: Create a platform subdirectory for
>              Microsoft Surface devices
> 
> which can be found at
> 
>   https://lore.kernel.org/platform-driver-x86/20201009141128.683254-1-luzmaximilian@gmail.com/
> 
> and is currently in platform-drivers-x86/for-next.
> 
> 
> Changes from the previous RFC (overview):
>  - move to platform/surface
>  - add copyright lines
>  - change SPDX identifier to GPL-2.0+ (was GPL-2.0-or-later)
>  - change user-space interface from debugfs to misc-device
>  - address issues in user-space interface
>  - fix typos in commit messages and documentation
>  - fix some bugs, address other issues
> 
> Changes regarding specific patches (and more details) can be found on
> the individual patch.
> 
> 
> Maximilian Luz (9):
>   platform/surface: Add Surface Aggregator subsystem
>   platform/surface: aggregator: Add control packet allocation caching
>   platform/surface: aggregator: Add event item allocation caching
>   platform/surface: aggregator: Add trace points
>   platform/surface: aggregator: Add error injection capabilities
>   platform/surface: aggregator: Add dedicated bus and device type
>   docs: driver-api: Add Surface Aggregator subsystem documentation
>   platform/surface: Add Surface Aggregator user-space interface
>   platform/surface: Add Surface ACPI Notify driver
> 
>  Documentation/driver-api/index.rst            |    1 +
>  .../surface_aggregator/client-api.rst         |   38 +
>  .../driver-api/surface_aggregator/client.rst  |  394 +++
>  .../surface_aggregator/clients/cdev.rst       |   85 +
>  .../surface_aggregator/clients/index.rst      |   21 +
>  .../surface_aggregator/clients/san.rst        |   44 +
>  .../driver-api/surface_aggregator/index.rst   |   21 +
>  .../surface_aggregator/internal-api.rst       |   67 +
>  .../surface_aggregator/internal.rst           |   50 +
>  .../surface_aggregator/overview.rst           |   76 +
>  .../driver-api/surface_aggregator/ssh.rst     |  343 +++
>  MAINTAINERS                                   |   13 +
>  drivers/platform/surface/Kconfig              |   39 +
>  drivers/platform/surface/Makefile             |    3 +
>  drivers/platform/surface/aggregator/Kconfig   |   65 +
>  drivers/platform/surface/aggregator/Makefile  |   17 +
>  drivers/platform/surface/aggregator/bus.c     |  424 +++
>  drivers/platform/surface/aggregator/bus.h     |   27 +
>  .../platform/surface/aggregator/controller.c  | 2557 +++++++++++++++++
>  .../platform/surface/aggregator/controller.h  |  288 ++
>  drivers/platform/surface/aggregator/core.c    |  831 ++++++
>  .../platform/surface/aggregator/ssh_msgb.h    |  201 ++
>  .../surface/aggregator/ssh_packet_layer.c     | 2009 +++++++++++++
>  .../surface/aggregator/ssh_packet_layer.h     |  175 ++
>  .../platform/surface/aggregator/ssh_parser.c  |  229 ++
>  .../platform/surface/aggregator/ssh_parser.h  |  157 +
>  .../surface/aggregator/ssh_request_layer.c    | 1254 ++++++++
>  .../surface/aggregator/ssh_request_layer.h    |  142 +
>  drivers/platform/surface/aggregator/trace.h   |  625 ++++
>  .../platform/surface/surface_acpi_notify.c    |  884 ++++++
>  .../surface/surface_aggregator_cdev.c         |  299 ++
>  include/linux/mod_devicetable.h               |   18 +
>  include/linux/surface_acpi_notify.h           |   39 +
>  include/linux/surface_aggregator/controller.h |  832 ++++++
>  include/linux/surface_aggregator/device.h     |  430 +++
>  include/linux/surface_aggregator/serial_hub.h |  659 +++++
>  include/uapi/linux/surface_aggregator/cdev.h  |   58 +
>  scripts/mod/devicetable-offsets.c             |    8 +
>  scripts/mod/file2alias.c                      |   23 +
>  39 files changed, 13446 insertions(+)
>  create mode 100644 Documentation/driver-api/surface_aggregator/client-api.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/client.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/clients/cdev.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/clients/index.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/clients/san.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/index.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/internal-api.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/internal.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/overview.rst
>  create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst
>  create mode 100644 drivers/platform/surface/aggregator/Kconfig
>  create mode 100644 drivers/platform/surface/aggregator/Makefile
>  create mode 100644 drivers/platform/surface/aggregator/bus.c
>  create mode 100644 drivers/platform/surface/aggregator/bus.h
>  create mode 100644 drivers/platform/surface/aggregator/controller.c
>  create mode 100644 drivers/platform/surface/aggregator/controller.h
>  create mode 100644 drivers/platform/surface/aggregator/core.c
>  create mode 100644 drivers/platform/surface/aggregator/ssh_msgb.h
>  create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.c
>  create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.h
>  create mode 100644 drivers/platform/surface/aggregator/ssh_parser.c
>  create mode 100644 drivers/platform/surface/aggregator/ssh_parser.h
>  create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.c
>  create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.h
>  create mode 100644 drivers/platform/surface/aggregator/trace.h
>  create mode 100644 drivers/platform/surface/surface_acpi_notify.c
>  create mode 100644 drivers/platform/surface/surface_aggregator_cdev.c
>  create mode 100644 include/linux/surface_acpi_notify.h
>  create mode 100644 include/linux/surface_aggregator/controller.h
>  create mode 100644 include/linux/surface_aggregator/device.h
>  create mode 100644 include/linux/surface_aggregator/serial_hub.h
>  create mode 100644 include/uapi/linux/surface_aggregator/cdev.h
>
Maximilian Luz Nov. 24, 2020, 1:43 p.m. UTC | #9
On 11/24/20 12:59 PM, Hans de Goede wrote:
> Hi,
> 
> On 11/15/20 8:21 PM, Maximilian Luz wrote:
>> Hello,
>>
>>    N.B.: the following text is mostly a repeat of cover letter from the
>>    previous RFC for the uninitiated, which can be found at
>>
>>    https://lore.kernel.org/linux-serial/20200923151511.3842150-1-luzmaximilian@gmail.com/
>>
>>    See "Changes" below for an overview of differences between the RFC and
>>    this patchset. I hope I have addressed all comments from that in this
>>    version, thank you again for those.
>>
>> The Surface System Aggregator Module (we'll refer to it as Surface
>> Aggregator or SAM below) is an embedded controller (EC) found on various
>> Microsoft Surface devices. Specifically, all 4th and later generation
>> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
>> exception of the Surface Go series and the Surface Duo. Notably, it
>> seems like this EC can also be found on the ARM-based Surface Pro X [1].
> 
> <snip>
> 
>> This patch-set can also be found at the following repository and
>> reference, if you prefer to look at a kernel tree instead of these
>> emails:
>>
>>    https://github.com/linux-surface/kernel tags/s/surface-aggregator/v1
>>
>> Thanks,
>> Max
> 
> Thank you for your work on this. It would be great if we can get better
> support for the Surface line in the mainline kernel.
> 
> Since a lot of people have already commented on this series I think that
> you have enough feedback to do a v2 addressing that feedback right?

Yes, I'm already working on it.

> For now I'm going to assume that you will do a v2 addressing the
> initial round of comments and not review this myself (IOW I'll review
> this when v2 is posted).

Sure, no need for you to review v1 at this point.
  
> Let me know if you see things differently.

Thanks,
Max