Message ID | 20201115192143.21571-1-luzmaximilian@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Add support for Microsoft Surface System Aggregator Module | expand |
On Sun, Nov 15, 2020 at 9:25 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: > > Add Surface System Aggregator Module core and Surface Serial Hub driver, > required for the embedded controller found on Microsoft Surface devices. > > The Surface System Aggregator Module (SSAM, SAM or Surface Aggregator) > is an embedded controller (EC) found on 4th and later generation > Microsoft Surface devices, with the exception of the Surface Go series. > This EC provides various functionality, depending on the device in > question. This can include battery status and thermal reporting (5th and > later generations), but also HID keyboard (6th+) and touchpad input > (7th+) on Surface Laptop and Surface Book 3 series devices. > > This patch provides the basic necessities for communication with the SAM > EC on 5th and later generation devices. On these devices, the EC > provides an interface that acts as serial device, called the Surface > Serial Hub (SSH). 4th generation devices, on which the EC interface is > provided via an HID-over-I2C device, are not supported by this patch. > > Specifically, this patch adds a driver for the SSH device (device HID > MSHW0084 in ACPI), as well as a controller structure and associated API. > This represents the functional core of the Surface Aggregator kernel > subsystem, introduced with this patch, and will be expanded upon in > subsequent commits. > > The SSH driver acts as the main attachment point for this subsystem and > sets-up and manages the controller structure. The controller in turn > provides a basic communication interface, allowing to send requests from > host to EC and receiving the corresponding responses, as well as > managing and receiving events, sent from EC to host. It is structured > into multiple layers, with the top layer presenting the API used by > other kernel drivers and the lower layers modeled after the serial > protocol used for communication. > > Said other drivers are then responsible for providing the (Surface model > specific) functionality accessible through the EC (e.g. battery status > reporting, thermal information, ...) via said controller structure and > API, and will be added in future commits. ... > +menuconfig SURFACE_AGGREGATOR > + tristate "Microsoft Surface System Aggregator Module Subsystem and Drivers" > + depends on SERIAL_DEV_BUS > + depends on ACPI > + select CRC_CCITT > + help > + The Surface System Aggregator Module (Surface SAM or SSAM) is an > + embedded controller (EC) found on 5th- and later-generation Microsoft > + Surface devices (i.e. Surface Pro 5, Surface Book 2, Surface Laptop, > + and newer, with exception of Surface Go series devices). > + > + Depending on the device in question, this EC provides varying > + functionality, including: > + - EC access from ACPI via Surface ACPI Notify (5th- and 6th-generation) > + - battery status information (all devices) > + - thermal sensor access (all devices) > + - performance mode / cooling mode control (all devices) > + - clipboard detachment system control (Surface Book 2 and 3) > + - HID / keyboard input (Surface Laptops, Surface Book 3) > + > + This option controls whether the Surface SAM subsystem core will be > + built. This includes a driver for the Surface Serial Hub (SSH), which > + is the device responsible for the communication with the EC, and a > + basic kernel interface exposing the EC functionality to other client > + drivers, i.e. allowing them to make requests to the EC and receive > + events from it. Selecting this option alone will not provide any > + client drivers and therefore no functionality beyond the in-kernel > + interface. Said functionality is the responsibility of the respective > + client drivers. > + > + Note: While 4th-generation Surface devices also make use of a SAM EC, > + due to a difference in the communication interface of the controller, > + only 5th and later generations are currently supported. Specifically, > + devices using SAM-over-SSH are supported, whereas devices using > + SAM-over-HID, which is used on the 4th generation, are currently not > + supported. From this help text I didn't get if it will be a module or what if I chose the above? ... > +/* -- Safe counters. -------------------------------------------------------- */ Why can't XArray be used here? ... > + > + One blank line is enough ... > +static bool ssam_event_matches_notifier( > + const struct ssam_event_notifier *notif, > + const struct ssam_event *event) Perhaps static bool ssam_event_matches_notifier(const struct ssam_event_notifier *n, const struct ssam_event *event) (or even switch to 100 limit, also note notif ->n — no need to repeat same word) ... > + nb = rcu_dereference_raw(nh->head); > + while (nb) { > + nf = container_of(nb, struct ssam_event_notifier, base); > + next_nb = rcu_dereference_raw(nb->next); > + > + if (ssam_event_matches_notifier(nf, event)) { > + ret = (ret & SSAM_NOTIF_STATE_MASK) | nb->fn(nf, event); > + if (ret & SSAM_NOTIF_STOP) > + break; The returned value is a bitmask?! What are you returning at the end? > + } > + > + nb = next_nb; > + } ... > +static int __ssam_nfblk_insert(struct ssam_nf_head *nh, struct ssam_notifier_block *nb) > +{ > + struct ssam_notifier_block **link = &nh->head; > + > + while ((*link) != NULL) { > + if (unlikely((*link) == nb)) { > + WARN(1, "double register detected"); > + return -EINVAL; > + } > + > + if (nb->priority > (*link)->priority) > + break; > + > + link = &((*link)->next); > + } > + > + nb->next = *link; > + rcu_assign_pointer(*link, nb); > + > + return 0; > +} If you need RCU (which is also the Q per se), why not use RCU list? ... > + while ((*link) != NULL) { Redundant parentheses, redundant ' != NULL' part. > + if ((*link) == nb) > + return link; > + > + link = &((*link)->next); > + } ... > + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event In above you mistyped 'which' as 'whic' or so, and here reference. Perhaps go thru spell checker? ... > + * registered, or ``ERR_PTR(-ENOMEM)`` if the entry could not be allocated. Better to spell out "error pointer" instead of cryptic ERR_PTR(). ... > + * Executa registered callbacks in order of their priority until either no > + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP returns > + * bit set. Note that this bit is set automatically when converting non.zero > + * error values via ssam_notifier_from_errno() to notifier values. ... > + for (i = i - 1; i >= 0; i--) while (i--) > + ssam_nf_head_destroy(&nf->head[i]); ... > + // limit number of processed events to avoid livelocking > + for (i = 0; i < 10; i++) { Magic number! Also, this will be better to read in a form of unsigned int iterations = 10; do { ... } while (--iterations); > + item = ssam_event_queue_pop(queue); > + if (item == NULL) > + return; > + > + ssam_nf_call(nf, dev, item->rqid, &item->event); > + kfree(item); > + } ... > +static const guid_t SSAM_SSH_DSM_GUID = GUID_INIT(0xd5e383e1, 0xd892, 0x4a76, > + 0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5); Can you use usual pattern for these UIDs, like static const guid_t SSAM_SSH_DSM_GUID = GUID_INIT(0xd5e383e1, 0xd892, 0x4a76, 0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5); ? Also put a comment how this UID will look like in a string representation. ... > + if (!acpi_has_method(handle, "_DSM")) > + return 0; Hmm... What's the expectation? > + obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID, > + SSAM_SSH_DSM_REVISION, 0, NULL, > + ACPI_TYPE_BUFFER); > + if (!obj) > + return -EFAULT; EFAULT?! Perhaps you can simply return 0 here, no? > + for (i = 0; i < obj->buffer.length && i < 8; i++) > + mask |= (((u64)obj->buffer.pointer[i]) << (i * 8)); Don't we have some helpers for this? At least I remember similar code went to one of PDx86 drivers like intel-vbtn or so. > + if (mask & 0x01) BIT(0) ? > + *funcs = mask; ... > + caps->ssh_power_profile = (u32)-1; > + caps->screen_on_sleep_idle_timeout = (u32)-1; > + caps->screen_off_sleep_idle_timeout = (u32)-1; > + caps->d3_closes_handle = false; > + caps->ssh_buffer_size = (u32)-1; Use proper types and their limits (limits.h missed?). ... > + // initialize request and packet transport layers Inconsistent style of comments. ... > + * In the course of this shutdown procedure, all currently registered > + * notifiers will be unregistered. It is, however, strongly recommended to not > + * rely on this behavior, and instead the party registring the notifier should registering > + * unregister it before the controller gets shut down, e.g. via the SSAM bus > + * which guarantees client devices to be removed before a shutdown. > + * Note that events may still be pending after this call, but due to the > + * notifiers being unregistered, the will be dropped when the controller is the?! > + * subsequently being destroyed via ssam_controller_destroy(). ... > + * Ensures that all resources associated with the controller get freed. This > + * function should only be called after the controller has been stopped via > + * ssam_controller_shutdown(). In general, this function should not be called > + * directly. The only valid place to call this function direclty is during directly > + * initialization, before the controller has been fully initialized and passed > + * to other processes. This function is called automatically when the > + * reference count of the controller reaches zero. ... > + * ssam_request_sync_free() - Free a synchronous request. > + * @rqst: The request to free. to be freed? ... > + * Allocates a synchronous request struct on the stack, fully initializes it > + * using the provided buffer as message data buffer, submits it, and then > + * waits for its completion before returning its staus. The status > + * SSH_COMMAND_MESSAGE_LENGTH() macro can be used to compute the required > + * message buffer size. ... > + * This is a wrapper for the raw SAM request to enable an event, thus it does > + * not handle referecnce counting for enable/disable of events. If an event > + * has already been enabled, the EC will ignore this request. Grammar and English language style somehow feels not okay. ... > + u8 buf[1] = { 0x00 }; Can't be simply buf ? ... > + * This function will only send the display-off notification command if > + * display noticications are supported by the EC. Currently all known devices > + * support these notification. Spell check! ... > + * This function will only send the display-on notification command if display > + * noticications are supported by the EC. Currently all known devices support > + * these notification. Ditto. ... > + ssam_err(ctrl, "unexpected response from D0-exit notification:" > + " 0x%02x\n", response); Don't split string literals. Had you run a checkpatch? Please do and use strict mode. ... Overall impression of this submission: - it's quite huge as for a single patch - it feels like quite an overengineered solution: READ_ONCE, RCU, list + spin_lock, RB tree of notifiers, my head! - where is the architectural document of all these?
On 11/16/20 2:33 PM, Andy Shevchenko wrote: > On Sun, Nov 15, 2020 at 9:25 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: >> >> Add Surface System Aggregator Module core and Surface Serial Hub driver, >> required for the embedded controller found on Microsoft Surface devices. >> >> The Surface System Aggregator Module (SSAM, SAM or Surface Aggregator) >> is an embedded controller (EC) found on 4th and later generation >> Microsoft Surface devices, with the exception of the Surface Go series. >> This EC provides various functionality, depending on the device in >> question. This can include battery status and thermal reporting (5th and >> later generations), but also HID keyboard (6th+) and touchpad input >> (7th+) on Surface Laptop and Surface Book 3 series devices. >> >> This patch provides the basic necessities for communication with the SAM >> EC on 5th and later generation devices. On these devices, the EC >> provides an interface that acts as serial device, called the Surface >> Serial Hub (SSH). 4th generation devices, on which the EC interface is >> provided via an HID-over-I2C device, are not supported by this patch. >> >> Specifically, this patch adds a driver for the SSH device (device HID >> MSHW0084 in ACPI), as well as a controller structure and associated API. >> This represents the functional core of the Surface Aggregator kernel >> subsystem, introduced with this patch, and will be expanded upon in >> subsequent commits. >> >> The SSH driver acts as the main attachment point for this subsystem and >> sets-up and manages the controller structure. The controller in turn >> provides a basic communication interface, allowing to send requests from >> host to EC and receiving the corresponding responses, as well as >> managing and receiving events, sent from EC to host. It is structured >> into multiple layers, with the top layer presenting the API used by >> other kernel drivers and the lower layers modeled after the serial >> protocol used for communication. >> >> Said other drivers are then responsible for providing the (Surface model >> specific) functionality accessible through the EC (e.g. battery status >> reporting, thermal information, ...) via said controller structure and >> API, and will be added in future commits. > > ... > >> +menuconfig SURFACE_AGGREGATOR >> + tristate "Microsoft Surface System Aggregator Module Subsystem and Drivers" >> + depends on SERIAL_DEV_BUS >> + depends on ACPI >> + select CRC_CCITT >> + help >> + The Surface System Aggregator Module (Surface SAM or SSAM) is an >> + embedded controller (EC) found on 5th- and later-generation Microsoft >> + Surface devices (i.e. Surface Pro 5, Surface Book 2, Surface Laptop, >> + and newer, with exception of Surface Go series devices). >> + >> + Depending on the device in question, this EC provides varying >> + functionality, including: >> + - EC access from ACPI via Surface ACPI Notify (5th- and 6th-generation) >> + - battery status information (all devices) >> + - thermal sensor access (all devices) >> + - performance mode / cooling mode control (all devices) >> + - clipboard detachment system control (Surface Book 2 and 3) >> + - HID / keyboard input (Surface Laptops, Surface Book 3) >> + >> + This option controls whether the Surface SAM subsystem core will be >> + built. This includes a driver for the Surface Serial Hub (SSH), which >> + is the device responsible for the communication with the EC, and a >> + basic kernel interface exposing the EC functionality to other client >> + drivers, i.e. allowing them to make requests to the EC and receive >> + events from it. Selecting this option alone will not provide any >> + client drivers and therefore no functionality beyond the in-kernel >> + interface. Said functionality is the responsibility of the respective >> + client drivers. >> + >> + Note: While 4th-generation Surface devices also make use of a SAM EC, >> + due to a difference in the communication interface of the controller, >> + only 5th and later generations are currently supported. Specifically, >> + devices using SAM-over-SSH are supported, whereas devices using >> + SAM-over-HID, which is used on the 4th generation, are currently not >> + supported. > > From this help text I didn't get if it will be a module or what if I > chose the above? I thought this would be implied by the tristate options. So should I simply add something like Choose m if you want to build the SAM subsystem core and SSH driver as module, y if you want to build it into the kernel and n if you don't want it at all. > ... > >> +/* -- Safe counters. -------------------------------------------------------- */ > > Why can't XArray be used here? These are basically packet and request IDs (that are sent to the EC and received from it). XArray is a type of collection. I'm not sure how this would work. What happens is: For each new packet/request we get a new ID by incrementing the counter (with wrap-around). This ID is used, for example, to match ACK-packets to the corresponding data packet. The EC will also use this to detect re-transmission, e.g. if we send two packets with the same ID in sequence, the EC will detect the second packet as re-transmission of the first one (which is the reason why simple IDA won't work). Furthermore, both the Windows driver and the EC itself (for packets sent by it) use wrapping counters as well, so it seems to be the safest way to do this without possibly triggering re-transmission detection by accident. > ... > >> + >> + > > One blank line is enough > Can I keep two blank lines for the section separator comments? > ... > >> +static bool ssam_event_matches_notifier( >> + const struct ssam_event_notifier *notif, >> + const struct ssam_event *event) > > Perhaps > > static bool > ssam_event_matches_notifier(const struct ssam_event_notifier *n, > const struct ssam_event *event) > > (or even switch to 100 limit, also note notif ->n — no need to repeat same word) Ack. > ... > >> + nb = rcu_dereference_raw(nh->head); >> + while (nb) { >> + nf = container_of(nb, struct ssam_event_notifier, base); >> + next_nb = rcu_dereference_raw(nb->next); >> + >> + if (ssam_event_matches_notifier(nf, event)) { >> + ret = (ret & SSAM_NOTIF_STATE_MASK) | nb->fn(nf, event); >> + if (ret & SSAM_NOTIF_STOP) >> + break; > > The returned value is a bitmask?! In part, yes. I've referred to this as the "notifier status value" in the (function) documentation. This is basically | bit 0 | bit 1 | bit 2 to 31 | +-------------+----------+---------------------+ | handled-bit | stop-bit | error-value or zero | The ssam_notifier_to_errno() function can be used to convert an error value to this notifier status value. This is the same as the return value of the notifier function and is described a bit better in the documentation of struct ssam_notifier_block. > What are you returning at the end? The notifier status value as described above. This can be tested for the SSAM_NOTIF_HANDLED and SSAM_NOTIF_STOP bits. The error value can, as mentioned in the documentation, be extracted via ssam_notifier_to_errno(). All in all, this is based on notifier_call_chain() and SRCU notifiers (linux/notifier.h, kernel/notifier.c) in general, with the key difference that it includes a bit to indicate if the notification has been handled (or detect if it goes unhandled). > >> + } >> + >> + nb = next_nb; >> + } > > ... > >> +static int __ssam_nfblk_insert(struct ssam_nf_head *nh, struct ssam_notifier_block *nb) >> +{ >> + struct ssam_notifier_block **link = &nh->head; >> + >> + while ((*link) != NULL) { >> + if (unlikely((*link) == nb)) { >> + WARN(1, "double register detected"); >> + return -EINVAL; >> + } >> + >> + if (nb->priority > (*link)->priority) >> + break; >> + >> + link = &((*link)->next); >> + } >> + >> + nb->next = *link; >> + rcu_assign_pointer(*link, nb); >> + >> + return 0; >> +} > > If you need RCU (which is also the Q per se), why not use RCU list? This part has been mostly copied and adapted from notifier_chain_register(), which uses the exact same mechanism. I can change it to use RCU list primitives if that's preferred. > ... > >> + while ((*link) != NULL) { > > Redundant parentheses, redundant ' != NULL' part. Ack. > >> + if ((*link) == nb) >> + return link; >> + >> + link = &((*link)->next); >> + } > > ... > >> + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event > > In above you mistyped 'which' as 'whic' or so, and here reference. > Perhaps go thru spell checker? Ack. I'll do that. > > ... > >> + * registered, or ``ERR_PTR(-ENOMEM)`` if the entry could not be allocated. > > Better to spell out "error pointer" instead of cryptic ERR_PTR(). Ack. > > ... > >> + * Executa registered callbacks in order of their priority until either no >> + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP > > returns Ack. > >> + * bit set. Note that this bit is set automatically when converting non.zero >> + * error values via ssam_notifier_from_errno() to notifier values. > > ... > >> + for (i = i - 1; i >= 0; i--) > > while (i--) Ack. > >> + ssam_nf_head_destroy(&nf->head[i]); > > ... > >> + // limit number of processed events to avoid livelocking >> + for (i = 0; i < 10; i++) { > > Magic number! Also, this will be better to read in a form of > > unsigned int iterations = 10; > > do { > ... > } while (--iterations); Okay. > >> + item = ssam_event_queue_pop(queue); >> + if (item == NULL) >> + return; >> + >> + ssam_nf_call(nf, dev, item->rqid, &item->event); >> + kfree(item); >> + } > > ... > >> +static const guid_t SSAM_SSH_DSM_GUID = GUID_INIT(0xd5e383e1, 0xd892, 0x4a76, >> + 0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5); > > Can you use usual pattern for these UIDs, like > static const guid_t SSAM_SSH_DSM_GUID = > GUID_INIT(0xd5e383e1, 0xd892, 0x4a76, > 0x89, 0xfc, 0xf6, 0xaa, 0xae, 0x7e, 0xd5, 0xb5); > ? > > Also put a comment how this UID will look like in a string representation. Will change that. > > ... > >> + if (!acpi_has_method(handle, "_DSM")) >> + return 0; > > Hmm... What's the expectation? Depends on the device. Old devices don't have that _DSM (thus fallback to defaults), newer devices do have it. > >> + obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID, >> + SSAM_SSH_DSM_REVISION, 0, NULL, >> + ACPI_TYPE_BUFFER); >> + if (!obj) >> + return -EFAULT; > > EFAULT?! Perhaps you can simply return 0 here, no? I'd prefer returning an error (_DSM index 0 should always be present and return a buffer), but I agree that EFAULT is probably the wrong one. > >> + for (i = 0; i < obj->buffer.length && i < 8; i++) >> + mask |= (((u64)obj->buffer.pointer[i]) << (i * 8)); > > Don't we have some helpers for this? At least I remember similar code > went to one of PDx86 drivers like intel-vbtn or so. There is acpi_check_dsm(), which checks if a single function is supported, but as far as I can tell no helper that returns a bitmask of all supported functions. I can change it to use acpi_check_dsm() in ssam_dsm_load_u32() though. > >> + if (mask & 0x01) > > BIT(0) ? BIT(0) looks better, ack. > >> + *funcs = mask; > > ... > >> + caps->ssh_power_profile = (u32)-1; >> + caps->screen_on_sleep_idle_timeout = (u32)-1; >> + caps->screen_off_sleep_idle_timeout = (u32)-1; >> + caps->d3_closes_handle = false; >> + caps->ssh_buffer_size = (u32)-1; > > Use proper types and their limits (limits.h missed?). Okay. > > ... > >> + // initialize request and packet transport layers > > Inconsistent style of comments. Will change that. > > ... > >> + * In the course of this shutdown procedure, all currently registered >> + * notifiers will be unregistered. It is, however, strongly recommended to not >> + * rely on this behavior, and instead the party registring the notifier should > > registering Ack. > >> + * unregister it before the controller gets shut down, e.g. via the SSAM bus >> + * which guarantees client devices to be removed before a shutdown. > >> + * Note that events may still be pending after this call, but due to the >> + * notifiers being unregistered, the will be dropped when the controller is > > the?! Should be "the events". > >> + * subsequently being destroyed via ssam_controller_destroy(). > > ... > >> + * Ensures that all resources associated with the controller get freed. This >> + * function should only be called after the controller has been stopped via >> + * ssam_controller_shutdown(). In general, this function should not be called >> + * directly. The only valid place to call this function direclty is during > > directly Ack. > >> + * initialization, before the controller has been fully initialized and passed >> + * to other processes. This function is called automatically when the >> + * reference count of the controller reaches zero. > > ... > >> + * ssam_request_sync_free() - Free a synchronous request. >> + * @rqst: The request to free. > > to be freed? I think both should be grammatically correct, but I'll change it. > > ... > >> + * Allocates a synchronous request struct on the stack, fully initializes it >> + * using the provided buffer as message data buffer, submits it, and then >> + * waits for its completion before returning its staus. The > > status Ack. > >> + * SSH_COMMAND_MESSAGE_LENGTH() macro can be used to compute the required >> + * message buffer size. > > ... > >> + * This is a wrapper for the raw SAM request to enable an event, thus it does >> + * not handle referecnce counting for enable/disable of events. If an event >> + * has already been enabled, the EC will ignore this request. > > Grammar and English language style somehow feels not okay. I'll try to reword that. I agree that this feels a bit off. > > ... > >> + u8 buf[1] = { 0x00 }; > > Can't be simply buf ? It can be, I'll change that > > ... > >> + * This function will only send the display-off notification command if >> + * display noticications are supported by the EC. Currently all known devices >> + * support these notification. > > Spell check! Will do that. > > ... > >> + * This function will only send the display-on notification command if display >> + * noticications are supported by the EC. Currently all known devices support >> + * these notification. > > Ditto. Ack. > > ... > >> + ssam_err(ctrl, "unexpected response from D0-exit notification:" >> + " 0x%02x\n", response); > > Don't split string literals. Had you run a checkpatch? > Please do and use strict mode. I did run checkpatch, however, I decided to split strings as some of them are a bit long. When doing that, I only split at places where I personally wouldn't expect a grep to succeed for, e.g. before or after format specifiers. I though that this is okay as the main reason for "don't split string literals" is searchability. > > > ... > > Overall impression of this submission: > - it's quite huge as for a single patch I tried to split it into parts, but this is the smallest I've been able to get it while also keeping it a functional unit. > - it feels like quite an overengineered solution: READ_ONCE, RCU, > list + spin_lock, RB tree of notifiers, my head! Sorry about any headaches. My main goal was stability and reliability, while keeping things somewhat performant and generic. The "generic" part mostly because this whole thing is based on reverse-engineering, and I believe that makes it easier to correct for things that we got wrong, e.g. in what we think the protocol looks like. Packet state is tracked via atomic flags (which is also laid out in a comment), and is the simplest solution I found to track that. I found that a simple enum based state will complicate things as the driver has to support and properly handle re-submission of packets. READ_ONCE and WRITE_ONCE are used to ensure proper access to state that can be changed outside of the queue/pending locks (or under any one of them). In general, I have tried to document all critical access of such state with an explanation of why it is safe to do so. RCU is only used for notifier callbacks, which is based on the SRCU notifiers from linux/notifier.h and kernel/notifier.c, but adapted to detect unhandled events. While it is somewhat interwoven, the (RCU-based) notifier chain stores the notifiers (one chain per target category) and the RB-tree stores the number of times an event has been enabled. Essentially, those should be seen as two separate parts (see e.g. documentation of struct ssam_nf, which brings those two together). So the RB-tree is not for notifiers, but for tracking which events are enabled. Basically reference-counting for enabled events, as the EC itself only allows turning them on/off. It's necessary to do that as there can be multiple clients listening to the same event. It's also required to do this with some sort of map, as (in context of enabling/disabling) the events are uniquely identified by both their registry and ID. This combination is a bit too complex to do that with a simple array, so a RB-tree looked like the best solution. Note that we store notifiers per target category only, so for that we can use a simple array. > - where is the architectural document of all these? What specifically do you want to see documented? I've documented the basic communication and principles in another patch of this series, but kept implementation details somewhat vague as I think those are better covered by struct and function documentation (which won't become outdated as easily as a separate documentation). Thank you for your comments. Regards, Max
On 11/16/20 6:03 PM, Maximilian Luz wrote: > On 11/16/20 2:33 PM, Andy Shevchenko wrote: >> On Sun, Nov 15, 2020 at 9:25 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: [...] > READ_ONCE and WRITE_ONCE are used to ensure proper access to state that > can be changed outside of the queue/pending locks (or under any one of > them). In general, I have tried to document all critical access of such > state with an explanation of why it is safe to do so. I've looked at this again and noticed that I can guard the packet timestamp by the pending lock and the packet priority by the queue lock (after first submission). This makes reasoning about access to them significantly easier and removes the need for WRITE_ONCE / READ_ONCE. After that, READ_ONCE is used - to access the controller state for smoke-testing to (hopefully) detect invalid request function usage (note that all other access to this state happens under the controller state lock) - for the "safe counters", where access to the shared value is, after initialization, restricted to the increment function - to update the timeout reaper, where access to the shared value (rtx_timeout.expires) is, after initialization, restricted to its modification function (ssh_ptl_timeout_reaper_mod() / ssh_rtl_timeout_reaper_mod()) and the timer function - to access the request timestamp, which is, after initialization, only set once in the lifetime of a request (all other access is read-only) - to access the 'ptl' reference of the packet, which, after initialization, is only set once, either at packet or request submission (all other access is read-only). Note due to this, READ_ONCE is only required for functions that can run concurrently with ssh_ptl_submit() and ssh_rtl_submit(), i.e. ssh_ptl_cancel() and ssh_rtl_cancel(). - to access request state outside of bit-ops when canceling I'd argue that all of these cases can be checked and verified with a reasonable amount of effort. Cancellation (last two points) is probably the most complex one. Unfortunately, I don't see any way to simplify that part without disallowing cancellation to run concurrently to submission, which is something I'd like to support as this makes implementing asynchronous requests in the future easier. Regards, Max
Hi I have attached some thoughts and comments inline. 2020. november 15., vasárnap 20:21 keltezéssel, Maximilian Luz írta: > [...] > +/* -- Event notifier/callbacks. --------------------------------------------- */ > +/* > + * The notifier system is based on linux/notifier.h, specifically the SRCU > + * implementation. The difference to that is, that some bits of the notifier > + * call return value can be tracked across multiple calls. This is done so that > + * handling of events can be tracked and a warning can be issued in case an > + * event goes unhandled. The idea of that waring is that it should help discover ^ "warning" > + * and identify new/currently unimplemented features. > + */ > + > + > +/** > + * ssam_event_matches_notifier() - Test if an event matches a notifier; ^ Shouldn't it be a period? > + * @notif: The event notifier to test against. > + * @event: The event to test. > + * > + * Return: Returns %true iff the given event matches the given notifier > + * according to the rules set in the notifier's event mask, %false otherwise. > + */ > [...] > +static int __ssam_nfblk_remove(struct ssam_nf_head *nh, > + struct ssam_notifier_block *nb) > +{ > + struct ssam_notifier_block **link; > + > + link = __ssam_nfblk_find_link(nh, nb); > + if (!link) > + return -ENOENT; I find it odd that here you return ENOENT, but in `__ssam_nfblk_insert()` EINVAL is returned instead of EEXIST. I believe either both should be EINVAL, or EEXIST+ENOENT. > + > + __ssam_nfblk_erase(link); I'm wondering if it's necessary to create a new function which contains just a single line. > + return 0; > +} > [...] > +/** > + * ssam_nf_head_destroy() - Deinitialize the given notifier head. > + * @nh: The notifier head to deinitialize. > + */ > +static void ssam_nf_head_destroy(struct ssam_nf_head *nh) > +{ > + cleanup_srcu_struct(&nh->srcu); > +} I'm also wondering if there's any reason why these static one-liner functions are not explicitly marked inline. > [...] > +/** > + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event > + * activations. > + * @node: The node of this entry in the rb-tree. > + * @key: The key of the event. > + * @refcount: The reference-count of the event. > + * @flags: The flags used when enabling the event. > + */ > +struct ssam_nf_refcount_entry { > + struct rb_node node; > + struct ssam_nf_refcount_key key; > + int refcount; Is there any reason why a signed type is used for reference counting? > + u8 flags; > +}; > + > + > +/** > + * ssam_nf_refcount_inc() - Increment reference-/activation-count of the given > + * event. > + * @nf: The notifier system reference. > + * @reg: The registry used to enable/disable the event. > + * @id: The event ID. > + * > + * Increments the reference-/activation-count associated with the specified > + * event type/ID, allocating a new entry for this event ID if necessary. A > + * newly allocated entry will have a refcount of one. Shouldn't it be noted that nf->lock(?) must(?) be held when calling? > [...] > +/** > + * ssam_nf_call() - Call notification callbacks for the provided event. > + * @nf: The notifier system > + * @dev: The associated device, only used for logging. > + * @rqid: The request ID of the event. > + * @event: The event provided to the callbacks. > + * > + * Executa registered callbacks in order of their priority until either no ^ "execute" > + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP > + * bit set. Note that this bit is set automatically when converting non.zero ^ maybe "non-zero"? > + * error values via ssam_notifier_from_errno() to notifier values. > + * > + * Also note that any callback that could handle an event should return a value > + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go > + * unhandled/ignored. In case no registered callback could handle an event, > + * this function will emit a warning. > + * > + * In case a callback failed, this function will emit an error message. > + */ > +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid, > + struct ssam_event *event) > +{ > + struct ssam_nf_head *nf_head; > + int status, nf_ret; > + > + if (!ssh_rqid_is_event(rqid)) { > + dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid); A small note, "%#04x" would insert the "0x" prefix. > + return; > + } > [...] > +/** > + * ssam_event_item_alloc() - Allocate an event item with the given payload size. > + * @len: The event payload length. > + * @flags: The flags used for allocation. > + * > + * Allocate an event item with the given payload size. Sets the item > + * operations and payload length values. The item free callback (``ops.free``) > + * should not be overwritten after this call. > + * > + * Return: Returns the newly allocated event item. > + */ > +static struct ssam_event_item *ssam_event_item_alloc(size_t len, gfp_t flags) The `flags` argument is seemingly ignored. > +{ > + struct ssam_event_item *item; > + > + item = kzalloc(sizeof(*item) + len, GFP_KERNEL); I believe `struct_size(item, event.data, len)` could be utilized here. > + if (!item) > + return NULL; > + > + item->event.length = len; > + return item; > +} > [...] > +static void ssam_event_queue_work_fn(struct work_struct *work) > +{ > + struct ssam_event_queue *queue; > + struct ssam_event_item *item; > + struct ssam_nf *nf; > + struct device *dev; > + int i; > + > + queue = container_of(work, struct ssam_event_queue, work); > + nf = &queue->cplt->event.notif; > + dev = queue->cplt->dev; > + > + // limit number of processed events to avoid livelocking > + for (i = 0; i < 10; i++) { > + item = ssam_event_queue_pop(queue); > + if (item == NULL) I believe `!item` is preferred. > + return; > + > + ssam_nf_call(nf, dev, item->rqid, &item->event); > + kfree(item); > + } > + > + if (!ssam_event_queue_is_empty(queue)) > + ssam_cplt_submit(queue->cplt, &queue->work); > +} > [...] > +static void ssam_handle_event(struct ssh_rtl *rtl, > + const struct ssh_command *cmd, > + const struct ssam_span *data) > +{ > + struct ssam_controller *ctrl = to_ssam_controller(rtl, rtl); > + struct ssam_event_item *item; > + > + item = ssam_event_item_alloc(data->len, GFP_KERNEL); > + if (!item) > + return; > + > + item->rqid = get_unaligned_le16(&cmd->rqid); > + item->event.target_category = cmd->tc; > + item->event.target_id = cmd->tid_in; > + item->event.command_id = cmd->cid; > + item->event.instance_id = cmd->iid; > + memcpy(&item->event.data[0], data->ptr, data->len); > + > + WARN_ON(ssam_cplt_submit_event(&ctrl->cplt, item)); I believe that if submission fails, `item` is leaked. > +} > + > +static const struct ssh_rtl_ops ssam_rtl_ops = { > + .handle_event = ssam_handle_event, > +}; > + > + > +static bool ssam_notifier_empty(struct ssam_controller *ctrl); I think it'd be better to name it `ssam_notifier_is_empty()` to be consistent with the rest of the patch. > [...] > +static int ssam_controller_caps_load_from_acpi( > + acpi_handle handle, struct ssam_controller_caps *caps) > +{ > + u32 d3_closes_handle = false; Assinging a boolean like this to a `u32` looks very odd to me. > + u64 funcs; > + int status; > + > + // set defaults > + caps->ssh_power_profile = (u32)-1; > + caps->screen_on_sleep_idle_timeout = (u32)-1; > + caps->screen_off_sleep_idle_timeout = (u32)-1; > + caps->d3_closes_handle = false; > + caps->ssh_buffer_size = (u32)-1; > [...] > + > +/** > + * ssam_controller_start() - Start the receiver and transmitter threads of the > + * controller. > + * @ctrl: The controller. > + * > + * Note: When this function is called, the controller shouldbe properly hooked ^ space > + * up to the serdev core via &struct serdev_device_ops. Please refert to ^ "refer" > [...] > +void ssam_controller_shutdown(struct ssam_controller *ctrl) > +{ > + enum ssam_controller_state s = ctrl->state; > + int status; > + > + if (s == SSAM_CONTROLLER_UNINITIALIZED || s == SSAM_CONTROLLER_STOPPED) > + return; > + > + // try to flush pending events and requests while everything still works > + status = ssh_rtl_flush(&ctrl->rtl, msecs_to_jiffies(5000)); Wouldn't it be better to name that 5000? > + if (status) { > + ssam_err(ctrl, "failed to flush request transport layer: %d\n", > + status); > + } > [...] > + > +/** > + * ssam_controller_destroy() - Destroy the controller and free its resources. > + * @ctrl: The controller. > + * > + * Ensures that all resources associated with the controller get freed. This > + * function should only be called after the controller has been stopped via > + * ssam_controller_shutdown(). In general, this function should not be called > + * directly. The only valid place to call this function direclty is during ^ "directly" > + * initialization, before the controller has been fully initialized and passed > + * to other processes. This function is called automatically when the > + * reference count of the controller reaches zero. > + * > + * Must be called from an exclusive context with regards to the controller > + * state. > + */ > [...] > +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags, > + struct ssam_request_sync **rqst, > + struct ssam_span *buffer) > +{ > + size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len); > + > + *rqst = kzalloc(sizeof(**rqst) + msglen, flags); > + if (!*rqst) > + return -ENOMEM; > + > + buffer->ptr = (u8 *)(*rqst + 1); > + buffer->len = msglen; > + > + return 0; > +} I think there is a bit of incosistency: sometimes you use ** pointer + return int, sometimes you return a pointer with potentially embedded errno. I think it would be better if you stuck with one or the other. > [...] > +static int ssam_ssh_event_disable(struct ssam_controller *ctrl, > + struct ssam_event_registry reg, > + struct ssam_event_id id, u8 flags) > +{ > [...] > + rqst.command_id = reg.cid_disable; If I see it correctly, this line is the only significant difference between this one and the previous function. Is there any reason they're not combined? > [...] > +/** > + * ssam_notifier_disable_registered() - Disable events for all registered > + * notifiers. > + * @ctrl: The controller for which to disable the notifiers/events. > + * > + * Disables events for all currently registered notifiers. In case of an error > + * (EC command failing), all previously disabled events will be restored and > + * the error code returned. > + * > + * This function is intended to disable all events prior to hibenration entry. ^ "hibernation" > [...] > +void ssam_irq_disarm_wakeup(struct ssam_controller *ctrl) > +{ > + int status; > + > + if (ctrl->irq.wakeup_enabled) { > + status = disable_irq_wake(ctrl->irq.num); > + if (status) > + ssam_err(ctrl, "failed to disable wake IRQ: %d\n", status); > + > + ctrl->irq.wakeup_enabled = false; It's set to false even if `disable_irq_wake()` fails? > + } > + disable_irq(ctrl->irq.num); > +} > [...] > +static int ssam_try_set_controller(struct ssam_controller *ctrl) > +{ > + int status = 0; > + > + spin_lock(&__ssam_controller_lock); > + if (!__ssam_controller) > + __ssam_controller = ctrl; > + else > + status = -EBUSY; I feel like EBUSY might not be the best errno here. > + spin_unlock(&__ssam_controller_lock); > + > + return status; > +} > [...] > +int ssam_client_link(struct ssam_controller *c, struct device *client) > +{ > + const u32 flags = DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_CONSUMER; > + struct device_link *link; > + struct device *ctrldev; > + > + ssam_controller_statelock(c); > + > + if (c->state != SSAM_CONTROLLER_STARTED) { > + ssam_controller_stateunlock(c); > + return -ENXIO; > + } > + > + ctrldev = ssam_controller_device(c); > + if (!ctrldev) { > + ssam_controller_stateunlock(c); > + return -ENXIO; > + } > + I'm not sure if ENXIO is the best errno in the last two returns; > + link = device_link_add(client, ctrldev, flags); > + if (!link) { > + ssam_controller_stateunlock(c); > + return -ENOMEM; > + } > + > + /* > + * Return -ENXIO if supplier driver is on its way to be removed. In this > + * case, the controller won't be around for much longer and the device > + * link is not going to save us any more, as unbinding is already in > + * progress. > + */ > + if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) { > + ssam_controller_stateunlock(c); > + return -ENXIO; > + } > + > + ssam_controller_stateunlock(c); > + return 0; > +} > [...] > +int ssam_client_bind(struct device *client, struct ssam_controller **ctrl) > +{ > + struct ssam_controller *c; > + int status; > + > + c = ssam_get_controller(); > + if (!c) > + return -ENXIO; To me, ENODEV seems like a better fit here. > + > + status = ssam_client_link(c, client); > [...] > +static ssize_t firmware_version_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct ssam_controller *ctrl = dev_get_drvdata(dev); > + u32 version, a, b, c; > + int status; > + > + status = ssam_get_firmware_version(ctrl, &version); > + if (status < 0) > + return status; > + > + a = (version >> 24) & 0xff; > + b = ((version >> 8) & 0xffff); > + c = version & 0xff; > + > + return snprintf(buf, PAGE_SIZE - 1, "%u.%u.%u\n", a, b, c); `snprintf()` takes care of the null byte, so simply `PAGE_SIZE` would've been sufficient. But that doesn't matter much since you should use `sysfs_emit()`. > +} > +static DEVICE_ATTR_RO(firmware_version); > + > +static struct attribute *ssam_sam_attrs[] = { > + &dev_attr_firmware_version.attr, > + NULL, ^ I believe it is preferred to omit the comma after the terminating entry. > [...] > +/** > + * msgb_push_cmd() - Push a SSH command frame with payload to the buffer. > + * @msgb: The message buffer. > + * @seq: The sequence ID (SEQ) of the frame/packet. > + * @rqid: The request ID (RQID) of the request contained in the frame. > + * @rqst: The request to wrap in the frame. > + */ > +static inline void msgb_push_cmd(struct msgbuf *msgb, u8 seq, u16 rqid, > + const struct ssam_request *rqst) > +{ > + struct ssh_command *cmd; > + const u8 *cmd_begin; > + const u8 type = SSH_FRAME_TYPE_DATA_SEQ; > + > + // SYN > + msgb_push_syn(msgb); > + > + // command frame + crc > + msgb_push_frame(msgb, type, sizeof(*cmd) + rqst->length, seq); > + > + // frame payload: command struct + payload > + if (WARN_ON(msgb->ptr + sizeof(*cmd) > msgb->end)) > + return; > + > + cmd_begin = msgb->ptr; > + cmd = (struct ssh_command *)msgb->ptr; I believe this violates strict aliasing. > [...] > + * Note that the packet completion callback is, in case of success and for a > + * sequenced packet, guaranteed to run on the receiver thread, thus providing > + * a way to reliably identify responses to the packet. The packet completion > + * callback is only run once and it does not indicate that the packet has > + * fully left the system (for this, one should rely on the release method, > + * triggered when the reference count of the packet reaches zero). In case of > + * re-submission (and with somewhat unlikely timing), it may be possible that > + * the packet is being re-transmitted while the completion callback runs. > + * Completion will occur both on success and internal error, as well as when > + * the packet is canceled. If I understand it correctly, it is possible that submission of a packet fails for the first time, but it's scheduled for resubmission, and this retransmission happens at the same time when the complete() callback is called. If that's the case, then the callback is called with an error condition, no? Thus it is possible that a packet is successfully submitted (for the second, third, etc. time), but the complete() callback receives notification about failure? Or am I missing something? > [...] > +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n) > +{ > + int used; > + > + if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state)) > + return -ESHUTDOWN; > + > + used = kfifo_in(&ptl->rx.fifo, buf, n); Isn't it possible that `n` is greater than the free space in the fifo? What happens then? > + if (used) > + ssh_ptl_rx_wakeup(ptl); > + > + return used; > +} > [...] > +void ssh_ptl_shutdown(struct ssh_ptl *ptl) > +{ > [...] > + * Note 2: We can re-use queue_node (or pending_node) if we mark the > + * packet as locked an then remove it from the queue (or pending set > + * respecitvely). Marking the packet as locked avoids re-queueing > + * (which should already be prevented by having stopped the treads...) > + * and not setting QUEUED_BIT (or PENDING_BIT) prevents removal from a > + * new list via other threads (e.g. canellation). ^ "cancellation" > + * > + * Note 3: There may be overlap between complete_p and complete_q. > + * This is handled via test_and_set_bit() on the "completed" flag > + * (also handles cancellation). > + */ > [...] > +#define __ssam_prcond(func, p, fmt, ...) \ > + do { \ > + if ((p)) \ I believe `if (p)` is sufficient. > + func((p), fmt, ##__VA_ARGS__); \ > + } while (0) > [...] > +int sshp_parse_frame(const struct device *dev, const struct ssam_span *source, > + struct ssh_frame **frame, struct ssam_span *payload, > + size_t maxlen) > +{ > [...] > + // ensure packet does not exceed maximum length > + sp.len = get_unaligned_le16(&((struct ssh_frame *)sf.ptr)->len); > + if (unlikely(sp.len + SSH_MESSAGE_LENGTH(0) > maxlen)) { > + dev_warn(dev, "rx: parser: frame too large: %u bytes\n", I believe `%hu` would be more appropriate. > + ((struct ssh_frame *)sf.ptr)->len); Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.) > + return -EMSGSIZE; > + } > [...] > + *frame = (struct ssh_frame *)sf.ptr; This also violates strict aliasing. > + *payload = sp; > + > + dev_dbg(dev, "rx: parser: valid frame found (type: 0x%02x, len: %u)\n", > + (*frame)->type, (*frame)->len); > + > + return 0; > +} > [...] > +int sshp_parse_command(const struct device *dev, const struct ssam_span *source, > + struct ssh_command **command, > + struct ssam_span *command_data) > +{ > + // check for minimum length > + if (unlikely(source->len < sizeof(struct ssh_command))) { > + *command = NULL; > + command_data->ptr = NULL; > + command_data->len = 0; > + > + dev_err(dev, "rx: parser: command payload is too short\n"); > + return -ENOMSG; > + } > + > + *command = (struct ssh_command *)source->ptr; I'm quite sure this also violates strict aliasing. > + command_data->ptr = source->ptr + sizeof(struct ssh_command); > + command_data->len = source->len - sizeof(struct ssh_command); > + > + dev_dbg(dev, "rx: parser: valid command found (tc: 0x%02x, cid: 0x%02x)\n", > + (*command)->tc, (*command)->cid); > + > + return 0; > +} > [...] > +#define SSH_MSGOFFSET_FRAME(field) \ > + (sizeof(u16) + offsetof(struct ssh_frame, field)) > + > +/** > + * SSH_MSGOFFSET_COMMAND() - Compute offset in SSH message to specified field > + * in command. > + * @field: The field for which the offset should be computed. > + * > + * Return: Returns the offset of the specified &struct ssh_command field in > + * the raw SSH message data. > + */ > +#define SSH_MSGOFFSET_COMMAND(field) \ > + (2ull * sizeof(u16) + sizeof(struct ssh_frame) \ > + + offsetof(struct ssh_command, field)) I believe it should be noted (here and for `SSH_MSGOFFSET_FRAME()`) why the `sizeof(u16)`s are necessary. > + > +/* > + * SSH_MSG_SYN - SSH message synchronization (SYN) bytes as u16. > + */ > +#define SSH_MSG_SYN ((u16)0x55aa) > [...] > +enum ssam_ssh_tc { > + /* Known SSH/EC target categories. */ > + // category 0x00 is invalid for EC use > + SSAM_SSH_TC_SAM = 0x01, // generic system functionality, real-time clock > + SSAM_SSH_TC_BAT = 0x02, // battery/power subsystem > + SSAM_SSH_TC_TMP = 0x03, // thermal subsystem > + SSAM_SSH_TC_PMC = 0x04, > + SSAM_SSH_TC_FAN = 0x05, > + SSAM_SSH_TC_PoM = 0x06, > + SSAM_SSH_TC_DBG = 0x07, > + SSAM_SSH_TC_KBD = 0x08, // legacy keyboard (Laptop 1/2) > + SSAM_SSH_TC_FWU = 0x09, > + SSAM_SSH_TC_UNI = 0x0a, > + SSAM_SSH_TC_LPC = 0x0b, > + SSAM_SSH_TC_TCL = 0x0c, > + SSAM_SSH_TC_SFL = 0x0d, > + SSAM_SSH_TC_KIP = 0x0e, > + SSAM_SSH_TC_EXT = 0x0f, > + SSAM_SSH_TC_BLD = 0x10, > + SSAM_SSH_TC_BAS = 0x11, // detachment system (Surface Book 2/3) > + SSAM_SSH_TC_SEN = 0x12, > + SSAM_SSH_TC_SRQ = 0x13, > + SSAM_SSH_TC_MCU = 0x14, > + SSAM_SSH_TC_HID = 0x15, // generic HID input subsystem > + SSAM_SSH_TC_TCH = 0x16, > + SSAM_SSH_TC_BKL = 0x17, > + SSAM_SSH_TC_TAM = 0x18, > + SSAM_SSH_TC_ACC = 0x19, > + SSAM_SSH_TC_UFI = 0x1a, > + SSAM_SSH_TC_USC = 0x1b, > + SSAM_SSH_TC_PEN = 0x1c, > + SSAM_SSH_TC_VID = 0x1d, > + SSAM_SSH_TC_AUD = 0x1e, > + SSAM_SSH_TC_SMC = 0x1f, > + SSAM_SSH_TC_KPD = 0x20, > + SSAM_SSH_TC_REG = 0x21, > +}; Is it known what these abbreviations stand for? Maybe I missed them? > [...] > +/** > + * struct ssh_request_ops - Callback operations for a SSH request. > + * @release: Function called when the request's reference count reaches zero. > + * This callback must be relied upon to ensure that the request has > + * left the transport systems (both, packet an request systems). > + * @complete: Function called when the request is completed, either with > + * success or failure. The command data for the request response > + * is provided via the &struct ssh_command parameter (``cmd``), > + * the command payload of the request response via the &struct > + * ssh_span parameter (``data``). > + * > + * If the request does not have any response or has not been > + * completed with success, both ``cmd`` and ``data`` parameters will > + * be NULL. If the request response does not have any command > + * payload, the ``data`` span will be an empty (zero-length) span. > + * > + * In case of failure, the reason for the failure is indicated by > + * the value of the provided status code argument (``status``). This > + * value will be zero in case of success. I believe it should be noted if the `status` argument is a regular errno, or something different. > + * > + * Note that a call to this callback does not guarantee that the > + * request is not in use by the transport systems any more. > + */ > +struct ssh_request_ops { > + void (*release)(struct ssh_request *rqst); > + void (*complete)(struct ssh_request *rqst, > + const struct ssh_command *cmd, > + const struct ssam_span *data, int status); > +}; > [...] I have to agree, this is quite a sizable patch, although I think it's well-commented, which helps reading the code by a lot, however, in some places I feel like it's a bit over-engineered (or maybe I just cannot fully appreciate the subject at hand at the moment), nonetheless, I applaud your efforts, I can only imagine the hours that must have gone into it. Regards, Barnabás Pőcze
On 11/18/20 1:28 AM, Barnabás Pőcze wrote: > Hi > > I have attached some thoughts and comments inline. Thanks! > 2020. november 15., vasárnap 20:21 keltezéssel, Maximilian Luz írta: > >> [...] >> +/* -- Event notifier/callbacks. --------------------------------------------- */ >> +/* >> + * The notifier system is based on linux/notifier.h, specifically the SRCU >> + * implementation. The difference to that is, that some bits of the notifier >> + * call return value can be tracked across multiple calls. This is done so that >> + * handling of events can be tracked and a warning can be issued in case an >> + * event goes unhandled. The idea of that waring is that it should help discover > ^ > "warning" Thanks, I'll run a spell-checker over the comments and (hopefully) fix all typos. >> + * and identify new/currently unimplemented features. >> + */ >> + >> + >> +/** >> + * ssam_event_matches_notifier() - Test if an event matches a notifier; > ^ > Shouldn't it be a period? Yes, it should. >> + * @notif: The event notifier to test against. >> + * @event: The event to test. >> + * >> + * Return: Returns %true iff the given event matches the given notifier >> + * according to the rules set in the notifier's event mask, %false otherwise. >> + */ > >> [...] > >> +static int __ssam_nfblk_remove(struct ssam_nf_head *nh, >> + struct ssam_notifier_block *nb) >> +{ >> + struct ssam_notifier_block **link; >> + >> + link = __ssam_nfblk_find_link(nh, nb); >> + if (!link) >> + return -ENOENT; > > I find it odd that here you return ENOENT, but in `__ssam_nfblk_insert()` > EINVAL is returned instead of EEXIST. I believe either both should be EINVAL, > or EEXIST+ENOENT. Yes, EEXIST is better for insert. >> + >> + __ssam_nfblk_erase(link); > > I'm wondering if it's necessary to create a new function which contains just > a single line. I prefer to create these sorts of abstractions as this, in my opinion, keeps things contained and makes things easier to reason about. In this case, it also allows for documentation. >> + return 0; >> +} > >> [...] > >> +/** >> + * ssam_nf_head_destroy() - Deinitialize the given notifier head. >> + * @nh: The notifier head to deinitialize. >> + */ >> +static void ssam_nf_head_destroy(struct ssam_nf_head *nh) >> +{ >> + cleanup_srcu_struct(&nh->srcu); >> +} > > I'm also wondering if there's any reason why these static one-liner functions are > not explicitly marked inline. Isn't inline more of a linkage keyword at this point? I fully expect any compiler to inline things like this automatically at the first hint of an optimization flag. >> [...] > >> +/** >> + * struct ssam_nf_refcount_entry - RB-tree entry for referecnce counting event >> + * activations. >> + * @node: The node of this entry in the rb-tree. >> + * @key: The key of the event. >> + * @refcount: The reference-count of the event. >> + * @flags: The flags used when enabling the event. >> + */ >> +struct ssam_nf_refcount_entry { >> + struct rb_node node; >> + struct ssam_nf_refcount_key key; >> + int refcount; > > Is there any reason why a signed type is used for reference counting? Can help to spot overflows when debugging, not that that should happen. We also don't need values larger than INT_MAX. We shouldn't even get remotely close to it. >> + u8 flags; >> +}; >> + >> + >> +/** >> + * ssam_nf_refcount_inc() - Increment reference-/activation-count of the given >> + * event. >> + * @nf: The notifier system reference. >> + * @reg: The registry used to enable/disable the event. >> + * @id: The event ID. >> + * >> + * Increments the reference-/activation-count associated with the specified >> + * event type/ID, allocating a new entry for this event ID if necessary. A >> + * newly allocated entry will have a refcount of one. > > Shouldn't it be noted that nf->lock(?) must(?) be held when calling? All functions taking 'struct ssam_nf *' as first parameter must run synchronized with regards to each other, except for ssam_nf_call(). Same for all access to struct ssam_nf directly. Whether that is guaranteed by nf->lock or other means (de-/initialization) doesn't really matter. Only higher-level functions may run concurrently. On the higher level, the lock must also be held to ensure that the EC requests are sent at the right time (which is its main purpose). Arguably that's not well documented here, I'll try to fix that. [...] >> + * callback is left or a callback returned a value with the %SSAM_NOTIF_STOP >> + * bit set. Note that this bit is set automatically when converting non.zero > ^ > maybe "non-zero"? Right. >> + * error values via ssam_notifier_from_errno() to notifier values. >> + * >> + * Also note that any callback that could handle an event should return a value >> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go >> + * unhandled/ignored. In case no registered callback could handle an event, >> + * this function will emit a warning. >> + * >> + * In case a callback failed, this function will emit an error message. >> + */ >> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid, >> + struct ssam_event *event) >> +{ >> + struct ssam_nf_head *nf_head; >> + int status, nf_ret; >> + >> + if (!ssh_rqid_is_event(rqid)) { >> + dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid); > > A small note, "%#04x" would insert the "0x" prefix. Oh neat, didn't know that. Should be "%#06x" though, right? >> + return; >> + } > >> [...] > >> +/** >> + * ssam_event_item_alloc() - Allocate an event item with the given payload size. >> + * @len: The event payload length. >> + * @flags: The flags used for allocation. >> + * >> + * Allocate an event item with the given payload size. Sets the item >> + * operations and payload length values. The item free callback (``ops.free``) >> + * should not be overwritten after this call. >> + * >> + * Return: Returns the newly allocated event item. >> + */ >> +static struct ssam_event_item *ssam_event_item_alloc(size_t len, gfp_t flags) > > The `flags` argument is seemingly ignored. Oh, right. >> +{ >> + struct ssam_event_item *item; >> + >> + item = kzalloc(sizeof(*item) + len, GFP_KERNEL); > > I believe `struct_size(item, event.data, len)` could be utilized here. Right, I'll use that. >> + if (!item) >> + return NULL; >> + >> + item->event.length = len; >> + return item; >> +} > >> [...] > >> +static void ssam_event_queue_work_fn(struct work_struct *work) >> +{ >> + struct ssam_event_queue *queue; >> + struct ssam_event_item *item; >> + struct ssam_nf *nf; >> + struct device *dev; >> + int i; >> + >> + queue = container_of(work, struct ssam_event_queue, work); >> + nf = &queue->cplt->event.notif; >> + dev = queue->cplt->dev; >> + >> + // limit number of processed events to avoid livelocking >> + for (i = 0; i < 10; i++) { >> + item = ssam_event_queue_pop(queue); >> + if (item == NULL) > > I believe `!item` is preferred. Right. >> + return; >> + >> + ssam_nf_call(nf, dev, item->rqid, &item->event); >> + kfree(item); >> + } >> + >> + if (!ssam_event_queue_is_empty(queue)) >> + ssam_cplt_submit(queue->cplt, &queue->work); >> +} > >> [...] > >> +static void ssam_handle_event(struct ssh_rtl *rtl, >> + const struct ssh_command *cmd, >> + const struct ssam_span *data) >> +{ >> + struct ssam_controller *ctrl = to_ssam_controller(rtl, rtl); >> + struct ssam_event_item *item; >> + >> + item = ssam_event_item_alloc(data->len, GFP_KERNEL); >> + if (!item) >> + return; >> + >> + item->rqid = get_unaligned_le16(&cmd->rqid); >> + item->event.target_category = cmd->tc; >> + item->event.target_id = cmd->tid_in; >> + item->event.command_id = cmd->cid; >> + item->event.instance_id = cmd->iid; >> + memcpy(&item->event.data[0], data->ptr, data->len); >> + >> + WARN_ON(ssam_cplt_submit_event(&ctrl->cplt, item)); > > I believe that if submission fails, `item` is leaked. Oh, that is correct. Thanks for spotting that! >> +} >> + >> +static const struct ssh_rtl_ops ssam_rtl_ops = { >> + .handle_event = ssam_handle_event, >> +}; >> + >> + >> +static bool ssam_notifier_empty(struct ssam_controller *ctrl); > > I think it'd be better to name it `ssam_notifier_is_empty()` to be consistent > with the rest of the patch. Okay, I'll change that. >> [...] > >> +static int ssam_controller_caps_load_from_acpi( >> + acpi_handle handle, struct ssam_controller_caps *caps) >> +{ >> + u32 d3_closes_handle = false; > > Assinging a boolean like this to a `u32` looks very odd to me. The value returned by the corresponding DSM call is an integer, but essentially only contains a bool value (zero vs. one). I thought using false here explicitly for initialization helps clarify that. That also makes it consistent with the "caps->d3_closes_handle = false" line below. >> + u64 funcs; >> + int status; >> + >> + // set defaults >> + caps->ssh_power_profile = (u32)-1; >> + caps->screen_on_sleep_idle_timeout = (u32)-1; >> + caps->screen_off_sleep_idle_timeout = (u32)-1; >> + caps->d3_closes_handle = false; >> + caps->ssh_buffer_size = (u32)-1; [...] >> +void ssam_controller_shutdown(struct ssam_controller *ctrl) >> +{ >> + enum ssam_controller_state s = ctrl->state; >> + int status; >> + >> + if (s == SSAM_CONTROLLER_UNINITIALIZED || s == SSAM_CONTROLLER_STOPPED) >> + return; >> + >> + // try to flush pending events and requests while everything still works >> + status = ssh_rtl_flush(&ctrl->rtl, msecs_to_jiffies(5000)); > > Wouldn't it be better to name that 5000? I'll do that. >> + if (status) { >> + ssam_err(ctrl, "failed to flush request transport layer: %d\n", >> + status); >> + } [...] >> +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags, >> + struct ssam_request_sync **rqst, >> + struct ssam_span *buffer) >> +{ >> + size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len); >> + >> + *rqst = kzalloc(sizeof(**rqst) + msglen, flags); >> + if (!*rqst) >> + return -ENOMEM; >> + >> + buffer->ptr = (u8 *)(*rqst + 1); >> + buffer->len = msglen; >> + >> + return 0; >> +} > > I think there is a bit of incosistency: sometimes you use ** pointer + return int, > sometimes you return a pointer with potentially embedded errno. I think it would > be better if you stuck with one or the other. My rule of thumb is: If it's one output parameter, use a pointer as return value. If it's two or more output parameters (as in this case, "buffer" is written to), use two pointer arguments and an int as return. I noticed that ssam_client_bind doesn't follow that scheme so I'll change that. >> [...] > >> +static int ssam_ssh_event_disable(struct ssam_controller *ctrl, >> + struct ssam_event_registry reg, >> + struct ssam_event_id id, u8 flags) >> +{ >> [...] >> + rqst.command_id = reg.cid_disable; > > If I see it correctly, this line is the only significant difference between this one > and the previous function. Is there any reason they're not combined? Debug and error messages are also different. I'll try to put the common code into one function though. [...] >> +void ssam_irq_disarm_wakeup(struct ssam_controller *ctrl) >> +{ >> + int status; >> + >> + if (ctrl->irq.wakeup_enabled) { >> + status = disable_irq_wake(ctrl->irq.num); >> + if (status) >> + ssam_err(ctrl, "failed to disable wake IRQ: %d\n", status); >> + >> + ctrl->irq.wakeup_enabled = false; > > It's set to false even if `disable_irq_wake()` fails? Yes, on purpose. The irq.wakeup_enabled flag is there to ensure that disable_irq_wake() gets called only if there has been a successful enable_irq_wake() call. This seems to be a common pattern, see e.g. /drivers/mmc/host/sdhci.c >> + } >> + disable_irq(ctrl->irq.num); >> +} > >> [...] > >> +static int ssam_try_set_controller(struct ssam_controller *ctrl) >> +{ >> + int status = 0; >> + >> + spin_lock(&__ssam_controller_lock); >> + if (!__ssam_controller) >> + __ssam_controller = ctrl; >> + else >> + status = -EBUSY; > > I feel like EBUSY might not be the best errno here. Okay, I'll replace it with EEXIST. >> + spin_unlock(&__ssam_controller_lock); >> + >> + return status; >> +} > >> [...] > >> +int ssam_client_link(struct ssam_controller *c, struct device *client) >> +{ >> + const u32 flags = DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_CONSUMER; >> + struct device_link *link; >> + struct device *ctrldev; >> + >> + ssam_controller_statelock(c); >> + >> + if (c->state != SSAM_CONTROLLER_STARTED) { >> + ssam_controller_stateunlock(c); >> + return -ENXIO; >> + } >> + >> + ctrldev = ssam_controller_device(c); >> + if (!ctrldev) { >> + ssam_controller_stateunlock(c); >> + return -ENXIO; >> + } >> + > > I'm not sure if ENXIO is the best errno in the last two returns; I guess I can return ENODEV. >> + link = device_link_add(client, ctrldev, flags); >> + if (!link) { >> + ssam_controller_stateunlock(c); >> + return -ENOMEM; >> + } >> + >> + /* >> + * Return -ENXIO if supplier driver is on its way to be removed. In this >> + * case, the controller won't be around for much longer and the device >> + * link is not going to save us any more, as unbinding is already in >> + * progress. >> + */ >> + if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) { >> + ssam_controller_stateunlock(c); >> + return -ENXIO; >> + } >> + >> + ssam_controller_stateunlock(c); >> + return 0; >> +} > >> [...] > >> +int ssam_client_bind(struct device *client, struct ssam_controller **ctrl) >> +{ >> + struct ssam_controller *c; >> + int status; >> + >> + c = ssam_get_controller(); >> + if (!c) >> + return -ENXIO; > > To me, ENODEV seems like a better fit here. Okay, I'll change it as well as above. >> + >> + status = ssam_client_link(c, client); > >> [...] > > >> +static ssize_t firmware_version_show(struct device *dev, >> + struct device_attribute *attr, char *buf) >> +{ >> + struct ssam_controller *ctrl = dev_get_drvdata(dev); >> + u32 version, a, b, c; >> + int status; >> + >> + status = ssam_get_firmware_version(ctrl, &version); >> + if (status < 0) >> + return status; >> + >> + a = (version >> 24) & 0xff; >> + b = ((version >> 8) & 0xffff); >> + c = version & 0xff; >> + >> + return snprintf(buf, PAGE_SIZE - 1, "%u.%u.%u\n", a, b, c); > > `snprintf()` takes care of the null byte, so simply `PAGE_SIZE` would've been > sufficient. But that doesn't matter much since you should use `sysfs_emit()`. I didn't know about sysfs_emit(), I'll switch to that then. Also you're right about that "- 1", not sure where I got that from. >> +} >> +static DEVICE_ATTR_RO(firmware_version); >> + >> +static struct attribute *ssam_sam_attrs[] = { >> + &dev_attr_firmware_version.attr, >> + NULL, > ^ > I believe it is preferred to omit the comma after the terminating entry. Right. >> [...] > >> +/** >> + * msgb_push_cmd() - Push a SSH command frame with payload to the buffer. >> + * @msgb: The message buffer. >> + * @seq: The sequence ID (SEQ) of the frame/packet. >> + * @rqid: The request ID (RQID) of the request contained in the frame. >> + * @rqst: The request to wrap in the frame. >> + */ >> +static inline void msgb_push_cmd(struct msgbuf *msgb, u8 seq, u16 rqid, >> + const struct ssam_request *rqst) >> +{ >> + struct ssh_command *cmd; >> + const u8 *cmd_begin; >> + const u8 type = SSH_FRAME_TYPE_DATA_SEQ; >> + >> + // SYN >> + msgb_push_syn(msgb); >> + >> + // command frame + crc >> + msgb_push_frame(msgb, type, sizeof(*cmd) + rqst->length, seq); >> + >> + // frame payload: command struct + payload >> + if (WARN_ON(msgb->ptr + sizeof(*cmd) > msgb->end)) >> + return; >> + >> + cmd_begin = msgb->ptr; >> + cmd = (struct ssh_command *)msgb->ptr; > > I believe this violates strict aliasing. It does. The same happens in msgb_push_frame(), too. The kernel is built with -fno-strict-aliasing though if I remember correctly. I guess I can try to avoid that in those two functions with put_unaligned and offsetof to compute the pointers. >> [...] > >> + * Note that the packet completion callback is, in case of success and for a >> + * sequenced packet, guaranteed to run on the receiver thread, thus providing >> + * a way to reliably identify responses to the packet. The packet completion >> + * callback is only run once and it does not indicate that the packet has >> + * fully left the system (for this, one should rely on the release method, >> + * triggered when the reference count of the packet reaches zero). In case of >> + * re-submission (and with somewhat unlikely timing), it may be possible that >> + * the packet is being re-transmitted while the completion callback runs. >> + * Completion will occur both on success and internal error, as well as when >> + * the packet is canceled. > > If I understand it correctly, it is possible that submission of a packet fails > for the first time, but it's scheduled for resubmission, and this retransmission > happens at the same time when the complete() callback is called. If that's the > case, then the callback is called with an error condition, no? Thus it is possible > that a packet is successfully submitted (for the second, third, etc. time), but the > complete() callback receives notification about failure? Or am I missing something? Not sure I can follow you here. What do you mean by "fails for the first time"? A couple of notes before I try to explain the error cases: - Resubmission only happens on timeout or NAK. - Only sequenced packets can be resubmitted (unsequenced are not added to the pending set, thus do not have a packet timeout and do not react to NAKs). - Completion means the packet gets locked, which prevents it from being resubmitted and retransmitted, and is removed from all collections. - Completion cannot be triggered by NAK. In case a NAK is received and all tries have been exceeded, the packet still waits for the timeout (or ACK, whatever happens first) one last time. The error cases: - serdev: Failure to send via serdev results in immediate completion with error on the transmitter thread. Due to a NAK (or a _very_ unlikely timeout), the packet can be queued for a second time at that point, but, as completion runs on the transmitter thread in this case, the packet is removed from the queue before the thread goes on to get the next packet. - Timeout: On resubmission, the timeout of a packet is disabled. It is (re-)armed directly before starting transmission on the transmitter thread. What _could_ now happen is that, for some reason, the transmitter thread waits the full timeout period between arming the timeout and actually sending it (which I'd consider highly unlikely). Note that "actually sending it" includes whatever asynchronous stuff the serdev core does after serdev_device_write_buf() returns. In that case, the packet can be transmitted _after_ it has received a timeout. So assuming this is the last try, then the packet can be sent after completion with -ETIMEDOUT. I believe this is what you meant? Again, I consider this highly unlikely as this would require either the scheduler to pause our thread for a full timeout period or require the serdev core to basically do nothing for a full timeout period. I'd argue this is a lot less likely than a dropped ACK packet from the EC (which results in the same thing: EC receives data but we think it didn't, causing a timeout error). Note that putting the timeout (re-)arming step after the last call to serdev_device_write_buf() doesn't really help either, as that only writes to the transport buffer. If we would want to guarantee that this doesn't happen, we'd have to explicitly wait for the hardware to finish transmitting, which essentially breaks the whole point of buffering. Also I prefer to have the timeout cover the transmission part as well. In the very least, this simplifies reasoning about it once I've restricted it to be (re-)set under the pending lock only (as mentioned in the message to Andy). - Finally, there is completion by cancellation or shutdown, for which the packet can have been (or currently be) transmitted successfully but completion returns with -ECANCELLED or -ESHUTDOWN, respectively. This is the expected behavior in those cases. Again, the completion procedure takes care of locking the packet and removing all references from the collections. Also there's one success case, when we receive an ACK after a timeout and during the re-transmission. Then the packet is completed with success, but might still be transmitted once more. The EC should detect that packet as re-submission via its packet ID and ignore it. >> [...] > >> +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n) >> +{ >> + int used; >> + >> + if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state)) >> + return -ESHUTDOWN; >> + >> + used = kfifo_in(&ptl->rx.fifo, buf, n); > > Isn't it possible that `n` is greater than the free space in the fifo? What > happens then? If I've read the documentation of kfifo_in() right, it takes at most n elements. If n is greater than the free space, it will only take as much elements as it has space. The return value of kfifo_in() is the number of elements actually consumed. We simply return that here (as elements = bytes) and ultimately let the serdev/tty core take care of handling that. >> + if (used) >> + ssh_ptl_rx_wakeup(ptl); >> + >> + return used; >> +} [...] >> +#define __ssam_prcond(func, p, fmt, ...) \ >> + do { \ >> + if ((p)) \ > > I believe `if (p)` is sufficient. Right. >> + func((p), fmt, ##__VA_ARGS__); \ >> + } while (0) > >> [...] > >> +int sshp_parse_frame(const struct device *dev, const struct ssam_span *source, >> + struct ssh_frame **frame, struct ssam_span *payload, >> + size_t maxlen) >> +{ >> [...] >> + // ensure packet does not exceed maximum length >> + sp.len = get_unaligned_le16(&((struct ssh_frame *)sf.ptr)->len); >> + if (unlikely(sp.len + SSH_MESSAGE_LENGTH(0) > maxlen)) { >> + dev_warn(dev, "rx: parser: frame too large: %u bytes\n", > > I believe `%hu` would be more appropriate. Right. >> + ((struct ssh_frame *)sf.ptr)->len); > > Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.) This should actually be "sp.len + SSH_MESSAGE_LENGTH(0)", and also format spezifier %zu with that as SSH_MESSAGE_LENGTH returns size_t. >> + return -EMSGSIZE; >> + } >> [...] >> + *frame = (struct ssh_frame *)sf.ptr; > > This also violates strict aliasing. Sure, the whole point of this function (and ssh_parse_command below) is to get pointers to the data structures in the given buffer, i.e. create aliases of the right types to the right places in the buffer. Not sure how I'd do that differently, apart from copying. I think that this would even work properly (in practice) without setting -fno-strict-aliasing, as all data at this point is effectively read-only. Furthermore, after the call access is restricted to the non-aliasing parts respectively (frame and payload), whereas the underlying buffer they do alias isn't accessed for the rest of that scope (i.e. until the other references don't exist any more). Anyway, as far as I know the kernel is built with -fno-strict-aliasing. Please correct me if I'm wrong or relying on aliasing is frowned upon. >> + *payload = sp; >> + >> + dev_dbg(dev, "rx: parser: valid frame found (type: 0x%02x, len: %u)\n", >> + (*frame)->type, (*frame)->len); >> + >> + return 0; >> +} > >> [...] > >> +int sshp_parse_command(const struct device *dev, const struct ssam_span *source, >> + struct ssh_command **command, >> + struct ssam_span *command_data) >> +{ >> + // check for minimum length >> + if (unlikely(source->len < sizeof(struct ssh_command))) { >> + *command = NULL; >> + command_data->ptr = NULL; >> + command_data->len = 0; >> + >> + dev_err(dev, "rx: parser: command payload is too short\n"); >> + return -ENOMSG; >> + } >> + >> + *command = (struct ssh_command *)source->ptr; > > I'm quite sure this also violates strict aliasing. It does, see above. >> + command_data->ptr = source->ptr + sizeof(struct ssh_command); >> + command_data->len = source->len - sizeof(struct ssh_command); >> + >> + dev_dbg(dev, "rx: parser: valid command found (tc: 0x%02x, cid: 0x%02x)\n", >> + (*command)->tc, (*command)->cid); >> + >> + return 0; >> +} > >> [...] > >> +#define SSH_MSGOFFSET_FRAME(field) \ >> + (sizeof(u16) + offsetof(struct ssh_frame, field)) >> + >> +/** >> + * SSH_MSGOFFSET_COMMAND() - Compute offset in SSH message to specified field >> + * in command. >> + * @field: The field for which the offset should be computed. >> + * >> + * Return: Returns the offset of the specified &struct ssh_command field in >> + * the raw SSH message data. >> + */ >> +#define SSH_MSGOFFSET_COMMAND(field) \ >> + (2ull * sizeof(u16) + sizeof(struct ssh_frame) \ >> + + offsetof(struct ssh_command, field)) > > I believe it should be noted (here and for `SSH_MSGOFFSET_FRAME()`) why the > `sizeof(u16)`s are necessary. Right, I can add an explanation for this. One is for the SYN bytes, the other one for the frame CRC (between frame and command). >> + >> +/* >> + * SSH_MSG_SYN - SSH message synchronization (SYN) bytes as u16. >> + */ >> +#define SSH_MSG_SYN ((u16)0x55aa) > >> [...] > >> +enum ssam_ssh_tc { >> + /* Known SSH/EC target categories. */ >> + // category 0x00 is invalid for EC use >> + SSAM_SSH_TC_SAM = 0x01, // generic system functionality, real-time clock >> + SSAM_SSH_TC_BAT = 0x02, // battery/power subsystem >> + SSAM_SSH_TC_TMP = 0x03, // thermal subsystem >> + SSAM_SSH_TC_PMC = 0x04, >> + SSAM_SSH_TC_FAN = 0x05, >> + SSAM_SSH_TC_PoM = 0x06, >> + SSAM_SSH_TC_DBG = 0x07, >> + SSAM_SSH_TC_KBD = 0x08, // legacy keyboard (Laptop 1/2) >> + SSAM_SSH_TC_FWU = 0x09, >> + SSAM_SSH_TC_UNI = 0x0a, >> + SSAM_SSH_TC_LPC = 0x0b, >> + SSAM_SSH_TC_TCL = 0x0c, >> + SSAM_SSH_TC_SFL = 0x0d, >> + SSAM_SSH_TC_KIP = 0x0e, >> + SSAM_SSH_TC_EXT = 0x0f, >> + SSAM_SSH_TC_BLD = 0x10, >> + SSAM_SSH_TC_BAS = 0x11, // detachment system (Surface Book 2/3) >> + SSAM_SSH_TC_SEN = 0x12, >> + SSAM_SSH_TC_SRQ = 0x13, >> + SSAM_SSH_TC_MCU = 0x14, >> + SSAM_SSH_TC_HID = 0x15, // generic HID input subsystem >> + SSAM_SSH_TC_TCH = 0x16, >> + SSAM_SSH_TC_BKL = 0x17, >> + SSAM_SSH_TC_TAM = 0x18, >> + SSAM_SSH_TC_ACC = 0x19, >> + SSAM_SSH_TC_UFI = 0x1a, >> + SSAM_SSH_TC_USC = 0x1b, >> + SSAM_SSH_TC_PEN = 0x1c, >> + SSAM_SSH_TC_VID = 0x1d, >> + SSAM_SSH_TC_AUD = 0x1e, >> + SSAM_SSH_TC_SMC = 0x1f, >> + SSAM_SSH_TC_KPD = 0x20, >> + SSAM_SSH_TC_REG = 0x21, >> +}; > > Is it known what these abbreviations stand for? Maybe I missed them? The comments state all we really know about these (through observation and experimentation). The table itself has been extracted from the Windows driver, but the abbreviations and values are all we're getting from it. I could go ahead and guess for a couple of them but for most of them we haven't really seen any request at all, let alone know what the request does. For those that we now, the abbreviations should be fairly easy to guess based on the comments, e.g. BAT -> battery, TMP -> temperature. I'm not sure about BAS, which could mean base (as that's the detachment system allowing to separate the top part of the Surface Books from the base). REG is currently missing a comment because I'm not entirely sure if it's just for registring (enabling/disabling) events or if it also has any other purpose. I'll add a comment for it though, then we have comments on all that are somewhat known. >> [...] > >> +/** >> + * struct ssh_request_ops - Callback operations for a SSH request. >> + * @release: Function called when the request's reference count reaches zero. >> + * This callback must be relied upon to ensure that the request has >> + * left the transport systems (both, packet an request systems). >> + * @complete: Function called when the request is completed, either with >> + * success or failure. The command data for the request response >> + * is provided via the &struct ssh_command parameter (``cmd``), >> + * the command payload of the request response via the &struct >> + * ssh_span parameter (``data``). >> + * >> + * If the request does not have any response or has not been >> + * completed with success, both ``cmd`` and ``data`` parameters will >> + * be NULL. If the request response does not have any command >> + * payload, the ``data`` span will be an empty (zero-length) span. >> + * >> + * In case of failure, the reason for the failure is indicated by >> + * the value of the provided status code argument (``status``). This >> + * value will be zero in case of success. > > I believe it should be noted if the `status` argument is a regular errno, > or something different. Right, I'll add that. >> + * >> + * Note that a call to this callback does not guarantee that the >> + * request is not in use by the transport systems any more. >> + */ >> +struct ssh_request_ops { >> + void (*release)(struct ssh_request *rqst); >> + void (*complete)(struct ssh_request *rqst, >> + const struct ssh_command *cmd, >> + const struct ssam_span *data, int status); >> +}; >> [...] > > > I have to agree, this is quite a sizable patch, although I think it's well-commented, > which helps reading the code by a lot, however, in some places I feel like it's a bit > over-engineered (or maybe I just cannot fully appreciate the subject at hand at the moment), > nonetheless, I applaud your efforts, I can only imagine the hours that must have gone into it. I think much of what's probably considered over-engineered is the result of reworking the core after that had some reliability issues (dropped packets and timeouts). Specifically, the introduction of dedicated transmitter and receiver threads combined with the need to properly handle NAKs has lead to the current asynchronous system. I have probably gone a bit too far towards generifying that, but I belive that this should make it easier to adapt for protocol additions or errors in the future. Other parts, like the RB-tree for event reference counting seem, unfortunately, to be necessary. Events are enabled by registry (specifying which command to use to enable/disable the event) and ID (which event to enable/disable on the registry). So without knowing which registry and ID pairs are valid, we'll have to use some sort of map. Could have been a simple list, but I think an RB-tree makes a bit more sense here. Thank you for your review! Regards, Max
Hi 2020. november 19., csütörtök 0:06 keltezéssel, Maximilian Luz <luzmaximilian@gmail.com> írta: > [...] > >> +/** > >> + * ssam_nf_head_destroy() - Deinitialize the given notifier head. > >> + * @nh: The notifier head to deinitialize. > >> + */ > >> +static void ssam_nf_head_destroy(struct ssam_nf_head *nh) > >> +{ > >> + cleanup_srcu_struct(&nh->srcu); > >> +} > > > > I'm also wondering if there's any reason why these static one-liner functions are > > not explicitly marked inline. > > Isn't inline more of a linkage keyword at this point? I fully expect any > compiler to inline things like this automatically at the first hint of > an optimization flag. > I also expect the compiler to inline such functions even without the `inline` specifier. In retrospect I'm not sure why I actually made this comment, possibly because of my preference, but I believe it's fine either way, so my comment is moot, sorry. > [...] > >> + * error values via ssam_notifier_from_errno() to notifier values. > >> + * > >> + * Also note that any callback that could handle an event should return a value > >> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go > >> + * unhandled/ignored. In case no registered callback could handle an event, > >> + * this function will emit a warning. > >> + * > >> + * In case a callback failed, this function will emit an error message. > >> + */ > >> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid, > >> + struct ssam_event *event) > >> +{ > >> + struct ssam_nf_head *nf_head; > >> + int status, nf_ret; > >> + > >> + if (!ssh_rqid_is_event(rqid)) { > >> + dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid); > > > > A small note, "%#04x" would insert the "0x" prefix. > > Oh neat, didn't know that. Should be "%#06x" though, right? > Yes, indeed, sorry, it should be "%#06x". > [...] > >> +static int ssam_controller_caps_load_from_acpi( > >> + acpi_handle handle, struct ssam_controller_caps *caps) > >> +{ > >> + u32 d3_closes_handle = false; > > > > Assinging a boolean like this to a `u32` looks very odd to me. > > The value returned by the corresponding DSM call is an integer, but > essentially only contains a bool value (zero vs. one). I thought using > false here explicitly for initialization helps clarify that. That also > makes it consistent with the "caps->d3_closes_handle = false" line > below. > Although I still find it pretty odd, your explanation makes sense to me. > >> +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags, > >> + struct ssam_request_sync **rqst, > >> + struct ssam_span *buffer) > >> +{ > >> + size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len); > >> + > >> + *rqst = kzalloc(sizeof(**rqst) + msglen, flags); > >> + if (!*rqst) > >> + return -ENOMEM; > >> + > >> + buffer->ptr = (u8 *)(*rqst + 1); > >> + buffer->len = msglen; > >> + > >> + return 0; > >> +} > > > > I think there is a bit of incosistency: sometimes you use ** pointer + return int, > > sometimes you return a pointer with potentially embedded errno. I think it would > > be better if you stuck with one or the other. > > My rule of thumb is: If it's one output parameter, use a pointer as > return value. If it's two or more output parameters (as in this case, > "buffer" is written to), use two pointer arguments and an int as return. > > I noticed that ssam_client_bind doesn't follow that scheme so I'll > change that. > I see, thanks for the explanation, what prompted me to make this comment was that, as you noted, there were functions with one output parameter that were not consistent. > [...] > >> + * Note that the packet completion callback is, in case of success and for a > >> + * sequenced packet, guaranteed to run on the receiver thread, thus providing > >> + * a way to reliably identify responses to the packet. The packet completion > >> + * callback is only run once and it does not indicate that the packet has > >> + * fully left the system (for this, one should rely on the release method, > >> + * triggered when the reference count of the packet reaches zero). In case of > >> + * re-submission (and with somewhat unlikely timing), it may be possible that > >> + * the packet is being re-transmitted while the completion callback runs. > >> + * Completion will occur both on success and internal error, as well as when > >> + * the packet is canceled. > > > > If I understand it correctly, it is possible that submission of a packet fails > > for the first time, but it's scheduled for resubmission, and this retransmission > > happens at the same time when the complete() callback is called. If that's the > > case, then the callback is called with an error condition, no? Thus it is possible > > that a packet is successfully submitted (for the second, third, etc. time), but the > > complete() callback receives notification about failure? Or am I missing something? > > Not sure I can follow you here. What do you mean by "fails for the first time"? > > A couple of notes before I try to explain the error cases: > > - Resubmission only happens on timeout or NAK. > > - Only sequenced packets can be resubmitted (unsequenced are not added > to the pending set, thus do not have a packet timeout and do not > react to NAKs). > > - Completion means the packet gets locked, which prevents it from being > resubmitted and retransmitted, and is removed from all collections. > > - Completion cannot be triggered by NAK. In case a NAK is received and > all tries have been exceeded, the packet still waits for the timeout > (or ACK, whatever happens first) one last time. > > The error cases: > > - serdev: Failure to send via serdev results in immediate completion > with error on the transmitter thread. Due to a NAK (or a _very_ > unlikely timeout), the packet can be queued for a second time at that > point, but, as completion runs on the transmitter thread in this > case, the packet is removed from the queue before the thread goes on > to get the next packet. > > - Timeout: On resubmission, the timeout of a packet is disabled. It is > (re-)armed directly before starting transmission on the transmitter > thread. > > What _could_ now happen is that, for some reason, the transmitter > thread waits the full timeout period between arming the timeout and > actually sending it (which I'd consider highly unlikely). Note that > "actually sending it" includes whatever asynchronous stuff the serdev > core does after serdev_device_write_buf() returns. > > In that case, the packet can be transmitted _after_ it has received a > timeout. So assuming this is the last try, then the packet can be > sent after completion with -ETIMEDOUT. > > I believe this is what you meant? > Yes, that's what I meant, thanks for the clarification and broader explanation about the things at play here. > Again, I consider this highly unlikely as this would require either > the scheduler to pause our thread for a full timeout period or > require the serdev core to basically do nothing for a full timeout > period. I'd argue this is a lot less likely than a dropped ACK packet > from the EC (which results in the same thing: EC receives data but we > think it didn't, causing a timeout error). > > Note that putting the timeout (re-)arming step after the last call to > serdev_device_write_buf() doesn't really help either, as that only > writes to the transport buffer. If we would want to guarantee that > this doesn't happen, we'd have to explicitly wait for the hardware to > finish transmitting, which essentially breaks the whole point of > buffering. > > Also I prefer to have the timeout cover the transmission part as > well. In the very least, this simplifies reasoning about it once I've > restricted it to be (re-)set under the pending lock only (as > mentioned in the message to Andy). > > - Finally, there is completion by cancellation or shutdown, for which > the packet can have been (or currently be) transmitted successfully > but completion returns with -ECANCELLED or -ESHUTDOWN, respectively. > This is the expected behavior in those cases. Again, the completion > procedure takes care of locking the packet and removing all > references from the collections. > > Also there's one success case, when we receive an ACK after a timeout > and during the re-transmission. Then the packet is completed with > success, but might still be transmitted once more. The EC should detect > that packet as re-submission via its packet ID and ignore it. > > >> [...] > > > >> +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n) > >> +{ > >> + int used; > >> + > >> + if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state)) > >> + return -ESHUTDOWN; > >> + > >> + used = kfifo_in(&ptl->rx.fifo, buf, n); > > > > Isn't it possible that `n` is greater than the free space in the fifo? What > > happens then? > > If I've read the documentation of kfifo_in() right, it takes at most n > elements. If n is greater than the free space, it will only take as much > elements as it has space. The return value of kfifo_in() is the number > of elements actually consumed. We simply return that here (as elements = > bytes) and ultimately let the serdev/tty core take care of handling > that. > Ahh, sorry, you're right. > >> + if (used) > >> + ssh_ptl_rx_wakeup(ptl); > >> + > >> + return used; > >> +} > [...] > >> + ((struct ssh_frame *)sf.ptr)->len); > > > > Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.) > > This should actually be "sp.len + SSH_MESSAGE_LENGTH(0)", and also > format spezifier %zu with that as SSH_MESSAGE_LENGTH returns size_t. > Yes, indeed, sorry, I missed the `+ SSH_MESSAGE_LENGTH(0)` part. > >> + return -EMSGSIZE; > >> + } > >> [...] > >> + *frame = (struct ssh_frame *)sf.ptr; > > > > This also violates strict aliasing. > > Sure, the whole point of this function (and ssh_parse_command below) is > to get pointers to the data structures in the given buffer, i.e. create > aliases of the right types to the right places in the buffer. Not sure > how I'd do that differently, apart from copying. > > I think that this would even work properly (in practice) without setting > -fno-strict-aliasing, as all data at this point is effectively > read-only. Furthermore, after the call access is restricted to the > non-aliasing parts respectively (frame and payload), whereas the > underlying buffer they do alias isn't accessed for the rest of that > scope (i.e. until the other references don't exist any more). > > Anyway, as far as I know the kernel is built with -fno-strict-aliasing. > Please correct me if I'm wrong or relying on aliasing is frowned upon. > I don't believe it could cause issue here, I just wanted to note it. > [...] > >> +enum ssam_ssh_tc { > >> + /* Known SSH/EC target categories. */ > >> + // category 0x00 is invalid for EC use > >> + SSAM_SSH_TC_SAM = 0x01, // generic system functionality, real-time clock > >> + SSAM_SSH_TC_BAT = 0x02, // battery/power subsystem > >> + SSAM_SSH_TC_TMP = 0x03, // thermal subsystem > >> + SSAM_SSH_TC_PMC = 0x04, > >> + SSAM_SSH_TC_FAN = 0x05, > >> + SSAM_SSH_TC_PoM = 0x06, > >> + SSAM_SSH_TC_DBG = 0x07, > >> + SSAM_SSH_TC_KBD = 0x08, // legacy keyboard (Laptop 1/2) > >> + SSAM_SSH_TC_FWU = 0x09, > >> + SSAM_SSH_TC_UNI = 0x0a, > >> + SSAM_SSH_TC_LPC = 0x0b, > >> + SSAM_SSH_TC_TCL = 0x0c, > >> + SSAM_SSH_TC_SFL = 0x0d, > >> + SSAM_SSH_TC_KIP = 0x0e, > >> + SSAM_SSH_TC_EXT = 0x0f, > >> + SSAM_SSH_TC_BLD = 0x10, > >> + SSAM_SSH_TC_BAS = 0x11, // detachment system (Surface Book 2/3) > >> + SSAM_SSH_TC_SEN = 0x12, > >> + SSAM_SSH_TC_SRQ = 0x13, > >> + SSAM_SSH_TC_MCU = 0x14, > >> + SSAM_SSH_TC_HID = 0x15, // generic HID input subsystem > >> + SSAM_SSH_TC_TCH = 0x16, > >> + SSAM_SSH_TC_BKL = 0x17, > >> + SSAM_SSH_TC_TAM = 0x18, > >> + SSAM_SSH_TC_ACC = 0x19, > >> + SSAM_SSH_TC_UFI = 0x1a, > >> + SSAM_SSH_TC_USC = 0x1b, > >> + SSAM_SSH_TC_PEN = 0x1c, > >> + SSAM_SSH_TC_VID = 0x1d, > >> + SSAM_SSH_TC_AUD = 0x1e, > >> + SSAM_SSH_TC_SMC = 0x1f, > >> + SSAM_SSH_TC_KPD = 0x20, > >> + SSAM_SSH_TC_REG = 0x21, > >> +}; > > > > Is it known what these abbreviations stand for? Maybe I missed them? > > The comments state all we really know about these (through observation > and experimentation). The table itself has been extracted from the > Windows driver, but the abbreviations and values are all we're getting > from it. I see, thanks for the clarification. For some reason, I believed the "Known SSH/EC target categories" comment means that those are known, as in, it is known what they are for, etc., not just the abbreviation. > [...] Regards, Barnabás Pőcze
On 11/19/20 4:54 PM, Barnabás Pőcze wrote: > Hi [...] >>>> +enum ssam_ssh_tc { >>>> + /* Known SSH/EC target categories. */ >>>> + // category 0x00 is invalid for EC use >>>> + SSAM_SSH_TC_SAM = 0x01, // generic system functionality, real-time clock >>>> + SSAM_SSH_TC_BAT = 0x02, // battery/power subsystem >>>> + SSAM_SSH_TC_TMP = 0x03, // thermal subsystem >>>> + SSAM_SSH_TC_PMC = 0x04, >>>> + SSAM_SSH_TC_FAN = 0x05, >>>> + SSAM_SSH_TC_PoM = 0x06, >>>> + SSAM_SSH_TC_DBG = 0x07, >>>> + SSAM_SSH_TC_KBD = 0x08, // legacy keyboard (Laptop 1/2) >>>> + SSAM_SSH_TC_FWU = 0x09, >>>> + SSAM_SSH_TC_UNI = 0x0a, >>>> + SSAM_SSH_TC_LPC = 0x0b, >>>> + SSAM_SSH_TC_TCL = 0x0c, >>>> + SSAM_SSH_TC_SFL = 0x0d, >>>> + SSAM_SSH_TC_KIP = 0x0e, >>>> + SSAM_SSH_TC_EXT = 0x0f, >>>> + SSAM_SSH_TC_BLD = 0x10, >>>> + SSAM_SSH_TC_BAS = 0x11, // detachment system (Surface Book 2/3) >>>> + SSAM_SSH_TC_SEN = 0x12, >>>> + SSAM_SSH_TC_SRQ = 0x13, >>>> + SSAM_SSH_TC_MCU = 0x14, >>>> + SSAM_SSH_TC_HID = 0x15, // generic HID input subsystem >>>> + SSAM_SSH_TC_TCH = 0x16, >>>> + SSAM_SSH_TC_BKL = 0x17, >>>> + SSAM_SSH_TC_TAM = 0x18, >>>> + SSAM_SSH_TC_ACC = 0x19, >>>> + SSAM_SSH_TC_UFI = 0x1a, >>>> + SSAM_SSH_TC_USC = 0x1b, >>>> + SSAM_SSH_TC_PEN = 0x1c, >>>> + SSAM_SSH_TC_VID = 0x1d, >>>> + SSAM_SSH_TC_AUD = 0x1e, >>>> + SSAM_SSH_TC_SMC = 0x1f, >>>> + SSAM_SSH_TC_KPD = 0x20, >>>> + SSAM_SSH_TC_REG = 0x21, >>>> +}; >>> >>> Is it known what these abbreviations stand for? Maybe I missed them? >> >> The comments state all we really know about these (through observation >> and experimentation). The table itself has been extracted from the >> Windows driver, but the abbreviations and values are all we're getting >> from it. > > I see, thanks for the clarification. For some reason, I believed the > "Known SSH/EC target categories" comment means that those are known, as in, > it is known what they are for, etc., not just the abbreviation. Right, that can be misunderstood, sorry. It's probably best if I add a comment explaining that in a bit more detail and noting where those values and abbreviations come from. Thanks, Max
Hi, On 11/15/20 8:21 PM, Maximilian Luz wrote: > Hello, > > N.B.: the following text is mostly a repeat of cover letter from the > previous RFC for the uninitiated, which can be found at > > https://lore.kernel.org/linux-serial/20200923151511.3842150-1-luzmaximilian@gmail.com/ > > See "Changes" below for an overview of differences between the RFC and > this patchset. I hope I have addressed all comments from that in this > version, thank you again for those. > > The Surface System Aggregator Module (we'll refer to it as Surface > Aggregator or SAM below) is an embedded controller (EC) found on various > Microsoft Surface devices. Specifically, all 4th and later generation > Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the > exception of the Surface Go series and the Surface Duo. Notably, it > seems like this EC can also be found on the ARM-based Surface Pro X [1]. <snip> > This patch-set can also be found at the following repository and > reference, if you prefer to look at a kernel tree instead of these > emails: > > https://github.com/linux-surface/kernel tags/s/surface-aggregator/v1 > > Thanks, > Max Thank you for your work on this. It would be great if we can get better support for the Surface line in the mainline kernel. Since a lot of people have already commented on this series I think that you have enough feedback to do a v2 addressing that feedback right? For now I'm going to assume that you will do a v2 addressing the initial round of comments and not review this myself (IOW I'll review this when v2 is posted). Let me know if you see things differently. Regards, Hans > [1]: The Surface Pro X is, however, currently considered unsupported due > to a lack of test candidates and, as it seems, general lack of > Linux support on other parts. AFAIK there is an issue preventing > serial devices from being registered, on which the core driver in > this series is build on, thus there is no way to even test that at > this point. I'd be happy to work out any issues regarding SAM on > the Pro X at some point in the future, provided someone can/wants > to actually test it. > > [2]: https://github.com/linux-surface/surface-aggregator-module > [3]: https://github.com/linux-surface/linux-surface > > > Note: This patch depends on > > [PATCH v4] platform/surface: Create a platform subdirectory for > Microsoft Surface devices > > which can be found at > > https://lore.kernel.org/platform-driver-x86/20201009141128.683254-1-luzmaximilian@gmail.com/ > > and is currently in platform-drivers-x86/for-next. > > > Changes from the previous RFC (overview): > - move to platform/surface > - add copyright lines > - change SPDX identifier to GPL-2.0+ (was GPL-2.0-or-later) > - change user-space interface from debugfs to misc-device > - address issues in user-space interface > - fix typos in commit messages and documentation > - fix some bugs, address other issues > > Changes regarding specific patches (and more details) can be found on > the individual patch. > > > Maximilian Luz (9): > platform/surface: Add Surface Aggregator subsystem > platform/surface: aggregator: Add control packet allocation caching > platform/surface: aggregator: Add event item allocation caching > platform/surface: aggregator: Add trace points > platform/surface: aggregator: Add error injection capabilities > platform/surface: aggregator: Add dedicated bus and device type > docs: driver-api: Add Surface Aggregator subsystem documentation > platform/surface: Add Surface Aggregator user-space interface > platform/surface: Add Surface ACPI Notify driver > > Documentation/driver-api/index.rst | 1 + > .../surface_aggregator/client-api.rst | 38 + > .../driver-api/surface_aggregator/client.rst | 394 +++ > .../surface_aggregator/clients/cdev.rst | 85 + > .../surface_aggregator/clients/index.rst | 21 + > .../surface_aggregator/clients/san.rst | 44 + > .../driver-api/surface_aggregator/index.rst | 21 + > .../surface_aggregator/internal-api.rst | 67 + > .../surface_aggregator/internal.rst | 50 + > .../surface_aggregator/overview.rst | 76 + > .../driver-api/surface_aggregator/ssh.rst | 343 +++ > MAINTAINERS | 13 + > drivers/platform/surface/Kconfig | 39 + > drivers/platform/surface/Makefile | 3 + > drivers/platform/surface/aggregator/Kconfig | 65 + > drivers/platform/surface/aggregator/Makefile | 17 + > drivers/platform/surface/aggregator/bus.c | 424 +++ > drivers/platform/surface/aggregator/bus.h | 27 + > .../platform/surface/aggregator/controller.c | 2557 +++++++++++++++++ > .../platform/surface/aggregator/controller.h | 288 ++ > drivers/platform/surface/aggregator/core.c | 831 ++++++ > .../platform/surface/aggregator/ssh_msgb.h | 201 ++ > .../surface/aggregator/ssh_packet_layer.c | 2009 +++++++++++++ > .../surface/aggregator/ssh_packet_layer.h | 175 ++ > .../platform/surface/aggregator/ssh_parser.c | 229 ++ > .../platform/surface/aggregator/ssh_parser.h | 157 + > .../surface/aggregator/ssh_request_layer.c | 1254 ++++++++ > .../surface/aggregator/ssh_request_layer.h | 142 + > drivers/platform/surface/aggregator/trace.h | 625 ++++ > .../platform/surface/surface_acpi_notify.c | 884 ++++++ > .../surface/surface_aggregator_cdev.c | 299 ++ > include/linux/mod_devicetable.h | 18 + > include/linux/surface_acpi_notify.h | 39 + > include/linux/surface_aggregator/controller.h | 832 ++++++ > include/linux/surface_aggregator/device.h | 430 +++ > include/linux/surface_aggregator/serial_hub.h | 659 +++++ > include/uapi/linux/surface_aggregator/cdev.h | 58 + > scripts/mod/devicetable-offsets.c | 8 + > scripts/mod/file2alias.c | 23 + > 39 files changed, 13446 insertions(+) > create mode 100644 Documentation/driver-api/surface_aggregator/client-api.rst > create mode 100644 Documentation/driver-api/surface_aggregator/client.rst > create mode 100644 Documentation/driver-api/surface_aggregator/clients/cdev.rst > create mode 100644 Documentation/driver-api/surface_aggregator/clients/index.rst > create mode 100644 Documentation/driver-api/surface_aggregator/clients/san.rst > create mode 100644 Documentation/driver-api/surface_aggregator/index.rst > create mode 100644 Documentation/driver-api/surface_aggregator/internal-api.rst > create mode 100644 Documentation/driver-api/surface_aggregator/internal.rst > create mode 100644 Documentation/driver-api/surface_aggregator/overview.rst > create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst > create mode 100644 drivers/platform/surface/aggregator/Kconfig > create mode 100644 drivers/platform/surface/aggregator/Makefile > create mode 100644 drivers/platform/surface/aggregator/bus.c > create mode 100644 drivers/platform/surface/aggregator/bus.h > create mode 100644 drivers/platform/surface/aggregator/controller.c > create mode 100644 drivers/platform/surface/aggregator/controller.h > create mode 100644 drivers/platform/surface/aggregator/core.c > create mode 100644 drivers/platform/surface/aggregator/ssh_msgb.h > create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.c > create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.h > create mode 100644 drivers/platform/surface/aggregator/ssh_parser.c > create mode 100644 drivers/platform/surface/aggregator/ssh_parser.h > create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.c > create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.h > create mode 100644 drivers/platform/surface/aggregator/trace.h > create mode 100644 drivers/platform/surface/surface_acpi_notify.c > create mode 100644 drivers/platform/surface/surface_aggregator_cdev.c > create mode 100644 include/linux/surface_acpi_notify.h > create mode 100644 include/linux/surface_aggregator/controller.h > create mode 100644 include/linux/surface_aggregator/device.h > create mode 100644 include/linux/surface_aggregator/serial_hub.h > create mode 100644 include/uapi/linux/surface_aggregator/cdev.h >
On 11/24/20 12:59 PM, Hans de Goede wrote: > Hi, > > On 11/15/20 8:21 PM, Maximilian Luz wrote: >> Hello, >> >> N.B.: the following text is mostly a repeat of cover letter from the >> previous RFC for the uninitiated, which can be found at >> >> https://lore.kernel.org/linux-serial/20200923151511.3842150-1-luzmaximilian@gmail.com/ >> >> See "Changes" below for an overview of differences between the RFC and >> this patchset. I hope I have addressed all comments from that in this >> version, thank you again for those. >> >> The Surface System Aggregator Module (we'll refer to it as Surface >> Aggregator or SAM below) is an embedded controller (EC) found on various >> Microsoft Surface devices. Specifically, all 4th and later generation >> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the >> exception of the Surface Go series and the Surface Duo. Notably, it >> seems like this EC can also be found on the ARM-based Surface Pro X [1]. > > <snip> > >> This patch-set can also be found at the following repository and >> reference, if you prefer to look at a kernel tree instead of these >> emails: >> >> https://github.com/linux-surface/kernel tags/s/surface-aggregator/v1 >> >> Thanks, >> Max > > Thank you for your work on this. It would be great if we can get better > support for the Surface line in the mainline kernel. > > Since a lot of people have already commented on this series I think that > you have enough feedback to do a v2 addressing that feedback right? Yes, I'm already working on it. > For now I'm going to assume that you will do a v2 addressing the > initial round of comments and not review this myself (IOW I'll review > this when v2 is posted). Sure, no need for you to review v1 at this point. > Let me know if you see things differently. Thanks, Max