
[v2] acpi/ghes: Prevent sleeping with spinlock held

Message ID 20240206-cxl-cper-smatch-v2-1-84ed07563c31@intel.com
State New, archived
Series [v2] acpi/ghes: Prevent sleeping with spinlock held

Commit Message

Ira Weiny Feb. 6, 2024, 10:15 p.m. UTC
Smatch caught that cxl_cper_post_event() is called with a spinlock held
or preemption disabled.[1]  The callback takes the device lock to
perform address translation and therefore might sleep.  The record data
is released back to BIOS in ghes_clear_estatus() which requires it to be
copied for use in the workqueue.

Copy the record to a lockless list and schedule a work item to process
the record outside of atomic context.

[1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
Changes in v2:
- djbw: device_lock() sleeps so we need to call the callback in process context
- iweiny: create work queue to handle processing the callback
- Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com
---
 drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)


---
base-commit: 99bd3cb0d12e85d5114425353552121ec8f93adc
change-id: 20240201-cxl-cper-smatch-82b129498498

Best regards,

Comments

Jonathan Cameron Feb. 14, 2024, 12:11 p.m. UTC | #1
On Tue, 06 Feb 2024 14:15:32 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Smatch caught that cxl_cper_post_event() is called with a spinlock held
> or preemption disabled.[1]  The callback takes the device lock to
> perform address translation and therefore might sleep.  The record data
> is released back to BIOS in ghes_clear_estatus() which requires it to be
> copied for use in the workqueue.
> 
> Copy the record to a lockless list and schedule a work item to process
> the record outside of atomic context.
> 
> [1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> 
> Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

+CC tracing folk for the splat below (the second one, as the first one is fixed!)

Lock debugging is slow (on an emulated machine) :(
Testing with my gitlab.com/jic23/qemu cxl-2024-02-05-draft branch,
which is the only one I've put out with the FW-first injection patches so far.

For reference, without this patch I got a nice splat identifying the original bug.
So far so good.

With this patch (and tp_printk on the command line and trace points enabled):
[  193.048229] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  193.049636] {1}[Hardware Error]: event severity: recoverable
[  193.050472] {1}[Hardware Error]:  Error 0, type: recoverable
[  193.051337] {1}[Hardware Error]:   section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
[  193.052270] {1}[Hardware Error]:   section length: 0x90
[  193.053402] {1}[Hardware Error]:   00000000: 00000090 00000007 00000000 0d938086  ................
[  193.055036] {1}[Hardware Error]:   00000010: 000e0000 00000000 00000005 00000000  ................
[  193.058592] {1}[Hardware Error]:   00000020: 00000180 00000000 1626fa24 17b3b158  ........$.&.X...
[  193.062289] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000  ................
[  193.065959] {1}[Hardware Error]:   00000040: 000007d0 00000000 0fc00307 05210300  ..............!.
[  193.069782] {1}[Hardware Error]:   00000050: 72690000 6d207361 00326d65 00000000  ..iras mem2.....
[  193.072760] {1}[Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
[  193.074062] {1}[Hardware Error]:   00000070: 00000000 00000000 00000000 00000000  ................
[  193.075346] {1}[Hardware Error]:   00000080: 00000000 00000000 00000000 00000000  ................

I rebased after this so now we get the smaller print - but that's not really relevant here.

[  193.086589] cxl_general_media: memdev=mem1 host=0000:0e:00.0 serial=5 log=Warning : time=1707903675590441508 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 flags='0x1' handle=0 related_handle=0 maint_op_class=0 : dpa=7c0 dpa_flags='0x10' descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' type='0x3' transaction_type='0xc0' channel=3 rank=33 device=5 comp_id=69 72 61 73 20 6d 65 6d 32 00 00 00 00 00 00 00 validity_flags='CHANNEL|RANK|DEVICE|COMPONENT'
[  193.087181]                                                                                                                                                                                                                                                      
[  193.087361] =============================
[  193.087399] [ BUG: Invalid wait context ]
[  193.087504] 6.8.0-rc3 #1200 Not tainted
[  193.087660] -----------------------------
[  193.087858] kworker/3:0/31 is trying to lock:
[  193.087966] ffff0000c0ce1898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x148/0x1c8
[  193.089754] other info that might help us debug this:
context-{5:5}
[  193.089900] 8 locks held by kworker/3:0/31:
[  193.089990]  #0: ffff0000c0018738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x154/0x500
[  193.090439]  #1: ffff800083793de0 (cxl_cper_work){+.+.}-{0:0}, at: process_one_work+0x154/0x500
[  193.090718]  #2: ffff800082883310 (cxl_cper_rw_sem){++++}-{4:4}, at: cxl_cper_work_fn+0x2c/0xb0
[  193.091019]  #3: ffff0000c0a7b1a8 (&dev->mutex){....}-{4:4}, at: pci_dev_lock+0x28/0x48
[  193.091431]  #4: ffff800082738f18 (tracepoint_iter_lock){....}-{2:2}, at: trace_event_buffer_commit+0xd8/0x2c8
[  193.091772]  #5: ffff8000826b3ce8 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x124/0x398
[  193.092031]  #6: ffff8000826b3d40 (console_srcu){....}-{0:0}, at: console_flush_all+0x88/0x4b0
[  193.092363]  #7: ffff8000826b3ef8 (console_owner){....}-{0:0}, at: console_flush_all+0x1bc/0x4b0
[  193.092799] stack backtrace:
[  193.092973] CPU: 3 PID: 31 Comm: kworker/3:0 Not tainted 6.8.0-rc3 #1200
[  193.093118] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown unknown
[  193.093468] Workqueue: events cxl_cper_work_fn
[  193.093782] Call trace:
[  193.094004]  dump_backtrace+0xa4/0x130
[  193.094145]  show_stack+0x20/0x38
[  193.094231]  dump_stack_lvl+0x60/0xb0
[  193.094315]  dump_stack+0x18/0x28
[  193.094395]  __lock_acquire+0x9e8/0x1ee8
[  193.094477]  lock_acquire+0x118/0x2e8
[  193.094557]  _raw_spin_lock+0x50/0x70
[  193.094820]  pl011_console_write+0x148/0x1c8
[  193.094904]  console_flush_all+0x218/0x4b0
[  193.094985]  console_unlock+0x74/0x140
[  193.095066]  vprintk_emit+0x230/0x398
[  193.095146]  vprintk_default+0x40/0x58
[  193.095226]  vprintk+0x98/0xb0
[  193.095306]  _printk+0x64/0x98
[  193.095385]  trace_event_buffer_commit+0x138/0x2c8
[  193.095467]  trace_event_raw_event_cxl_general_media+0x1a8/0x280 [cxl_core]
[  193.096191]  __traceiter_cxl_general_media+0x50/0x78 [cxl_core]
[  193.096359]  cxl_event_trace_record+0x204/0x2d0 [cxl_core]
[  193.096520]  cxl_cper_event_call+0xb0/0xe0 [cxl_pci]

The rw_sem is held to protect the callback pointer.

[  193.096713]  cxl_cper_work_fn+0x7c/0xb0
[  193.096808]  process_one_work+0x1f4/0x500
[  193.096891]  worker_thread+0x1f0/0x3f0
[  193.096971]  kthread+0x110/0x120
[  193.097052]  ret_from_fork+0x10/0x20

So I'm not sure how to fix this or if we even want to.

We could try to release the locks before calling the tracepoint, but that looks
very fiddly and would require ghes.c to be able to see more of the
CXL tracepoint infrastructure, which isn't great.

Just because I was feeling cheeky I did a quick test with the following.
I have a sneaking suspicion this won't go down well even though it 'fixes'
our issue...  My google fu / lore search terms are failing to find
much previous discussion of this.

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9ff8a439d674..7ee45f22f56f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2957,7 +2957,7 @@ static void output_printk(struct trace_event_buffer *fbuffer)
        iter->ent = fbuffer->entry;
        event_call->event.funcs->trace(iter, 0, event);
        trace_seq_putc(&iter->seq, 0);
-       printk("%s", iter->seq.buffer);
+       printk_deferred("%s", iter->seq.buffer);

        raw_spin_unlock_irqrestore(&tracepoint_iter_lock, flags);
 }

My assumption is that views similar to Peter Zijlstra's comment on
not using printk_deferred will apply here.

https://lore.kernel.org/all/20231010141244.GM377@noisy.programming.kicks-ass.net/

Note I also tried the hacks Peter links to from there. They trip issues in the initial
CPER print - so I assume they are not appropriate here.

So I'm thinking this is a won't fix - wait for the printk rework to land and
assume this will be resolved as well?

Jonathan

> ---
> Changes in v2:
> - djbw: device_lock() sleeps so we need to call the callback in process context
> - iweiny: create work queue to handle processing the callback
> - Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com
> ---
>  drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 7b7c605166e0..aa41e9128118 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -679,6 +679,12 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
>   */
>  static DECLARE_RWSEM(cxl_cper_rw_sem);
>  static cxl_cper_callback cper_callback;
> +static LLIST_HEAD(cxl_cper_rec_list);
> +struct cxl_cper_work_item {
> +	struct llist_node node;
> +	enum cxl_event_type event_type;
> +	struct cxl_cper_event_rec rec;
> +};
>  
>  /* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
>  
> @@ -706,9 +712,34 @@ static cxl_cper_callback cper_callback;
>  	GUID_INIT(0xfe927475, 0xdd59, 0x4339,				\
>  		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
>  
> +static void cxl_cper_work_fn(struct work_struct *work)
> +{
> +	struct llist_node *entries, *cur, *n;
> +	struct cxl_cper_work_item *wi;
> +
> +	guard(rwsem_read)(&cxl_cper_rw_sem);
> +
> +	entries = llist_del_all(&cxl_cper_rec_list);
> +	if (!entries)
> +		return;

Unfortunately the rw_sem is protecting the cper_callback itself
so I'm not sure how to do this.

Maybe a dance where cper_callback creates the record but returns
the data to here so that we can release the lock and issue
the tracepoint?

> +	/* Process oldest to newest */
> +	entries = llist_reverse_order(entries);
> +	llist_for_each_safe(cur, n, entries) {
> +		wi = llist_entry(cur, struct cxl_cper_work_item, node);
> +
> +		if (cper_callback)
> +			cper_callback(wi->event_type, &wi->rec);
> +		kfree(wi);
> +	}
> +}
> +static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
> +
>  static void cxl_cper_post_event(enum cxl_event_type event_type,
>  				struct cxl_cper_event_rec *rec)
>  {
> +	struct cxl_cper_work_item *wi;
> +
>  	if (rec->hdr.length <= sizeof(rec->hdr) ||
>  	    rec->hdr.length > sizeof(*rec)) {
>  		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> @@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		return;
>  	}
>  
> -	guard(rwsem_read)(&cxl_cper_rw_sem);
> -	if (cper_callback)
> -		cper_callback(event_type, rec);
> +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
> +	if (!wi) {
> +		pr_err(FW_WARN "CXL CPER failed to allocate work item\n");
> +		return;
> +	}
> +
> +	wi->event_type = event_type;
> +	memcpy(&wi->rec, rec, sizeof(wi->rec));
> +	llist_add(&wi->node, &cxl_cper_rec_list);
> +	schedule_work(&cxl_cper_work);
>  }
>  
>  int cxl_cper_register_callback(cxl_cper_callback callback)
> 
> ---
> base-commit: 99bd3cb0d12e85d5114425353552121ec8f93adc
> change-id: 20240201-cxl-cper-smatch-82b129498498
> 
> Best regards,
Steven Rostedt Feb. 14, 2024, 3:23 p.m. UTC | #2
On Wed, 14 Feb 2024 12:11:53 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> So I'm thinking this is a won't fix - wait for the printk rework to land and
> assume this will be resolved as well?

That pretty much sums up what I was about to say ;-)

tp_printk is more of a hack and should only be used sparingly. With the right
trace events it can hang the machine.

So, you can use your internal patch locally, but I would recommend waiting
for the new printk changes to land. I'm really hoping that will be soon!

-- Steve
Ira Weiny Feb. 14, 2024, 4:40 p.m. UTC | #3
Jonathan Cameron wrote:
> On Tue, 06 Feb 2024 14:15:32 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> [snip]
> 
> So I'm thinking this is a won't fix - wait for the printk rework to land and
> assume this will be resolved as well?
> 

Or could we avoid the situation entirely by using a static call?

The work queue still needs to be created because ghes_do_proc() runs in
atomic context, but a static call avoids cxl_cper_rw_sem.

I think the hardest thing may be writing the commit message to explain all
this.  :-(

Ira
Ira Weiny Feb. 14, 2024, 5:11 p.m. UTC | #4
Ira Weiny wrote:
> Jonathan Cameron wrote:
> > On Tue, 06 Feb 2024 14:15:32 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> > 
> >

[snip]

> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 9ff8a439d674..7ee45f22f56f 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2957,7 +2957,7 @@ static void output_printk(struct trace_event_buffer *fbuffer)
> >         iter->ent = fbuffer->entry;
> >         event_call->event.funcs->trace(iter, 0, event);
> >         trace_seq_putc(&iter->seq, 0);
> > -       printk("%s", iter->seq.buffer);
> > +       printk_deferred("%s", iter->seq.buffer);
> > 
> >         raw_spin_unlock_irqrestore(&tracepoint_iter_lock, flags);
> >  }
> > 
> > My assumption is that views similar to Peter Zijlstra's comment on
> > not using printk_deferred will apply here.
> > 
> > https://lore.kernel.org/all/20231010141244.GM377@noisy.programming.kicks-ass.net/
> > 
> > Note I also tried the hacks Peter links to from there. They trip issues in the initial
> > CPER print - so I assume they are not appropriate here.
> > 
> > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > assume this will be resolved as well?
> > 
> 
> Or could we avoid the situation entirely by using a static call?
> 
> The work queue still needs to be created because of the atomicness of
> ghes_do_proc() but it avoids cxl_cper_rw_sem.
> 
> I think the hardest thing may be writing the commit message to explain all
> this.  :-(
> 

Never mind, I already went down that path.  I was surprised I did not
mention it in this commit message.  I did in V1.  :-(

"A static call was considered but ARM does not select HAVE_STATIC_CALL
and in that case setting the function pointer uses a RW semaphore."
	-- https://lore.kernel.org/all/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com/

Should have carried that through.
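For the record, a minimal sketch of that considered (and rejected) approach,
with made-up names; it only removes the semaphore on architectures that
select HAVE_STATIC_CALL, which is exactly the problem quoted above:

	#include <linux/static_call.h>

	/* default target so posting is always safe */
	static void cxl_cper_nop(enum cxl_event_type type,
				 struct cxl_cper_event_rec *rec) { }

	DEFINE_STATIC_CALL(cxl_cper_cb, cxl_cper_nop);

	/* cxl_pci load: patch in the real handler atomically */
	static_call_update(cxl_cper_cb, cxl_cper_event_call);

	/* post path: no rw_sem needed */
	static_call(cxl_cper_cb)(event_type, rec);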

So should we revert ...

	Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

... and wait for the printk fix or just move forward with this patch?

Sorry for the noise,
Ira
Jonathan Cameron Feb. 14, 2024, 6:12 p.m. UTC | #5
On Wed, 14 Feb 2024 10:23:10 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 14 Feb 2024 12:11:53 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > assume this will be resolved as well?  
> 
> That pretty much sums up what I was about to say ;-)
> 
> tp_printk is more of a hack and should only be used sparingly. With the right
> trace events it can hang the machine.
> 
> So, you can use your internal patch locally, but I would recommend waiting
> for the new printk changes to land. I'm really hoping that will be soon!
> 
> -- Steve

Thanks Steve,

Ira's fix is needed for other valid locking reasons - this was 'just another'
lock debugging report that came up whilst testing it.

For this patch (not a potential additional one that we aren't going to do ;)

Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Ira Weiny Feb. 14, 2024, 9:22 p.m. UTC | #6
Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:23:10 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 14 Feb 2024 12:11:53 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > 
> > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > assume this will be resolved as well?  
> > 
> > That pretty much sums up what I was about to say ;-)
> > 
> > tp_printk is more of a hack and should only be used sparingly. With the right
> > trace events it can hang the machine.
> > 
> > So, you can use your internal patch locally, but I would recommend waiting
> > for the new printk changes to land. I'm really hoping that will be soon!
> > 
> > -- Steve
> 
> Thanks Steve,
> 
> Ira's fix is needed for other valid locking reasons - this was 'just another'
> lock debugging report that came up whilst testing it.
> 
> For this patch (not a potential additional one that we aren't going to do ;)
> 
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks Jonathan!  I really appreciate the testing,
Ira
Ira Weiny Feb. 14, 2024, 10:19 p.m. UTC | #7
Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:23:10 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 14 Feb 2024 12:11:53 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > 
> > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > assume this will be resolved as well?  
> > 
> > That pretty much sums up what I was about to say ;-)
> > 
> > tp_printk is more of a hack and should only be used sparingly. With the right
> > trace events it can hang the machine.
> > 
> > So, you can use your internal patch locally, but I would recommend waiting
> > for the new printk changes to land.

Steven, Do you think that will land in 6.9?

> >
> > I'm really hoping that will be soon!
> > 
> > -- Steve
> 
> Thanks Steve,
> 
> Ira's fix is needed for other valid locking reasons - this was 'just another'
> lock debugging report that came up whilst testing it.
> 
> For this patch (not a potential additional one that we aren't going to do ;)
> 
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Jonathan,

Again thanks for the testing!  However, Dan and I just discussed this and
he has an uneasy feeling about going forward with this for 6.8 final.

If we revert the following patch I can squash this fix and wait for the
tp_printk() fix to land in 6.9 and resubmit.

Dan here is the patch which backs out the actual bug:

	Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

Thanks everyone,
Ira
Steven Rostedt Feb. 14, 2024, 10:33 p.m. UTC | #8
On Wed, 14 Feb 2024 14:19:19 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >   
> > > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > > assume this will be resolved as well?    
> > > 
> > > That pretty much sums up what I was about to say ;-)
> > > 
> > > tp_printk is more of a hack and should only be used sparingly. With the right
> > > trace events it can hang the machine.
> > > 
> > > So, you can use your internal patch locally, but I would recommend waiting
> > > for the new printk changes to land.  
> 
> Steven, Do you think that will land in 6.9?
> 
> > >
> > > I'm really hoping that will be soon!
> > > 

I may be like Jon Corbet predicting RT will land in mainline if I do.

-- Steve
Ira Weiny Feb. 14, 2024, 11:34 p.m. UTC | #9
Ira Weiny wrote:
> [snip]
> 
> Jonathan,
> 
> Again thanks for the testing!  However, Dan and I just discussed this and
> he has an uneasy feeling about going forward with this for 6.8 final.
> 
> If we revert the following patch I can squash this fix and wait for the
> tp_printk() fix to land in 6.9 and resubmit.
> 
> Dan here is the patch which backs out the actual bug:
> 
> 	Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

Unfortunately this is not the only patch.

We need to revert this too:

Fixes: dc97f6344f20 ("cxl/pci: Register for and process CPER events") 

And then revert ...
	Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

... but there is a conflict.

Dan, below is the correct revert patch.  Let me know if you need more.

Ira

commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Wed Feb 14 15:25:24 2024 -0800

    Revert "acpi/ghes: Process CXL Component Events"
    
    This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fe825a432c5b..ab2a82cb1b0b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -26,7 +26,6 @@
 #include <linux/interrupt.h>
 #include <linux/timer.h>
 #include <linux/cper.h>
-#include <linux/cxl-event.h>
 #include <linux/platform_device.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
@@ -674,52 +673,6 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
 	schedule_work(&entry->work);
 }
 
-/*
- * Only a single callback can be registered for CXL CPER events.
- */
-static DECLARE_RWSEM(cxl_cper_rw_sem);
-static cxl_cper_callback cper_callback;
-
-static void cxl_cper_post_event(enum cxl_event_type event_type,
-				struct cxl_cper_event_rec *rec)
-{
-	if (rec->hdr.length <= sizeof(rec->hdr) ||
-	    rec->hdr.length > sizeof(*rec)) {
-		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
-		       rec->hdr.length);
-		return;
-	}
-
-	if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
-		pr_err(FW_WARN "CXL CPER invalid event\n");
-		return;
-	}
-
-	guard(rwsem_read)(&cxl_cper_rw_sem);
-	if (cper_callback)
-		cper_callback(event_type, rec);
-}
-
-int cxl_cper_register_callback(cxl_cper_callback callback)
-{
-	guard(rwsem_write)(&cxl_cper_rw_sem);
-	if (cper_callback)
-		return -EINVAL;
-	cper_callback = callback;
-	return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_callback, CXL);
-
-int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
-	guard(rwsem_write)(&cxl_cper_rw_sem);
-	if (callback != cper_callback)
-		return -EINVAL;
-	cper_callback = NULL;
-	return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_callback, CXL);
-
 static bool ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -754,22 +707,6 @@ static bool ghes_do_proc(struct ghes *ghes,
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			queued = ghes_handle_arm_hw_error(gdata, sev, sync);
-		} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
-			struct cxl_cper_event_rec *rec =
-				acpi_hest_get_payload(gdata);
-
-			cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec);
-		} else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
-			struct cxl_cper_event_rec *rec =
-				acpi_hest_get_payload(gdata);
-
-			cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec);
-		} else if (guid_equal(sec_type,
-				      &CPER_SEC_CXL_MEM_MODULE_GUID)) {
-			struct cxl_cper_event_rec *rec =
-				acpi_hest_get_payload(gdata);
-
-			cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
 
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 95841750a383..4d6c05f535f8 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -107,54 +107,4 @@ struct cxl_event_record_raw {
 	union cxl_event event;
 } __packed;
 
-enum cxl_event_type {
-	CXL_CPER_EVENT_GEN_MEDIA,
-	CXL_CPER_EVENT_DRAM,
-	CXL_CPER_EVENT_MEM_MODULE,
-};
-
-#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
-#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
-#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
-struct cxl_cper_event_rec {
-	struct {
-		u32 length;
-		u64 validation_bits;
-		struct cper_cxl_event_devid {
-			u16 vendor_id;
-			u16 device_id;
-			u8 func_num;
-			u8 device_num;
-			u8 bus_num;
-			u16 segment_num;
-			u16 slot_num; /* bits 2:0 reserved */
-			u8 reserved;
-		} __packed device_id;
-		struct cper_cxl_event_sn {
-			u32 lower_dw;
-			u32 upper_dw;
-		} __packed dev_serial_num;
-	} __packed hdr;
-
-	union cxl_event event;
-} __packed;
-
-typedef void (*cxl_cper_callback)(enum cxl_event_type type,
-				  struct cxl_cper_event_rec *rec);
-
-#ifdef CONFIG_ACPI_APEI_GHES
-int cxl_cper_register_callback(cxl_cper_callback callback);
-int cxl_cper_unregister_callback(cxl_cper_callback callback);
-#else
-static inline int cxl_cper_register_callback(cxl_cper_callback callback)
-{
-	return 0;
-}
-
-static inline int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
-	return 0;
-}
-#endif
-
 #endif /* _LINUX_CXL_EVENT_H */
Jonathan Cameron Feb. 15, 2024, 9:25 a.m. UTC | #10
On Wed, 14 Feb 2024 17:33:18 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 14 Feb 2024 14:19:19 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > [snip]
> 
> I may be like Jon Corbet predicting RT will land in mainline if I do.
> 
> -- Steve
> 


Agreed. Don't wait on the printk fixes landing. (Well, unless you are sure
it's the year of the Linux desktop.) Reverting is fine for 6.8
if you and Dan feel it's unwise to take this forward (all the distros
will backport it anyway, and 6.8 isn't an LTS, so there's no great rush);
it's fine to just queue it up again for 6.9 with this fix.

As Steve said, tp_printk is a hack (a very useful one) and
hopefully no one runs it in production.

Jonathan
Ira Weiny Feb. 15, 2024, 5:39 p.m. UTC | #11
Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 17:33:18 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > [snip]
> 
> Agreed. Don't wait on the printk fixes landing. (Well, unless you are sure
> it's the year of the Linux desktop.) Reverting is fine for 6.8
> if you and Dan feel it's unwise to take this forward (all the distros
> will backport it anyway, and 6.8 isn't an LTS, so there's no great rush);
> it's fine to just queue it up again for 6.9 with this fix.
> 
> As Steve said, tp_printk is a hack (a very useful one) and
> hopefully no one runs it in production.

OMG...  I did not realize what tp_printk() was exactly.  I should have
looked closer.

Do we have evidence of its use in production?

I would love to not have to revert/respin,
Ira
Dan Williams Feb. 17, 2024, 1:02 a.m. UTC | #12
Ira Weiny wrote:
[..]
> > As Steve said, tp_printk is a hack (a very useful one) and
> > hopefully no one runs it in production.
> 
> OMG...  I did not realize what tp_printk() was exactly.  I should have
> looked closer.
> 
> Do we have evidence of its use in production?
> 
> I would love to not have to revert/respin,

The revert is for 2 non-trivial fixes needed in one new feature; let's
just circle back and get it right for v6.9. The tp_printk() issue was not the
final straw for me.
Dan Williams Feb. 17, 2024, 1:17 a.m. UTC | #13
Ira Weiny wrote:
> Ira Weiny wrote:
> > [snip]
> > 
> > Jonathan,
> > 
> > Again thanks for the testing!  However, Dan and I just discussed this and
> > he has an uneasy feeling about going forward with this for 6.8 final.
> > 
> > If we revert the following patch I can squash this fix and wait for the
> > tp_printk() fix to land in 6.9 and resubmit.
> > 
> > Dan here is the patch which backs out the actual bug:
> > 
> > 	Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 
> 
> Unfortunately this is not the only patch.
> 
> We need to revert this too:
> 
> Fixes: dc97f6344f20 ("cxl/pci: Register for and process CPER events") 
> 
> And then revert ...
> 	Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 
> 
> ... but there is a conflict.
> 
> Dan, below is the correct revert patch.  Let me know if you need more.
> 
> Ira
> 
> commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
> Author: Ira Weiny <ira.weiny@intel.com>
> Date:   Wed Feb 14 15:25:24 2024 -0800
> 
>     Revert "acpi/ghes: Process CXL Component Events"
>     
>     This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.

Even reverts need changelogs; I can add one. I got conflicts trying to
apply this to the current fixes branch. I think I am going to just
surgically back out the drivers/acpi/apei/ghes.c changes.
Dan Williams Feb. 17, 2024, 8:07 p.m. UTC | #14
Ira Weiny wrote:
[..]
> +static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
> +
>  static void cxl_cper_post_event(enum cxl_event_type event_type,
>  				struct cxl_cper_event_rec *rec)
>  {
> +	struct cxl_cper_work_item *wi;
> +
>  	if (rec->hdr.length <= sizeof(rec->hdr) ||
>  	    rec->hdr.length > sizeof(*rec)) {
>  		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> @@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		return;
>  	}
>  
> -	guard(rwsem_read)(&cxl_cper_rw_sem);
> -	if (cper_callback)
> -		cper_callback(event_type, rec);

Given a work function can be set atomically there is no need to create /
manage a registration lock. Set a 'struct work' instance to a CXL
provided routine on cxl_pci module load and restore it to a nop function
+ cancel_work_sync() on cxl_pci module exit.
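
Something like this, as a rough sketch of that idea (hypothetical helper
names, not a tested implementation):

	/* ghes.c: default the work to a nop so queueing is always safe */
	static void cxl_cper_nop_fn(struct work_struct *work) { }
	static DECLARE_WORK(cxl_cper_work, cxl_cper_nop_fn);

	/* cxl_pci load: atomically point the work at the real consumer */
	void cxl_cper_set_work_fn(work_func_t fn)
	{
		WRITE_ONCE(cxl_cper_work.func, fn);
	}

	/* cxl_pci exit: restore the nop and drain anything in flight */
	void cxl_cper_clear_work_fn(void)
	{
		WRITE_ONCE(cxl_cper_work.func, cxl_cper_nop_fn);
		cancel_work_sync(&cxl_cper_work);
	}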

> +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);

The system is already under distress trying to report an error; it should
not dip into emergency memory reserves to report errors. Use a kfifo()
similar to how memory_failure_queue() avoids memory allocation in the
error reporting path.
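
A rough sketch of that, modeled on the memory_failure_queue() pattern
(names and fifo depth are made up; cxl_cper_work is the work item from
the sketch above):

	#include <linux/kfifo.h>
	#include <linux/spinlock.h>

	struct cxl_cper_work_data {
		enum cxl_event_type event_type;
		struct cxl_cper_event_rec rec;
	};

	/* fixed-depth queue: no allocation in the error path */
	static DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32);
	static DEFINE_SPINLOCK(cxl_cper_work_lock);

	/* in cxl_cper_post_event(): copy the record, queue, kick the work */
	struct cxl_cper_work_data wd = { .event_type = event_type };

	memcpy(&wd.rec, rec, sizeof(wd.rec));
	if (kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_work_lock))
		schedule_work(&cxl_cper_work);
	else
		pr_err_ratelimited(FW_WARN "CXL CPER kfifo overflow\n");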
Dan Carpenter Feb. 19, 2024, 9:07 a.m. UTC | #15
On Fri, Feb 16, 2024 at 05:17:20PM -0800, Dan Williams wrote:
> > commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
> > Author: Ira Weiny <ira.weiny@intel.com>
> > Date:   Wed Feb 14 15:25:24 2024 -0800
> > 
> >     Revert "acpi/ghes: Process CXL Component Events"
> >     
> >     This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.
> 
> Even reverts need changelogs, I can add one. I got conflicts trying to
> apply this to current fixes branch. I think I am going to just
> surgically backout the drivers/acpi/apei/ghes.c changes.

The `git revert` command comes from ancient times and it just sets
people up for failure...  :/

regards,
dan carpenter
Ira Weiny Feb. 21, 2024, 4:59 a.m. UTC | #16
Dan Williams wrote:
> Ira Weiny wrote:

[snip]

> >  
> > -	guard(rwsem_read)(&cxl_cper_rw_sem);
> > -	if (cper_callback)
> > -		cper_callback(event_type, rec);
> 
> Given a work function can be set atomically there is no need to create /
> manage a registration lock. Set a 'struct work' instance to a CXL
> provided routine on cxl_pci module load and restore it to a nop function
> + cancel_work_sync() on cxl_pci module exit.

Ok I'll look into this.

> 
> > +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
> 
> The system is already under distress trying to report an error; it should
> not dip into emergency memory reserves to report errors. Use a kfifo()
> similar to how memory_failure_queue() avoids memory allocation in the
> error reporting path.

I have a question on ghes_proc() [ghes_do_proc()].  Can they be called by
2 threads at the same time?  It seems like there could be multiple
platform devices which end up queueing into the single kfifo.  So either
there needs to be a kfifo per device or synchronization with multiple
writers.

Ira
Dan Williams Feb. 21, 2024, 7:57 p.m. UTC | #17
Ira Weiny wrote:
> Dan Williams wrote:
> > Ira Weiny wrote:
> 
> [snip]
> 
> > >  
> > > -	guard(rwsem_read)(&cxl_cper_rw_sem);
> > > -	if (cper_callback)
> > > -		cper_callback(event_type, rec);
> > 
> > Given a work function can be set atomically there is no need to create /
> > manage a registration lock. Set a 'struct work' instance to a CXL
> > provided routine on cxl_pci module load and restore it to a nop function
> > + cancel_work_sync() on cxl_pci module exit.
> 
> Ok I'll look into this.
> 
> > 
> > > +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
> > 
> > The system is already under distress trying to report an error; it should
> > not dip into emergency memory reserves to report errors. Use a kfifo()
> > similar to how memory_failure_queue() avoids memory allocation in the
> > error reporting path.
> 
> I have a question on ghes_proc() [ghes_do_proc()].  Can they be called by
> 2 threads at the same time?  It seems like there could be multiple
> platform devices which end up queueing into the single kfifo.

Yes, that is already the case for memory_failure_queue() and
aer_recover_queue().

> there needs to be a kfifo per device or synchronization with multiple
> writers.

Yes, follow the other _queue() examples. kfifo_in_spinlocked() looks
useful for this purpose.

I expect no lock needed on the read side since the reader is only the
single workqueue context.
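
i.e. with the fifo sketched earlier, the drain side can stay lock-free
(again just a sketch, assuming those hypothetical names):

	static void cxl_cper_work_fn(struct work_struct *work)
	{
		struct cxl_cper_work_data wd;

		/* single consumer, so kfifo_get() needs no locking here */
		while (kfifo_get(&cxl_cper_fifo, &wd))
			cxl_cper_event_call(wd.event_type, &wd.rec);
	}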

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 7b7c605166e0..aa41e9128118 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -679,6 +679,12 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
  */
 static DECLARE_RWSEM(cxl_cper_rw_sem);
 static cxl_cper_callback cper_callback;
+static LLIST_HEAD(cxl_cper_rec_list);
+struct cxl_cper_work_item {
+	struct llist_node node;
+	enum cxl_event_type event_type;
+	struct cxl_cper_event_rec rec;
+};
 
 /* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
 
@@ -706,9 +712,34 @@ static cxl_cper_callback cper_callback;
 	GUID_INIT(0xfe927475, 0xdd59, 0x4339,				\
 		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
 
+static void cxl_cper_work_fn(struct work_struct *work)
+{
+	struct llist_node *entries, *cur, *n;
+	struct cxl_cper_work_item *wi;
+
+	guard(rwsem_read)(&cxl_cper_rw_sem);
+
+	entries = llist_del_all(&cxl_cper_rec_list);
+	if (!entries)
+		return;
+
+	/* Process oldest to newest */
+	entries = llist_reverse_order(entries);
+	llist_for_each_safe(cur, n, entries) {
+		wi = llist_entry(cur, struct cxl_cper_work_item, node);
+
+		if (cper_callback)
+			cper_callback(wi->event_type, &wi->rec);
+		kfree(wi);
+	}
+}
+static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
+
 static void cxl_cper_post_event(enum cxl_event_type event_type,
 				struct cxl_cper_event_rec *rec)
 {
+	struct cxl_cper_work_item *wi;
+
 	if (rec->hdr.length <= sizeof(rec->hdr) ||
 	    rec->hdr.length > sizeof(*rec)) {
 		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
@@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
 		return;
 	}
 
-	guard(rwsem_read)(&cxl_cper_rw_sem);
-	if (cper_callback)
-		cper_callback(event_type, rec);
+	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
+	if (!wi) {
+		pr_err(FW_WARN "CXL CPER failed to allocate work item\n");
+		return;
+	}
+
+	wi->event_type = event_type;
+	memcpy(&wi->rec, rec, sizeof(wi->rec));
+	llist_add(&wi->node, &cxl_cper_rec_list);
+	schedule_work(&cxl_cper_work);
 }
 
 int cxl_cper_register_callback(cxl_cper_callback callback)