parisc: Stop CPUs via PAT firmware before system halt or reboot.

Message ID 20170512165513.GA23551@ls3530.fritz.box
State New

Commit Message

Helge Deller May 12, 2017, 4:55 p.m. UTC
Dave reported that he had issued a "shutdown -r" and a panic occurred during
the reboot while all CPUs were still up. After this, stall messages were output
to console after the firmware version was printed.

To avoid that issue, add functions to call PAT firmware to stop all CPUs (with
the exception of the currently running CPU) before a panic reboot or a system
halt is issued.

Signed-off-by: Helge Deller <deller@gmx.de>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rolf Eike Beer May 12, 2017, 7:38 p.m. UTC | #1
Nitpick:

> +	return retval;
> +}
> +
> +
> +/**
>   * pdc_pat_get_irt_size - Retrieve the number of entries in the cell's

Double newline.

Greetings,

Eike
John David Anglin May 15, 2017, 12:42 a.m. UTC | #2
On 2017-05-12, at 12:55 PM, Helge Deller wrote:

> Dave reported that he had issued a "shutdown -r" and a panic occurred during
> the reboot while all CPUs were still up. After this, stall messages were output
> to console after the firmware version was printed.
> 
> To avoid that issue, add functions to call PAT firmware to stop all CPUs (with
> the exception of the currently running CPU) before a panic reboot or a system
> halt is issued.

This patch causes a problem with "shutdown -h" on my c8000.  After the system prints
the message that it is okay to power off, the front panel LED shows a flashing red, and pressing
the power button once doesn't power the system down.  Pressing it again causes the system
to reboot.

Dave
--
John David Anglin	dave.anglin@bell.net



Helge Deller May 15, 2017, 7:39 a.m. UTC | #3
On 15.05.2017 02:42, John David Anglin wrote:
> On 2017-05-12, at 12:55 PM, Helge Deller wrote:
> 
>> Dave reported that he had issued a "shutdown -r" and a panic occurred during
>> the reboot while all CPUs were still up. After this, stall messages were output
>> to console after the firmware version was printed.
>>
>> To avoid that issue, add functions to call PAT firmware to stop all CPUs (with
>> the exception of the currently running CPU) before a panic reboot or a system
>> halt is issued.
> 
> This patch causes a problem with "shutdown -h" on my c8000.  After the system prints
> the message that it is okay to power off, the front panel LED shows a flashing red, and pressing
> the power button once doesn't power the system down.  Pressing it again causes the system
> to reboot.

I noticed that during my testing as well.
It seems that the firmware call stops the whole socket, not just one CPU.
Maybe it's another problem as well?

Anyway, I included an updated patch [1] in my last git pull request to Linus (which actually
didn't make it into 4.12...).
Can you test this one instead?

Helge


[1] https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=parisc-4.12-1&id=5aa2aabff1ce642a0c16b8c25bce8dc5ad66ad81
John David Anglin May 16, 2017, 12:29 a.m. UTC | #4
On 2017-05-15, at 3:39 AM, Helge Deller wrote:

> Anyway, I included an updated patch [1] in my last git pull request to Linus (which actually
> didn't make it into 4.12...).
> Can you test this one instead?

I still see the same behavior.

Dave
--
John David Anglin	dave.anglin@bell.net



Helge Deller May 16, 2017, 7:22 p.m. UTC | #5
On 16.05.2017 02:29, John David Anglin wrote:
> On 2017-05-15, at 3:39 AM, Helge Deller wrote:
> 
>> Anyway, I included an updated patch [1] in my last git pull request to Linus (which actually
>> didn't make it into 4.12...).
>> Can you test this one instead?
> 
> I still see the same behavior.
> (After the system prints the message that it is okay to power off,
> the front panel LED shows a flashing red, and pressing the power
> button once doesn't power the system down. Pressing it again
> causes the system to reboot.)

I'll probably drop the patch then.
I didn't notice that myself because I turn the machine off and on
via a remote power plug (connected via USB to an x86 machine).

Helge
John David Anglin May 17, 2017, 12:47 a.m. UTC | #6
On 2017-05-16, at 3:22 PM, Helge Deller wrote:

> On 16.05.2017 02:29, John David Anglin wrote:
>> On 2017-05-15, at 3:39 AM, Helge Deller wrote:
>> 
>>> Anyway, I included an updated patch [1] in my last git pull request to Linus (which actually
>>> didn't make it into 4.12...).
>>> Can you test this one instead?
>> 
>> I still see the same behavior.
>> (After the system prints the message that it is okay to power off,
>> the front panel LED shows a flashing red, and pressing the power
>> button once doesn't power the system down. Pressing it again
>> causes the system to reboot.)
> 
> I'll probably drop the patch then.
> I didn't notice that myself because I turn the machine off and on
> via a remote power plug (connected via USB to an x86 machine).

I did a new kernel build after reverting the change and tried "shutdown -h +0":

reboot: Power down                                                              
System shut down completed.                                                     
Please power this system off now.                                               
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [halt:2906]               
Modules linked in: ext2 ipv6 sg ext4 crc16 jbd2 mbcache sd_mod ohci_pci pata_sid
CPU: 0 PID: 2906 Comm: halt Not tainted 4.10.16+ #1                             
task: 000000007e5a6b30 task.stack: 000000007f140000                             
                                                                                
     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI                                           
PSW: 00001000001011001111111100001111 Not tainted                               
r00-03  000000ff082cff0f 00000000406cf700 0000000040170870 000000007f140360     
r04-07  00000000406a5700 000000004321fedc 0000000028121969 0000000000000000     
r08-11  fffffffffee1dead 0000000000000000 0000000000013318 0000000000000000     
r12-15  0000000000000001 0000000000000001 0000000000000001 0000000000000001     
r16-19  0000000000000001 0000000000000000 00000000fffffff6 0000000000000000     
r20-23  0000000000000001 00000000000001c1 0000000000000000 00000000408d37d5     
r24-27  0000000000000000 000000000800000f 0000000040835c60 00000000406a5700     
r28-31  000000004080a784 000000007f1403b0 000000007f1403e0 0000000000000002     
sr00-03  000000000063f000 0000000000000000 0000000000000000 000000000063f000    
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000    
                                                                                
IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040170874 0000000040170870 
 IIR: 00000000    ISR: 000000000000003d  IOR: 00000000408d3332                  
 CPU:        0   CR30: 000000007f140000 CR31: ffffffffffffffff                  
 ORIG_R28: 00000000401e9ce0                                                     
 IAOQ[0]: machine_power_off+0x8c/0x90                                           
 IAOQ[1]: machine_power_off+0x88/0x90                                           
 RP(r2): machine_power_off+0x88/0x90                                            
Backtrace:                                                                      

Can we kill the NMI watchdog?

Dave
--
John David Anglin	dave.anglin@bell.net
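
For readers following the log above: IAOQ points into machine_power_off, so CPU 0 is still executing there after the "Please power this system off now." message, and the soft-lockup detector reports it once it has gone unserviced for longer than its threshold. Below is a minimal userspace sketch of that kind of check. All `sketch_*` names are hypothetical, and the 20-second threshold used in the usage note is an assumption about the default configuration, not something stated in this thread. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a per-CPU soft-lockup detector (times in seconds). */
struct sketch_watchdog {
	unsigned long last_touch;	/* last time the CPU checked in */
	unsigned long threshold;	/* report when stuck longer than this */
	bool enabled;			/* false once the detector is switched off */
};

/* The CPU checks in, which resets the stuck time. */
void sketch_touch(struct sketch_watchdog *wd, unsigned long now)
{
	wd->last_touch = now;
}

/*
 * Return the number of seconds the CPU has been stuck if a lockup
 * would be reported at time `now`, or 0 if nothing would be reported.
 */
unsigned long sketch_check(const struct sketch_watchdog *wd, unsigned long now)
{
	unsigned long stuck;

	if (!wd->enabled || now < wd->last_touch)
		return 0;

	stuck = now - wd->last_touch;
	return stuck > wd->threshold ? stuck : 0;
}
```

With `threshold` set to 20 and a last check-in at t=100, `sketch_check()` stays quiet at t=110 and returns 23 at t=123, which matches the "stuck for 23s" shape of the report. In this sketch the message goes away either by clearing `enabled` before the final loop or by calling `sketch_touch()` inside it; the thread leaves open which approach fits the kernel.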




Patch

diff --git a/arch/parisc/include/asm/pdcpat.h b/arch/parisc/include/asm/pdcpat.h
index 32e105f..0ef789e 100644
--- a/arch/parisc/include/asm/pdcpat.h
+++ b/arch/parisc/include/asm/pdcpat.h
@@ -307,6 +307,7 @@  extern int pdc_pat_cell_module(unsigned long *actcnt, unsigned long ploc, unsign
 extern int pdc_pat_cell_num_to_loc(void *, unsigned long);
 
 extern int pdc_pat_cpu_get_number(struct pdc_pat_cpu_num *cpu_info, unsigned long hpa);
+extern int pdc_pat_cpu_stop_cpu(unsigned long hpa, unsigned long hpa_vec);
 
 extern int pdc_pat_pd_get_addr_map(unsigned long *actual_len, void *mem_addr, unsigned long count, unsigned long offset);
 
diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
index 9819025..3f55db6 100644
--- a/arch/parisc/kernel/firmware.c
+++ b/arch/parisc/kernel/firmware.c
@@ -1308,6 +1308,31 @@  int pdc_pat_cpu_get_number(struct pdc_pat_cpu_num *cpu_info, unsigned long hpa)
 }
 
 /**
+ * pdc_pat_cpu_stop_cpu - Stop current cpu.
+ * @hpa: The Hard Physical Address of the CPU which should be informed when
+ *       current cpu has stopped.
+ * @hpa_vec: Mask of interrupts which should be signalled on CPU at @hpa.
+ *
+ * Stop the CPU in which the call is made. Flushes caches and purges TLB and
+ * places CPU in a firmware loop. If the CPU is the last in a cell, an
+ * interrupt message is sent to the CPU at @hpa.
+ */
+int pdc_pat_cpu_stop_cpu(unsigned long hpa, unsigned long hpa_vec)
+{
+	int retval;
+	unsigned long flags;
+
+	if (!hpa)
+		hpa_vec = -1UL;
+	spin_lock_irqsave(&pdc_lock, flags);
+	retval = mem_pdc_call(PDC_PAT_CPU, PDC_PAT_CPU_STOP, hpa, hpa_vec);
+	spin_unlock_irqrestore(&pdc_lock, flags);
+
+	return retval;
+}
+
+
+/**
  * pdc_pat_get_irt_size - Retrieve the number of entries in the cell's interrupt table.
  * @num_entries: The return value.
  * @cell_num: The target cell.
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 4516a5b..1615a9a 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -98,6 +98,9 @@  void machine_restart(char *cmd)
 #endif
 	/* set up a new led state on systems shipped with a LED State panel */
 	pdc_chassis_send_status(PDC_CHASSIS_DIRECT_SHUTDOWN);
+
+	/* stops all CPUs but the current one */
+	smp_send_stop();
 	
 	/* "Normal" system reset */
 	pdc_do_reset();
@@ -116,6 +119,9 @@  void machine_halt(void)
 	** The LED/ChassisCodes are updated by the led_halt()
 	** function, called by the reboot notifier chain.
 	*/
+
+	/* stops all CPUs but the current one */
+	smp_send_stop();
 }
 
 void (*chassis_power_off)(void);
@@ -126,6 +132,9 @@  void (*chassis_power_off)(void);
  */
 void machine_power_off(void)
 {
+	/* stops all CPUs but the current one */
+	smp_send_stop();
+
 	/* If there is a registered power off handler, call it. */
 	if (chassis_power_off)
 		chassis_power_off();
diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 6336510..afd9142 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -42,6 +42,7 @@ 
 #include <asm/irq.h>		/* for CPU_IRQ_REGION and friends */
 #include <asm/mmu_context.h>
 #include <asm/page.h>
+#include <asm/pdcpat.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/processor.h>
@@ -112,6 +113,9 @@  halt_processor(void)
 	/* REVISIT : does PM *know* this CPU isn't available? */
 	set_cpu_online(smp_processor_id(), false);
 	local_irq_disable();
+#ifdef CONFIG_64BIT
+	pdc_pat_cpu_stop_cpu(0, -1UL);
+#endif
 	for (;;)
 		;
 }