diff mbox

irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed before enabling

Message ID 1520988601-16705-1-git-send-email-shankerd@codeaurora.org (mailing list archive)
State New, archived
Headers show

Commit Message

Shanker Donthineni March 14, 2018, 12:50 a.m. UTC
The definition of the GICR_CTLR.RWP control bit was expanded to indicate
status of changing GICR_CTLR.EnableLPI from 1 to 0 is being in progress
or completed. Software must observe GICR_CTLR.RWP==0 after clearing
GICR_CTLR.EnableLPI from 1 to 0 and before writing GICR_PENDBASER and/or
GICR_PROPBASER, otherwise behavior is UNPREDICTABLE.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
---
 drivers/irqchip/irq-gic-v3-its.c   | 30 +++++++++++++++++++++++-------
 include/linux/irqchip/arm-gic-v3.h |  1 +
 2 files changed, 24 insertions(+), 7 deletions(-)

Comments

Marc Zyngier March 14, 2018, 7:41 a.m. UTC | #1
Hi Shanker,

On Wed, 14 Mar 2018 00:50:01 +0000,
Shanker Donthineni wrote:
> 
> The definition of the GICR_CTLR.RWP control bit was expanded to indicate
> status of changing GICR_CTLR.EnableLPI from 1 to 0 is being in progress
> or completed. Software must observe GICR_CTLR.RWP==0 after clearing
> GICR_CTLR.EnableLPI from 1 to 0 and before writing GICR_PENDBASER and/or
> GICR_PROPBASER, otherwise behavior is UNPREDICTABLE.
> 
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> ---
>  drivers/irqchip/irq-gic-v3-its.c   | 30 +++++++++++++++++++++++-------
>  include/linux/irqchip/arm-gic-v3.h |  1 +
>  2 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 1d3056f..85cd158 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1875,15 +1875,31 @@ static void its_cpu_init_lpis(void)
>  		gic_data_rdist()->pend_page = pend_page;
>  	}
>  
> -	/* Disable LPIs */
>  	val = readl_relaxed(rbase + GICR_CTLR);
> -	val &= ~GICR_CTLR_ENABLE_LPIS;
> -	writel_relaxed(val, rbase + GICR_CTLR);
>  
> -	/*
> -	 * Make sure any change to the table is observable by the GIC.
> -	 */
> -	dsb(sy);
> +	/* Make sure LPIs are disabled before programming PEND/PROP registers */
> +	if (val & GICR_CTLR_ENABLE_LPIS) {
> +		u32 count = 1000000; /* 1s! */
> +
> +		/* Disable LPIs */
> +		val &= ~GICR_CTLR_ENABLE_LPIS;
> +		writel_relaxed(val, rbase + GICR_CTLR);
> +
> +		/* Make sure any change to GICR_CTLR is observable by the GIC */
> +		dsb(sy);
> +
> +		/* Wait for GICR_CTLR.RWP==0 or timeout */
> +		while (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_RWP) {
> +			if (!count) {
> +				pr_err("CPU%d: Failed to disable LPIs\n",
> +				       smp_processor_id());
> +				return;
> +			}
> +			cpu_relax();
> +			udelay(1);
> +			count--;
> +		};
> +	}

I can see a couple of issues with this patch:

- Entering the kernel with GICR_CTLR.EnableLPIs set is a recipe for
  memory corruption and is likely to lead to Bad Things(tm). A loud
  warning would be in order, I believe.

- If you're on a system that doesn't allow GICR_CTLR.Enable_LPIs to be
  cleared, we end-up going down the polling path for nothing. It'd be
  worth checking that the bit can be cleared the first place (and
  shout again if it cannot).

- From a cosmetic PoV, please move this to a redist_disable_lpis()
  function.

Thanks,

	M.
Mark Rutland March 14, 2018, 10:38 a.m. UTC | #2
On Tue, Mar 13, 2018 at 07:50:01PM -0500, Shanker Donthineni wrote:
> The definition of the GICR_CTLR.RWP control bit was expanded to indicate
> status of changing GICR_CTLR.EnableLPI from 1 to 0 is being in progress
> or completed. Software must observe GICR_CTLR.RWP==0 after clearing
> GICR_CTLR.EnableLPI from 1 to 0 and before writing GICR_PENDBASER and/or
> GICR_PROPBASER, otherwise behavior is UNPREDICTABLE.

> +	/* Make sure LPIs are disabled before programming PEND/PROP registers */
> +	if (val & GICR_CTLR_ENABLE_LPIS) {
> +		u32 count = 1000000; /* 1s! */

Please use USEC_PER_SEC from <linux/time64.h>.

> +		/* Wait for GICR_CTLR.RWP==0 or timeout */
> +		while (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_RWP) {
> +			if (!count) {
> +				pr_err("CPU%d: Failed to disable LPIs\n",
> +				       smp_processor_id());
> +				return;
> +			}
> +			cpu_relax();
> +			udelay(1);
> +			count--;
> +		};

Please use readl_relaxed_poll_timeout() from <linux/iopoll.h>.

	/* Wait for GICR_CTLR.RWP==0 or timeout */
	ret = readl_relaxed_poll_timeout(rbase + GICR_CTLR, reg,
					 !(reg & GICR_CTLR_RWP), 1,
					 USEC_PER_SEC);
	if (ret) {
		pr_err("CPU%d: Failed to disable LPIs\n",
			smp_processor_id());
		return;
	}

Thanks,
Mark.
Shanker Donthineni March 14, 2018, 1:33 p.m. UTC | #3
Hi Marc,

On 03/14/2018 02:41 AM, Marc Zyngier wrote:
> Hi Shanker,
> 
> On Wed, 14 Mar 2018 00:50:01 +0000,
> Shanker Donthineni wrote:
>>
>> The definition of the GICR_CTLR.RWP control bit was expanded to indicate
>> status of changing GICR_CTLR.EnableLPI from 1 to 0 is being in progress
>> or completed. Software must observe GICR_CTLR.RWP==0 after clearing
>> GICR_CTLR.EnableLPI from 1 to 0 and before writing GICR_PENDBASER and/or
>> GICR_PROPBASER, otherwise behavior is UNPREDICTABLE.
>>
>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>> ---
>>  drivers/irqchip/irq-gic-v3-its.c   | 30 +++++++++++++++++++++++-------
>>  include/linux/irqchip/arm-gic-v3.h |  1 +
>>  2 files changed, 24 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 1d3056f..85cd158 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -1875,15 +1875,31 @@ static void its_cpu_init_lpis(void)
>>  		gic_data_rdist()->pend_page = pend_page;
>>  	}
>>  
>> -	/* Disable LPIs */
>>  	val = readl_relaxed(rbase + GICR_CTLR);
>> -	val &= ~GICR_CTLR_ENABLE_LPIS;
>> -	writel_relaxed(val, rbase + GICR_CTLR);
>>  
>> -	/*
>> -	 * Make sure any change to the table is observable by the GIC.
>> -	 */
>> -	dsb(sy);
>> +	/* Make sure LPIs are disabled before programming PEND/PROP registers */
>> +	if (val & GICR_CTLR_ENABLE_LPIS) {
>> +		u32 count = 1000000; /* 1s! */
>> +
>> +		/* Disable LPIs */
>> +		val &= ~GICR_CTLR_ENABLE_LPIS;
>> +		writel_relaxed(val, rbase + GICR_CTLR);
>> +
>> +		/* Make sure any change to GICR_CTLR is observable by the GIC */
>> +		dsb(sy);
>> +
>> +		/* Wait for GICR_CTLR.RWP==0 or timeout */
>> +		while (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_RWP) {
>> +			if (!count) {
>> +				pr_err("CPU%d: Failed to disable LPIs\n",
>> +				       smp_processor_id());
>> +				return;
>> +			}
>> +			cpu_relax();
>> +			udelay(1);
>> +			count--;
>> +		};
>> +	}
> 
> I can see a couple of issues with this patch:
> 
> - Entering the kernel with GICR_CTLR.EnableLPIs set is a recipe for
>   memory corruption and is likely to lead to Bad Things(tm). A loud
>   warning would be in order, I believe.
> 

I agree with you entering kernel with GICR_CTLR.EnableLPI=1 causes many
issues. Unfortunately this is happening with KDUMP/KEXEC case. We don't
disable GICD, GICRs and ITSs before loading the 2nd kernel. 

> - If you're on a system that doesn't allow GICR_CTLR.Enable_LPIs to be
>   cleared, we end-up going down the polling path for nothing. It'd be
>   worth checking that the bit can be cleared the first place (and
>   shout again if it cannot).
> 

This tells the bug in hardware but not in software, as per per spec it
should be able to cleared by software. Any suggestions how software knows
GICR_CTLR.EnableLPI bit can be cleared from enabled state.

> - From a cosmetic PoV, please move this to a redist_disable_lpis()
>   function.
> 
Sure, I'll move.

> Thanks,
> 
> 	M.
>
Marc Zyngier March 14, 2018, 1:50 p.m. UTC | #4
On 14/03/18 13:33, Shanker Donthineni wrote:
> Hi Marc,
> 
> On 03/14/2018 02:41 AM, Marc Zyngier wrote:
>> Hi Shanker,
>>
>> On Wed, 14 Mar 2018 00:50:01 +0000,
>> Shanker Donthineni wrote:
>>>
>>> The definition of the GICR_CTLR.RWP control bit was expanded to indicate
>>> status of changing GICR_CTLR.EnableLPI from 1 to 0 is being in progress
>>> or completed. Software must observe GICR_CTLR.RWP==0 after clearing
>>> GICR_CTLR.EnableLPI from 1 to 0 and before writing GICR_PENDBASER and/or
>>> GICR_PROPBASER, otherwise behavior is UNPREDICTABLE.
>>>
>>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>>> ---
>>>  drivers/irqchip/irq-gic-v3-its.c   | 30 +++++++++++++++++++++++-------
>>>  include/linux/irqchip/arm-gic-v3.h |  1 +
>>>  2 files changed, 24 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>> index 1d3056f..85cd158 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -1875,15 +1875,31 @@ static void its_cpu_init_lpis(void)
>>>  		gic_data_rdist()->pend_page = pend_page;
>>>  	}
>>>  
>>> -	/* Disable LPIs */
>>>  	val = readl_relaxed(rbase + GICR_CTLR);
>>> -	val &= ~GICR_CTLR_ENABLE_LPIS;
>>> -	writel_relaxed(val, rbase + GICR_CTLR);
>>>  
>>> -	/*
>>> -	 * Make sure any change to the table is observable by the GIC.
>>> -	 */
>>> -	dsb(sy);
>>> +	/* Make sure LPIs are disabled before programming PEND/PROP registers */
>>> +	if (val & GICR_CTLR_ENABLE_LPIS) {
>>> +		u32 count = 1000000; /* 1s! */
>>> +
>>> +		/* Disable LPIs */
>>> +		val &= ~GICR_CTLR_ENABLE_LPIS;
>>> +		writel_relaxed(val, rbase + GICR_CTLR);
>>> +
>>> +		/* Make sure any change to GICR_CTLR is observable by the GIC */
>>> +		dsb(sy);
>>> +
>>> +		/* Wait for GICR_CTLR.RWP==0 or timeout */
>>> +		while (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_RWP) {
>>> +			if (!count) {
>>> +				pr_err("CPU%d: Failed to disable LPIs\n",
>>> +				       smp_processor_id());
>>> +				return;
>>> +			}
>>> +			cpu_relax();
>>> +			udelay(1);
>>> +			count--;
>>> +		};
>>> +	}
>>
>> I can see a couple of issues with this patch:
>>
>> - Entering the kernel with GICR_CTLR.EnableLPIs set is a recipe for
>>   memory corruption and is likely to lead to Bad Things(tm). A loud
>>   warning would be in order, I believe.
>>
> 
> I agree with you entering kernel with GICR_CTLR.EnableLPI=1 causes many
> issues. Unfortunately this is happening with KDUMP/KEXEC case. We don't
> disable GICD, GICRs and ITSs before loading the 2nd kernel. 

This is an orthogonal issue, and I'm working on an irqchip reset
infrastructure that would allow the GIC (and any other irqchip) to be
properly torn down on kexec. For kdump, there is nothing that can be
done, and we will rely on the secondary kernel to cleanup things, if at
all possible.

>> - If you're on a system that doesn't allow GICR_CTLR.Enable_LPIs to be
>>   cleared, we end-up going down the polling path for nothing. It'd be
>>   worth checking that the bit can be cleared the first place (and
>>   shout again if it cannot).
>>
> 
> This tells the bug in hardware but not in software, as per per spec it
> should be able to cleared by software. Any suggestions how software knows
> GICR_CTLR.EnableLPI bit can be cleared from enabled state.

That's not what the spec says: "After it has been written to 1, it is
IMPLEMENTATION DEFINED whether the bit becomes RES 1 or can be cleared
by to 0.".

So both implementations are valid. One is clearly superior to the other,
but we still need to deal with it. What you need to do is to try and
clear the EnableLPIs bit, and then check that it has actually cleared.
If it has, you start polling RWP. If it hasn't, you scream because the
system is likely to be broken (you shouldn't have used kexec/kdump the
first place).

Thanks,

	M.
diff mbox

Patch

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 1d3056f..85cd158 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1875,15 +1875,31 @@  static void its_cpu_init_lpis(void)
 		gic_data_rdist()->pend_page = pend_page;
 	}
 
-	/* Disable LPIs */
 	val = readl_relaxed(rbase + GICR_CTLR);
-	val &= ~GICR_CTLR_ENABLE_LPIS;
-	writel_relaxed(val, rbase + GICR_CTLR);
 
-	/*
-	 * Make sure any change to the table is observable by the GIC.
-	 */
-	dsb(sy);
+	/* Make sure LPIs are disabled before programming PEND/PROP registers */
+	if (val & GICR_CTLR_ENABLE_LPIS) {
+		u32 count = 1000000; /* 1s! */
+
+		/* Disable LPIs */
+		val &= ~GICR_CTLR_ENABLE_LPIS;
+		writel_relaxed(val, rbase + GICR_CTLR);
+
+		/* Make sure any change to GICR_CTLR is observable by the GIC */
+		dsb(sy);
+
+		/* Wait for GICR_CTLR.RWP==0 or timeout */
+		while (readl_relaxed(rbase + GICR_CTLR) & GICR_CTLR_RWP) {
+			if (!count) {
+				pr_err("CPU%d: Failed to disable LPIs\n",
+				       smp_processor_id());
+				return;
+			}
+			cpu_relax();
+			udelay(1);
+			count--;
+		};
+	}
 
 	/* set PROPBASE */
 	val = (page_to_phys(gic_rdists->prop_page) |
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index c00c4c33..4d5fb60 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -106,6 +106,7 @@ 
 #define GICR_PIDR2			GICD_PIDR2
 
 #define GICR_CTLR_ENABLE_LPIS		(1UL << 0)
+#define GICR_CTLR_RWP			(1UL << 3)
 
 #define GICR_TYPER_CPU_NUMBER(r)	(((r) >> 8) & 0xffff)