arm64/mm: Do not write ASID generation to ttbr0

Message ID 1512495507-23259-1-git-send-email-julien.thierry@arm.com (mailing list archive)
State New, archived

Commit Message

Julien Thierry Dec. 5, 2017, 5:38 p.m. UTC
When writing the user ASID to ttbr0, 16 bits get copied into ttbr0.ASID,
potentially including part of the ASID generation. If the kernel is using
fewer than 16 ASID bits and the remaining ttbr0.ASID bits aren't RES0, two
tasks using the same mm context might end up running with different
ttbr0.ASID values.
This would be triggered by one of the threads being scheduled before a
roll-over and the second one scheduled after the roll-over.
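
For example, with asid_bits reduced to 8 on hardware that uses a 16-bit
ttbr0.ASID field (the values below are purely illustrative):

  thread A, scheduled before the roll-over:
      context.id = 0x012a  ->  ttbr0.ASID = 0x012a  (generation 0x1, ASID 0x2a)
  thread B, same mm, scheduled after the roll-over:
      context.id = 0x022a  ->  ttbr0.ASID = 0x022a  (generation 0x2, ASID 0x2a)

The two threads now run with different ttbr0.ASID values for the same mm.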

Pad the generation out of the 16 bits of the mm id that are written to
ttbr0. Thus, what the hardware sees is exactly what the kernel considers
to be the ASID.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
---
 arch/arm64/include/asm/mmu.h |  8 +++++++-
 arch/arm64/mm/context.c      | 21 +++++++++++++++++++--
 arch/arm64/mm/proc.S         |  3 ++-
 3 files changed, 28 insertions(+), 4 deletions(-)

--
1.9.1

Comments

James Morse Dec. 5, 2017, 7:33 p.m. UTC | #1
Hi Julien,

On 05/12/17 17:38, Julien Thierry wrote:
> When writing the user ASID to ttbr0, 16bit get copied to ttbr0, potentially
> including part of the ASID generation in the ttbr0.ASID. If the kernel is
> using less than 16bits ASIDs and the other ttbr0 bits aren't RES0, two
> tasks using the same mm context might end up running with different
> ttbr0.ASID values.
> This would be triggered by one of the threads being scheduled before a
> roll-over and the second one scheduled after a roll-over.
> 
> Pad the generation out of the 16bits of the mm id that are written to
> ttbr0. Thus, what the hardware sees is what the kernel considers ASID.

Is this to fix a system with a mix of 8/16 asid bits? Or someone being clever
and reducing asid_bits to stress rollover?

(I think we should do something like this so we can test rollover.)


> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index 6f40170..a7c72d4 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -180,6 +190,13 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
>  	/* We're out of ASIDs, so increment the global generation count */
>  	generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
>  						 &asid_generation);
> +
> +	/*
> +	 * It is unlikely the generation will ever overflow, but if this
> +	 * happens, let it be known strange things can occur.
> +	 */
> +	WARN_ON(generation == ASID_FIRST_VERSION);

Won't generation wrap to zero, not '1<<16'?

asid_generation-is-never-zero is one of the nasty things in this code.

In check_and_switch_context() we take the 'fast path', where the current
asid is usable, if the generation matches and the per-cpu active_asid isn't
0 (a zero active_asid indicates a rollover has occurred).

from mm/context.c::check_and_switch_context():
> 	if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)
>	    && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
>		goto switch_mm_fastpath;


If asid_generation is ever zero (even briefly), multiple new tasks with
different page tables will pass the generation test and run with asid=0.
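
To spell it out for a freshly forked mm (context.id still 0), assuming
asid_generation has just wrapped to 0 (an annotated sketch, not code from
the patch):

	asid = atomic64_read(&mm->context.id);		/* 0 for a brand new mm */

	if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)	/* 0 ^ 0: passes */
	    && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
		goto switch_mm_fastpath;	/* taken with ASID 0 if this cpu's
						 * previous active_asid was non-zero,
						 * without ever calling new_context() */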

Short of doing an xchg(ASID_MAX_VERSION, ASID_FIRST_VERSION) every time,
just in case, I don't think this is something we can handle.

But, this thing is layers on layers of subtle behaviour, so I may have missed
something...


Instead, do you think we could duplicate just the asid bits (asid & ~ASID_MASK)
into a new 16-bit field in mm_context_t, which we then use instead of the ASID()
and mmid macros? (We only set a new asid in one place, as returned by new_context().)

This would let context.c's asid_bits be arbitrary, as the hardware would only
ever see the masked value exposed in the new field.
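
Roughly (a sketch only; the hw_asid field and this exact plumbing are
invented here, they don't exist today):

	typedef struct {
		atomic64_t	id;		/* generation | asid, as today */
		u16		hw_asid;	/* asid bits only, fed to the TTBR */
		/* other fields unchanged */
	} mm_context_t;

	/* the one place a new asid is installed, check_and_switch_context(): */
	asid = new_context(mm, cpu);
	atomic64_set(&mm->context.id, asid);
	WRITE_ONCE(mm->context.hw_asid, asid & ~ASID_MASK);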

This doesn't solve the mixed 8/16 bit case. I think we should force those
systems to use 8-bit asids via cpufeature to preserve as much of the generation
counter as possible.


Thanks,

James
Julien Thierry Dec. 6, 2017, 10:17 a.m. UTC | #2
Hi James,

On 05/12/17 19:33, James Morse wrote:
> Hi Julien,
> 
> On 05/12/17 17:38, Julien Thierry wrote:
>> When writing the user ASID to ttbr0, 16bit get copied to ttbr0, potentially
>> including part of the ASID generation in the ttbr0.ASID. If the kernel is
>> using less than 16bits ASIDs and the other ttbr0 bits aren't RES0, two
>> tasks using the same mm context might end up running with different
>> ttbr0.ASID values.
>> This would be triggered by one of the threads being scheduled before a
>> roll-over and the second one scheduled after a roll-over.
>>
>> Pad the generation out of the 16bits of the mm id that are written to
>> ttbr0. Thus, what the hardware sees is what the kernel considers ASID.
> 
> Is this to fix a system with a mix of 8/16 asid bits? Or someone being clever
> and reducing asid_bits to stress rollover?
> 

So it was motivated by the latter. But hopefully the fix, always pushing 
the generation above 16 bits, is also suitable for hardware that mixes 
8-bit and 16-bit ASIDs. If it is not, do call it out.

> (I think we should do something like this so we can test rollover.)
> 
> 
>> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
>> index 6f40170..a7c72d4 100644
>> --- a/arch/arm64/mm/context.c
>> +++ b/arch/arm64/mm/context.c
>> @@ -180,6 +190,13 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
>>   	/* We're out of ASIDs, so increment the global generation count */
>>   	generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
>>   						 &asid_generation);
>> +
>> +	/*
>> +	 * It is unlikely the generation will ever overflow, but if this
>> +	 * happens, let it be known strange things can occur.
>> +	 */
>> +	WARN_ON(generation == ASID_FIRST_VERSION);
> 
> Won't generation wrap to zero, not '1<<16'?

Yes it will! Silly me...

> 
> asid_generation-is-never-zero is one of the nasty things in this code.
> 
> In check_and_switch_context() we switch to the 'fast path' where the current
> asid is usable if: the generation matches and the active_asid isn't 0 (which
> indicates a rollover has occurred).
> 
> from mm/context.c::check_and_switch_context():
>> 	if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)
>> 	    && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
>> 		goto switch_mm_fastpath;
> 
> 
> If asid_generation is ever zero, (even briefly) multiple new tasks with
> different pages-tables will pass the generation test, and run with asid=0.
> 
> Short of xchg(ASID_MAX_VERSION, ASID_FIRST_VERSION), every time, just in case, I
> don't think this is something we can handle.
> 
> But, this thing is layers on layers of subtle behaviour, so I may have missed
> something...
> 
> 

I had not thought of that. However, I believe we checked, and with 48 bits 
of generation it would take a system running and continuously spawning 
processes something on the scale of a human life span to overflow the 
generation. Which is why we decided on having a WARN_ON for now. So I don't 
know if we want to change things for this corner case, which really means 
"things are going bad" more than anything else.
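
Back-of-envelope (rough numbers on my side): keeping 16 bits for the 
hardware ASID leaves 48 generation bits, and each generation bump costs 
about 2^16 calls to new_context(), so wrapping the generation takes around 
2^48 * 2^16 = 2^64 context allocations; even at an unrealistic 10^9 
allocations per second that is on the order of 500 years.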

> Instead, do you think we can duplicate just the asid bits (asid & ASID_MASK)
> into a new 16bit field in mm_context_t, which we then use instead of the ASID()
> and mmid macros? (We only set a new asid in one place as returned by new_context()).
> 

I'm not sure I understand why this prevents running with asid = 0 when 
generation is 0.

> This would let context.c's asid_bits be arbitrary, as the hardware only uses the
> masked value it exposes in the new field.
> 
> This doesn't solve the mixed 8/16 bit case. I think we should force those
> systems to use 8bit asids via cpufeature to preserve as much of the generation
> counter as possible.
> 

Hmmm right, I see that today we just panic the kernel when a CPU has a 
smaller ASID size than the boot CPU...
So if we boot on an 8-bit ASID CPU it should work, but not the other way 
around... I agree we should probably find a way to reduce the size of 
software ASIDs system-wide when we find a CPU that has fewer bits available.

Thanks,
James Morse Dec. 6, 2017, 10:56 a.m. UTC | #3
Hi Julien,

On 06/12/17 10:17, Julien Thierry wrote:
> On 05/12/17 19:33, James Morse wrote:
>> On 05/12/17 17:38, Julien Thierry wrote:
>>> When writing the user ASID to ttbr0, 16bit get copied to ttbr0, potentially
>>> including part of the ASID generation in the ttbr0.ASID. If the kernel is
>>> using less than 16bits ASIDs and the other ttbr0 bits aren't RES0, two
>>> tasks using the same mm context might end up running with different
>>> ttbr0.ASID values.
>>> This would be triggered by one of the threads being scheduled before a
>>> roll-over and the second one scheduled after a roll-over.
>>>
>>> Pad the generation out of the 16bits of the mm id that are written to
>>> ttbr0. Thus, what the hardware sees is what the kernel considers ASID.
>>
>> Is this to fix a system with a mix of 8/16 asid bits? Or someone being clever
>> and reducing asid_bits to stress rollover?

> So it was motivated by the later. But hopefully the fix, pushing the generation
> always over 16bits is also suitable for hardware that mixes 8/16bits asids. If
> it is not, do call it out.

(okay, wasn't clear from the commit message!)


>>> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
>>> index 6f40170..a7c72d4 100644
>>> --- a/arch/arm64/mm/context.c
>>> +++ b/arch/arm64/mm/context.c
>>> @@ -180,6 +190,13 @@ static u64 new_context(struct mm_struct *mm, unsigned
>>> int cpu)
>>>       /* We're out of ASIDs, so increment the global generation count */
>>>       generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
>>>                            &asid_generation);
>>> +
>>> +    /*
>>> +     * It is unlikely the generation will ever overflow, but if this
>>> +     * happens, let it be known strange things can occur.
>>> +     */
>>> +    WARN_ON(generation == ASID_FIRST_VERSION);
>>
>> Won't generation wrap to zero, not '1<<16'?
> 
> Yes it will! Silly me...
> 
>>
>> asid_generation-is-never-zero is one of the nasty things in this code.
>>
>> In check_and_switch_context() we switch to the 'fast path' where the current
>> asid is usable if: the generation matches and the active_asid isn't 0 (which
>> indicates a rollover has occurred).
>>
>> from mm/context.c::check_and_switch_context():
>>>     if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)
>>>         && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
>>>         goto switch_mm_fastpath;
>>
>>
>> If asid_generation is ever zero, (even briefly) multiple new tasks with
>> different pages-tables will pass the generation test, and run with asid=0.
>>
>> Short of xchg(ASID_MAX_VERSION, ASID_FIRST_VERSION), every time, just in case, I
>> don't think this is something we can handle.
>>
>> But, this thing is layers on layers of subtle behaviour, so I may have missed
>> something...

> I had not thought of that. However I believe we checked with 48bits and the case
> of the generation overflowing would take a human life span (or something on that
> scale) of a system running and spawning continuous processes before reaching

This scales with the number of CPUs, and how quickly they can fork().
(Not using all the counter bits makes me nervous.)


> this. Which is why we decided on having a WARN_ON for now. So I don't know if we
> want to change things for this corner case which really means "things are going
> bad" more than anything else.

But it fires a generation after the chaos has started, so if we ever do see
it fire, it gives us false confidence that generation overflow wasn't the
issue. WARN_ON(generation == ASID_MAX_VERSION)?
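
i.e. something like this (ASID_MAX_VERSION doesn't exist in the patch, so
treat the definition as a sketch):

	#define ASID_MAX_VERSION	GENMASK(63, ASID_MAX_BITS)

	/* fires on the last usable generation, before the wrap back to 0 */
	WARN_ON(generation == ASID_MAX_VERSION);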


>> Instead, do you think we can duplicate just the asid bits (asid & ASID_MASK)
>> into a new 16bit field in mm_context_t, which we then use instead of the ASID()
>> and mmid macros? (We only set a new asid in one place as returned by
>> new_context()).
>>
> 
> I'm not sure I understand why this prevents running with asid = 0 when
> generation is 0.

Sorry, I was talking about two things. It doesn't; I don't think we can do
anything about that.

This was a suggestion for an alternative way of stopping generation bits getting
into the TTBR, one which still lets us use all the counter bits (and lets us mess
with asid_bits to stress rollover without re-introducing this bug).


>> This would let context.c's asid_bits be arbitrary, as the hardware only uses the
>> masked value it exposes in the new field.
>>
>> This doesn't solve the mixed 8/16 bit case. I think we should force those
>> systems to use 8bit asids via cpufeature to preserve as much of the generation
>> counter as possible.

> Hmmm right, I see that today we just panic the kernel when we have a smaller
> asid size than the boot cpu...
> So if we boot on a 8bit asid cpu it should work but not the other way around...
> I agree we probably should find a way to reduce the size of software asids
> system wide when we find a cpu has less bits available.

cpufeature has some stuff for 'LOWER_SAFE' that might help here...
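
Something like marking the ASIDBits field of ID_AA64MMFR0_EL1 as
FTR_LOWER_SAFE in cpufeature.c's ftr_id_aa64mmfr0[] table, so that the
sanitised system-wide value becomes the smallest implemented one (a sketch
from memory, the other arguments may not be exact):

	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE,
		       ID_AA64MMFR0_ASID_SHIFT, 4, 0),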


Thanks,

James
Julien Thierry Dec. 6, 2017, 11:33 a.m. UTC | #4
On 06/12/17 10:56, James Morse wrote:
> Hi Julien,
> 
> On 06/12/17 10:17, Julien Thierry wrote:
>> On 05/12/17 19:33, James Morse wrote:
>>> On 05/12/17 17:38, Julien Thierry wrote:
>>>> When writing the user ASID to ttbr0, 16bit get copied to ttbr0, potentially
>>>> including part of the ASID generation in the ttbr0.ASID. If the kernel is
>>>> using less than 16bits ASIDs and the other ttbr0 bits aren't RES0, two
>>>> tasks using the same mm context might end up running with different
>>>> ttbr0.ASID values.
>>>> This would be triggered by one of the threads being scheduled before a
>>>> roll-over and the second one scheduled after a roll-over.
>>>>
>>>> Pad the generation out of the 16bits of the mm id that are written to
>>>> ttbr0. Thus, what the hardware sees is what the kernel considers ASID.
>>>
>>> Is this to fix a system with a mix of 8/16 asid bits? Or someone being clever
>>> and reducing asid_bits to stress rollover?
> 
>> So it was motivated by the later. But hopefully the fix, pushing the generation
>> always over 16bits is also suitable for hardware that mixes 8/16bits asids. If
>> it is not, do call it out.
> 
> (okay, wasn't clear from the commit message!)
> 

Fair point, I'll mention the mixed-size hardware ASIDs in the next version.

> 
>>>> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
>>>> index 6f40170..a7c72d4 100644
>>>> --- a/arch/arm64/mm/context.c
>>>> +++ b/arch/arm64/mm/context.c
>>>> @@ -180,6 +190,13 @@ static u64 new_context(struct mm_struct *mm, unsigned
>>>> int cpu)
>>>>        /* We're out of ASIDs, so increment the global generation count */
>>>>        generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
>>>>                             &asid_generation);
>>>> +
>>>> +    /*
>>>> +     * It is unlikely the generation will ever overflow, but if this
>>>> +     * happens, let it be known strange things can occur.
>>>> +     */
>>>> +    WARN_ON(generation == ASID_FIRST_VERSION);
>>>
>>> Won't generation wrap to zero, not '1<<16'?
>>
>> Yes it will! Silly me...
>>
>>>
>>> asid_generation-is-never-zero is one of the nasty things in this code.
>>>
>>> In check_and_switch_context() we switch to the 'fast path' where the current
>>> asid is usable if: the generation matches and the active_asid isn't 0 (which
>>> indicates a rollover has occurred).
>>>
>>> from mm/context.c::check_and_switch_context():
>>>>      if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)
>>>>          && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
>>>>          goto switch_mm_fastpath;
>>>
>>>
>>> If asid_generation is ever zero, (even briefly) multiple new tasks with
>>> different pages-tables will pass the generation test, and run with asid=0.
>>>
>>> Short of xchg(ASID_MAX_VERSION, ASID_FIRST_VERSION), every time, just in case, I
>>> don't think this is something we can handle.
>>>
>>> But, this thing is layers on layers of subtle behaviour, so I may have missed
>>> something...
> 
>> I had not thought of that. However I believe we checked with 48bits and the case
>> of the generation overflowing would take a human life span (or something on that
>> scale) of a system running and spawning continuous processes before reaching
> 
> This scales with the number of CPUs, and how quickly they can fork().
> (Not using all the counter bits makes me nervous.)
> 
> 
>> this. Which is why we decided on having a WARN_ON for now. So I don't know if we
>> want to change things for this corner case which really means "things are going
>> bad" more than anything else.
> 
> But it fires the generation after the chaos started, so if we ever do see this
> it gives us false-confidence that generation-overflow wasn't the issue.
> WARN_ON(generation == ASID_MAX_VERSION) ?
> 

Yes that seems fair.

> 
>>> Instead, do you think we can duplicate just the asid bits (asid & ASID_MASK)
>>> into a new 16bit field in mm_context_t, which we then use instead of the ASID()
>>> and mmid macros? (We only set a new asid in one place as returned by
>>> new_context()).
>>>
>>
>> I'm not sure I understand why this prevents running with asid = 0 when
>> generation is 0.
> 
> Sorry, I was talking about two things. It doesn't, I don't think we can do
> anything about that.
> 
> This was a suggestion for an alternative way of stopping generation bits getting
> into the TTBR, which still lets use use all the counter bits. (and lets us mess
> with asid_bits to stress rollover without re-inventing this bug).
> 

I see. The only thing that worries me is people being tempted to use the 
duplicated (ASID-only) field where they shouldn't.

I'm not very familiar with ASIDs and mm contexts, so I'd like to know how 
this suggestion sounds to others. If people agree, I'm happy to add this 
in the next version.

> 
>>> This would let context.c's asid_bits be arbitrary, as the hardware only uses the
>>> masked value it exposes in the new field.
>>>
>>> This doesn't solve the mixed 8/16 bit case. I think we should force those
>>> systems to use 8bit asids via cpufeature to preserve as much of the generation
>>> counter as possible.
> 
>> Hmmm right, I see that today we just panic the kernel when we have a smaller
>> asid size than the boot cpu...
>> So if we boot on a 8bit asid cpu it should work but not the other way around...
>> I agree we probably should find a way to reduce the size of software asids
>> system wide when we find a cpu has less bits available.
> 
> cpufeature has some stuff for 'LOWER_SAFE' that might help here..
> 

I'll have a look.

Cheers,
Suzuki K Poulose Dec. 6, 2017, 2:35 p.m. UTC | #5
On 06/12/17 11:33, Julien Thierry wrote:
> 
> 
> On 06/12/17 10:56, James Morse wrote:
>> Hi Julien,
>>
>> On 06/12/17 10:17, Julien Thierry wrote:
>>> On 05/12/17 19:33, James Morse wrote:
>>>> On 05/12/17 17:38, Julien Thierry wrote:

>>
>>>> This would let context.c's asid_bits be arbitrary, as the hardware only uses the
>>>> masked value it exposes in the new field.
>>>>
>>>> This doesn't solve the mixed 8/16 bit case. I think we should force those
>>>> systems to use 8bit asids via cpufeature to preserve as much of the generation
>>>> counter as possible.
>>
>>> Hmmm right, I see that today we just panic the kernel when we have a smaller
>>> asid size than the boot cpu...
>>> So if we boot on a 8bit asid cpu it should work but not the other way around...
>>> I agree we probably should find a way to reduce the size of software asids
>>> system wide when we find a cpu has less bits available.
>>
>> cpufeature has some stuff for 'LOWER_SAFE' that might help here..
>>
> 
> I'll have a look.

The problem with ASIDs is that the ASID map has to be allocated early in
init, before any of the secondaries are brought up. So this is something that
has to be decided based on the boot CPU and hence is treated as an "early"
feature (without really a CAP bit). So all new CPUs must support at least the
ASID size of the boot CPU, failing which we panic the system via
verify_cpu_asid_bits().

I am not sure if you can really delay the asid_map allocation. You could probably
start there.
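
For reference, the allocation in question lives in an early_initcall in
mm/context.c; roughly (abbreviated from memory):

	static int asids_init(void)
	{
		asid_bits = get_cpu_asid_bits();
		asid_map = kzalloc(BITS_TO_LONGS(NUM_USER_ASIDS) * sizeof(*asid_map),
				   GFP_KERNEL);
		if (!asid_map)
			panic("Failed to allocate bitmap for %lu ASIDs\n",
			      NUM_USER_ASIDS);
		return 0;
	}
	early_initcall(asids_init);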

Cheers
Suzuki


> 
> Cheers,
>

Patch

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 0d34bf0..61e5436 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -16,6 +16,10 @@ 
 #ifndef __ASM_MMU_H
 #define __ASM_MMU_H

+#define ASID_MAX_BITS	16
+
+#ifndef __ASSEMBLY__
+
 #define MMCF_AARCH32	0x1	/* mm context flag for AArch32 executables */

 typedef struct {
@@ -29,7 +33,8 @@ 
  * ASID change and therefore doesn't need to reload the counter using
  * atomic64_read.
  */
-#define ASID(mm)	((mm)->context.id.counter & 0xffff)
+#define ASID(mm)							\
+	((mm)->context.id.counter & GENMASK(ASID_MAX_BITS - 1, 0))

 extern void paging_init(void);
 extern void bootmem_init(void);
@@ -41,4 +46,5 @@  extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 extern void mark_linear_text_alias_ro(void);

+#endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 6f40170..a7c72d4 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -37,9 +37,9 @@ 
 static DEFINE_PER_CPU(u64, reserved_asids);
 static cpumask_t tlb_flush_pending;

+#define ASID_FIRST_VERSION	(1UL << ASID_MAX_BITS)
 #define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
-#define ASID_FIRST_VERSION	(1UL << asid_bits)
-#define NUM_USER_ASIDS		ASID_FIRST_VERSION
+#define NUM_USER_ASIDS		(1UL << asid_bits)

 /* Get the ASIDBits supported by the current CPU */
 static u32 get_cpu_asid_bits(void)
@@ -60,6 +60,8 @@  static u32 get_cpu_asid_bits(void)
 		asid = 16;
 	}

+	WARN_ON(asid > ASID_MAX_BITS);
+
 	return asid;
 }

@@ -142,6 +144,14 @@  static bool check_update_reserved_asid(u64 asid, u64 newasid)
 	return hit;
 }

+/*
+ * Format of ASID is:
+ * - bits <asid_bits - 1>.. 0 -> actual ASID
+ * - bits 63..16 -> ASID generation
+ *
+ * Generation is padded to the maximum supported ASID size
+ * to avoid it being taken into account in the ttbr.
+ */
 static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 {
 	static u32 cur_idx = 1;
@@ -180,6 +190,13 @@  static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 	/* We're out of ASIDs, so increment the global generation count */
 	generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
 						 &asid_generation);
+
+	/*
+	 * It is unlikely the generation will ever overflow, but if this
+	 * happens, let it be known strange things can occur.
+	 */
+	WARN_ON(generation == ASID_FIRST_VERSION);
+
 	flush_context(cpu);

 	/* We have more ASIDs than CPUs, so this will always succeed */
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 95233df..33c7f13 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -23,6 +23,7 @@ 
 #include <asm/assembler.h>
 #include <asm/asm-offsets.h>
 #include <asm/hwcap.h>
+#include <asm/mmu.h>
 #include <asm/pgtable.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/cpufeature.h>
@@ -140,7 +141,7 @@  ENDPROC(cpu_do_resume)
 ENTRY(cpu_do_switch_mm)
 	pre_ttbr0_update_workaround x0, x2, x3
 	mmid	x1, x1				// get mm->context.id
-	bfi	x0, x1, #48, #16		// set the ASID
+	bfi	x0, x1, #48, #ASID_MAX_BITS	// set the ASID
 	msr	ttbr0_el1, x0			// set TTBR0
 	isb
 	post_ttbr0_update_workaround