diff mbox

x86/mm: Make PV linear pagetables optional

Message ID 20171017171037.436-1-george.dunlap@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

George Dunlap Oct. 17, 2017, 5:10 p.m. UTC
Allowing pagetables to point to other pagetables of the same level
(often called 'linear pagetables') has been included in Xen since its
inception; but recently it has been the source of a number of subtle
reference-counting bugs.

It is not used by Linux or MiniOS; but it used used by NetBSD and
Novell Netware.  There are significant numbers of people who are never
going to use the feature, along with significant numbers who need the
feature.

Add a Kconfig option for the feature (default to 'y').  Also add a
command-line option to control whether PV linear pagetables are
allowed (default to 'true').

In order to make the code clean:
- Introduce LPT_ASSERT(), which only exists if CONFIG_PV_LINEAR_PT is defined
- Introduce zero_linear_entries() to set page->linear_pt_count to zero
  (or do nothing, as appropriate)

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes since XSA
- Add a Kconfig option
- Default to 'on' (rather than 'off').

Release justification: This was originally part of a security fix
embargoed until after the freeze date; it wasn't checked in with the
other security patches in order to allow a discussion about the
default.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Julien Grall <julien.grall@arm.com>
---
 docs/misc/xen-command-line.markdown | 16 ++++++++++++++++
 xen/arch/Kconfig                    |  1 +
 xen/arch/arm/mm.c                   |  1 +
 xen/arch/x86/Kconfig                | 21 ++++++++++++++++++++
 xen/arch/x86/mm.c                   | 38 +++++++++++++++++++++++++++++++++----
 xen/include/asm-x86/mm.h            |  5 +++++
 6 files changed, 78 insertions(+), 4 deletions(-)

Comments

Andrew Cooper Oct. 17, 2017, 6:05 p.m. UTC | #1
On 17/10/17 18:10, George Dunlap wrote:
> Allowing pagetables to point to other pagetables of the same level
> (often called 'linear pagetables') has been included in Xen since its
> inception; but recently it has been the source of a number of subtle
> reference-counting bugs.
>
> It is not used by Linux or MiniOS; but it used used by NetBSD and
> Novell Netware.  There are significant numbers of people who are never
> going to use the feature, along with significant numbers who need the
> feature.
>
> Add a Kconfig option for the feature (default to 'y').  Also add a
> command-line option to control whether PV linear pagetables are
> allowed (default to 'true').
>
> In order to make the code clean:
> - Introduce LPT_ASSERT(), which only exists if CONFIG_PV_LINEAR_PT is defined
> - Introduce zero_linear_entries() to set page->linear_pt_count to zero
>   (or do nothing, as appropriate)
>
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Definitely +1 to this kind of arrangement of user choices.  Some notes
below.

> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index eb4995e68b..952368d3be 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1422,6 +1422,22 @@ The following resources are available:
>      CDP, one COS will corespond two CBMs other than one with CAT, due to the
>      sum of CBMs is fixed, that means actual `cos_max` in use will automatically
>      reduce to half when CDP is enabled.
> +	
> +### pv-linear-pt
> +> `= <boolean>`
> +
> +> Default: `false`

Only available if Xen is compiled with CONFIG_PV_LINEAR_PT support enabled.

> +
> +Allow PV guests to have pagetable entries pointing to other pagetables
> +of the same level (i.e., allowing L2 PTEs to point to other L2 pages).
> +This technique is often called "linear pagetables", and is sometimes
> +used to allow operating systems a simple way to consistently map the
> +current process's pagetables into its own virtual address space.
> +
> +Linux and MiniOS don't use this technique.  NetBSD and Novell Netware
> +do; there may be other custom operating systems which do.  If you're
> +certain you don't plan on having PV guests which use this feature,
> +turning it off can reduce the attack surface.
>  
>  ### rcu-idle-timer-period-ms
>  > `= <integer>`
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 62d313e3f5..5881b64608 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -654,6 +660,9 @@ static void dec_linear_uses(struct page_info *pg)
>   *     frame if it is mapped by a different root table. This is sufficient and
>   *     also necessary to allow validation of a root table mapping itself.
>   */
> +static bool __read_mostly pv_linear_pt_enable = true;
> +boolean_param("pv-linear-pt", pv_linear_pt_enable);

The _enable suffix just makes the name longer, and (semi-upheld)
convention would be for opt_pv_linear_pt, which is fine even in its used
context below.

> +
>  #define define_get_linear_pagetable(level)                                  \
>  static int                                                                  \
>  get_##level##_linear_pagetable(                                             \
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 26f0153164..7825f36316 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -177,10 +177,15 @@ struct page_info
>           *   in use.
>           */
>          struct {
> +#ifdef CONFIG_PV_LINEAR_PT
>              u16 nr_validated_ptes:PAGETABLE_ORDER + 1;
>              u16 :16 - PAGETABLE_ORDER - 1 - 2;
>              s16 partial_pte:2;
>              s16 linear_pt_count;
> +#else
> +            u16 nr_validated_ptes;
> +            s8 partial_pte;
> +#endif

I don't think this is a clever move.  Having CONFIG_PV_LINEAR_PT change
the behaviour of nr_validated_ptes and partial_pte is a recipe for
subtle bugs.

An alternative would be to have the dec_linear_{uses,entries}()
BUG_ON(pg->linear_pt_count != 0) when !CONFIG_PV_LINEAR_PT

This way, you don't need LPT_ASSERT(), or play games with the existing
macros to avoid having them evaluate their parameters.

~Andrew
George Dunlap Oct. 18, 2017, 9:17 a.m. UTC | #2
On 10/17/2017 07:05 PM, Andrew Cooper wrote:
> On 17/10/17 18:10, George Dunlap wrote:
>> Allowing pagetables to point to other pagetables of the same level
>> (often called 'linear pagetables') has been included in Xen since its
>> inception; but recently it has been the source of a number of subtle
>> reference-counting bugs.
>>
>> It is not used by Linux or MiniOS; but it used used by NetBSD and
>> Novell Netware.  There are significant numbers of people who are never
>> going to use the feature, along with significant numbers who need the
>> feature.
>>
>> Add a Kconfig option for the feature (default to 'y').  Also add a
>> command-line option to control whether PV linear pagetables are
>> allowed (default to 'true').
>>
>> In order to make the code clean:
>> - Introduce LPT_ASSERT(), which only exists if CONFIG_PV_LINEAR_PT is defined
>> - Introduce zero_linear_entries() to set page->linear_pt_count to zero
>>   (or do nothing, as appropriate)
>>
>> Reported-by: Jann Horn <jannh@google.com>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> 
> Definitely +1 to this kind of arrangement of user choices.  Some notes
> below.
> 
>> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
>> index eb4995e68b..952368d3be 100644
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1422,6 +1422,22 @@ The following resources are available:
>>      CDP, one COS will corespond two CBMs other than one with CAT, due to the
>>      sum of CBMs is fixed, that means actual `cos_max` in use will automatically
>>      reduce to half when CDP is enabled.
>> +	
>> +### pv-linear-pt
>> +> `= <boolean>`
>> +
>> +> Default: `false`
> 
> Only available if Xen is compiled with CONFIG_PV_LINEAR_PT support enabled.

Ack

> 
>> +
>> +Allow PV guests to have pagetable entries pointing to other pagetables
>> +of the same level (i.e., allowing L2 PTEs to point to other L2 pages).
>> +This technique is often called "linear pagetables", and is sometimes
>> +used to allow operating systems a simple way to consistently map the
>> +current process's pagetables into its own virtual address space.
>> +
>> +Linux and MiniOS don't use this technique.  NetBSD and Novell Netware
>> +do; there may be other custom operating systems which do.  If you're
>> +certain you don't plan on having PV guests which use this feature,
>> +turning it off can reduce the attack surface.
>>  
>>  ### rcu-idle-timer-period-ms
>>  > `= <integer>`
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 62d313e3f5..5881b64608 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -654,6 +660,9 @@ static void dec_linear_uses(struct page_info *pg)
>>   *     frame if it is mapped by a different root table. This is sufficient and
>>   *     also necessary to allow validation of a root table mapping itself.
>>   */
>> +static bool __read_mostly pv_linear_pt_enable = true;
>> +boolean_param("pv-linear-pt", pv_linear_pt_enable);
> 
> The _enable suffix just makes the name longer, and (semi-upheld)
> convention would be for opt_pv_linear_pt, which is fine even in its used
> context below.

Ack

> 
>> +
>>  #define define_get_linear_pagetable(level)                                  \
>>  static int                                                                  \
>>  get_##level##_linear_pagetable(                                             \
>> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
>> index 26f0153164..7825f36316 100644
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -177,10 +177,15 @@ struct page_info
>>           *   in use.
>>           */
>>          struct {
>> +#ifdef CONFIG_PV_LINEAR_PT
>>              u16 nr_validated_ptes:PAGETABLE_ORDER + 1;
>>              u16 :16 - PAGETABLE_ORDER - 1 - 2;
>>              s16 partial_pte:2;
>>              s16 linear_pt_count;
>> +#else
>> +            u16 nr_validated_ptes;
>> +            s8 partial_pte;
>> +#endif
> 
> I don't think this is a clever move.  Having CONFIG_PV_LINEAR_PT change
> the behaviour of nr_validated_ptes and partial_pte is a recipe for
> subtle bugs.
>
> An alternative would be to have the dec_linear_{uses,entries}()
> BUG_ON(pg->linear_pt_count != 0) when !CONFIG_PV_LINEAR_PT

Oh, I just noticed this was a union; so cutting out linear_pt_count
doesn't actually save you any space.

Yeah, in that case, leaving it in and adding ASSERTs that it's 0 makes
sense.  (I think an ASSERT is better than a BUG_ON() in this case.)

 -George
Jan Beulich Oct. 18, 2017, 9:39 a.m. UTC | #3
>>> On 17.10.17 at 19:10, <george.dunlap@citrix.com> wrote:
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1422,6 +1422,22 @@ The following resources are available:
>      CDP, one COS will corespond two CBMs other than one with CAT, due to the
>      sum of CBMs is fixed, that means actual `cos_max` in use will automatically
>      reduce to half when CDP is enabled.
> +	
> +### pv-linear-pt
> +> `= <boolean>`
> +
> +> Default: `false`

This looks to be wrong now.

> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -97,6 +97,27 @@ config TBOOT
>  	  Technology (TXT)
>  
>  	  If unsure, say Y.
> +
> +config PV_LINEAR_PT
> +       bool "Support for PV linear pagetables"
> +       depends on PV

For this to look reasonable in a hierarchical menu, it should follow
PV (with - if there were any - only other options also depending on
PV in between) rather than being added at a random place.

> +       default y
> +       ---help---
> +         Linear pagetables (also called "recursive pagetables") refers
> +	 to the practice of a guest operating system having pagetable

The two lines above should match in how they're being indented.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -587,6 +587,12 @@ static void put_data_page(
>          put_page(page);
>  }
>  
> +#ifdef CONFIG_PV_LINEAR_PT
> +static void zero_linear_entries(struct page_info *pg)

When framing multiple functions, I think it is better to have a blank
line between #ifdef and first piece of code (as well as around the
#else and prior to the #endif), and I think the #else and #endif
would also benefit from having /* PV_LINEAR_PT */ or some such
added on their lines.

> @@ -719,6 +735,20 @@ get_##level##_linear_pagetable(                                             \
>                                                                              \
>      return 1;                                                               \
>  }
> +#define LPT_ASSERT ASSERT
> +#else
> +#define define_get_linear_pagetable(level)                              \
> +static int                                                              \
> +get_##level##_linear_pagetable(                                         \
> +        level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d) \
> +{                                                                       \
> +        return 0;                                                       \
> +}
> +#define zero_linear_entries(pg)
> +#define dec_linear_uses(pg)
> +#define dec_linear_entries(pg)

Would perhaps be better if these evaluated their arguments.

> +#define LPT_ASSERT(x)
> +#endif
>  
>  
>  bool is_iomem_page(mfn_t mfn)

Could you arrange for the double blank lines to go away here with
the blank line additions asked for above?

> @@ -2330,8 +2360,8 @@ static int _put_page_type(struct page_info *page, bool preemptible,
>                   * necessary anymore for a dying domain.
>                   */
>                  ASSERT(page_get_owner(page)->is_dying);
> -                ASSERT(page->linear_pt_count < 0);
> -                ASSERT(ptpg->linear_pt_count > 0);
> +                LPT_ASSERT(page->linear_pt_count < 0);
> +                LPT_ASSERT(ptpg->linear_pt_count > 0);

Other than Andrew has suggested, with these I don't think
LPT_ASSERT() can go away, unless you played tricks and forced
the function's ptpg to be NULL regardless of caller, or unless you
put the entire if() into an #ifdef.

Jan
George Dunlap Oct. 18, 2017, 9:52 a.m. UTC | #4
On 10/18/2017 10:39 AM, Jan Beulich wrote:
>>>> On 17.10.17 at 19:10, <george.dunlap@citrix.com> wrote:
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1422,6 +1422,22 @@ The following resources are available:
>>      CDP, one COS will corespond two CBMs other than one with CAT, due to the
>>      sum of CBMs is fixed, that means actual `cos_max` in use will automatically
>>      reduce to half when CDP is enabled.
>> +	
>> +### pv-linear-pt
>> +> `= <boolean>`
>> +
>> +> Default: `false`
> 
> This looks to be wrong now.
> 
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -97,6 +97,27 @@ config TBOOT
>>  	  Technology (TXT)
>>  
>>  	  If unsure, say Y.
>> +
>> +config PV_LINEAR_PT
>> +       bool "Support for PV linear pagetables"
>> +       depends on PV
> 
> For this to look reasonable in a hierarchical menu, it should follow
> PV (with - if there were any - only other options also depending on
> PV in between) rather than being added at a random place.

AFAICT there's no way to select PV or HVM options in the menu at the
moment.  I could move this below the 'PV' option in case that should
ever change.

>> +       default y
>> +       ---help---
>> +         Linear pagetables (also called "recursive pagetables") refers
>> +	 to the practice of a guest operating system having pagetable
> 
> The two lines above should match in how they're being indented.

Gah -- this isn't a .c file so my .c style isn't being applied.  Let me
see what I can do.

> 
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -587,6 +587,12 @@ static void put_data_page(
>>          put_page(page);
>>  }
>>  
>> +#ifdef CONFIG_PV_LINEAR_PT
>> +static void zero_linear_entries(struct page_info *pg)
> 
> When framing multiple functions, I think it is better to have a blank
> line between #ifdef and first piece of code (as well as around the
> #else and prior to the #endif), and I think the #else and #endif
> would also benefit from having /* PV_LINEAR_PT */ or some such
> added on their lines.

Ack

> 
>> @@ -719,6 +735,20 @@ get_##level##_linear_pagetable(                                             \
>>                                                                              \
>>      return 1;                                                               \
>>  }
>> +#define LPT_ASSERT ASSERT
>> +#else
>> +#define define_get_linear_pagetable(level)                              \
>> +static int                                                              \
>> +get_##level##_linear_pagetable(                                         \
>> +        level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d) \
>> +{                                                                       \
>> +        return 0;                                                       \
>> +}
>> +#define zero_linear_entries(pg)
>> +#define dec_linear_uses(pg)
>> +#define dec_linear_entries(pg)
> 
> Would perhaps be better if these evaluated their arguments.

Following Andy's suggestion I'm changing them to static inlines and
adding an ASSERT().

>> +#define LPT_ASSERT(x)
>> +#endif
>>  
>>  
>>  bool is_iomem_page(mfn_t mfn)
> 
> Could you arrange for the double blank lines to go away here with
> the blank line additions asked for above?

Ack

> 
>> @@ -2330,8 +2360,8 @@ static int _put_page_type(struct page_info *page, bool preemptible,
>>                   * necessary anymore for a dying domain.
>>                   */
>>                  ASSERT(page_get_owner(page)->is_dying);
>> -                ASSERT(page->linear_pt_count < 0);
>> -                ASSERT(ptpg->linear_pt_count > 0);
>> +                LPT_ASSERT(page->linear_pt_count < 0);
>> +                LPT_ASSERT(ptpg->linear_pt_count > 0);
> 
> Other than Andrew has suggested, with these I don't think
> LPT_ASSERT() can go away, unless you played tricks and forced
> the function's ptpg to be NULL regardless of caller, or unless you
> put the entire if() into an #ifdef.

Good point -- I'll see what I can do.

 -George
Jan Beulich Oct. 18, 2017, 9:58 a.m. UTC | #5
>>> On 18.10.17 at 11:52, <george.dunlap@citrix.com> wrote:
> On 10/18/2017 10:39 AM, Jan Beulich wrote:
>>>>> On 17.10.17 at 19:10, <george.dunlap@citrix.com> wrote:
>>> --- a/xen/arch/x86/Kconfig
>>> +++ b/xen/arch/x86/Kconfig
>>> @@ -97,6 +97,27 @@ config TBOOT
>>>  	  Technology (TXT)
>>>  
>>>  	  If unsure, say Y.
>>> +
>>> +config PV_LINEAR_PT
>>> +       bool "Support for PV linear pagetables"
>>> +       depends on PV
>> 
>> For this to look reasonable in a hierarchical menu, it should follow
>> PV (with - if there were any - only other options also depending on
>> PV in between) rather than being added at a random place.
> 
> AFAICT there's no way to select PV or HVM options in the menu at the
> moment.  I could move this below the 'PV' option in case that should
> ever change.

Yes, please do. The mid to long term intention is for PV to become
selectable.

Jan
George Dunlap Oct. 18, 2017, 10:22 a.m. UTC | #6
On 10/18/2017 10:39 AM, Jan Beulich wrote:
>>>> On 17.10.17 at 19:10, <george.dunlap@citrix.com> wrote:
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1422,6 +1422,22 @@ The following resources are available:
>>      CDP, one COS will corespond two CBMs other than one with CAT, due to the
>>      sum of CBMs is fixed, that means actual `cos_max` in use will automatically
>>      reduce to half when CDP is enabled.
>> +	
>> +### pv-linear-pt
>> +> `= <boolean>`
>> +
>> +> Default: `false`
> 
> This looks to be wrong now.
> 
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -97,6 +97,27 @@ config TBOOT
>>  	  Technology (TXT)
>>  
>>  	  If unsure, say Y.
>> +
>> +config PV_LINEAR_PT
>> +       bool "Support for PV linear pagetables"
>> +       depends on PV
> 
> For this to look reasonable in a hierarchical menu, it should follow
> PV (with - if there were any - only other options also depending on
> PV in between) rather than being added at a random place.
> 
>> +       default y
>> +       ---help---
>> +         Linear pagetables (also called "recursive pagetables") refers
>> +	 to the practice of a guest operating system having pagetable
> 
> The two lines above should match in how they're being indented.
> 
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -587,6 +587,12 @@ static void put_data_page(
>>          put_page(page);
>>  }
>>  
>> +#ifdef CONFIG_PV_LINEAR_PT
>> +static void zero_linear_entries(struct page_info *pg)
> 
> When framing multiple functions, I think it is better to have a blank
> line between #ifdef and first piece of code (as well as around the
> #else and prior to the #endif), and I think the #else and #endif
> would also benefit from having /* PV_LINEAR_PT */ or some such
> added on their lines.
> 
>> @@ -719,6 +735,20 @@ get_##level##_linear_pagetable(                                             \
>>                                                                              \
>>      return 1;                                                               \
>>  }
>> +#define LPT_ASSERT ASSERT
>> +#else
>> +#define define_get_linear_pagetable(level)                              \
>> +static int                                                              \
>> +get_##level##_linear_pagetable(                                         \
>> +        level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d) \
>> +{                                                                       \
>> +        return 0;                                                       \
>> +}
>> +#define zero_linear_entries(pg)
>> +#define dec_linear_uses(pg)
>> +#define dec_linear_entries(pg)
> 
> Would perhaps be better if these evaluated their arguments.
> 
>> +#define LPT_ASSERT(x)
>> +#endif
>>  
>>  
>>  bool is_iomem_page(mfn_t mfn)
> 
> Could you arrange for the double blank lines to go away here with
> the blank line additions asked for above?
> 
>> @@ -2330,8 +2360,8 @@ static int _put_page_type(struct page_info *page, bool preemptible,
>>                   * necessary anymore for a dying domain.
>>                   */
>>                  ASSERT(page_get_owner(page)->is_dying);
>> -                ASSERT(page->linear_pt_count < 0);
>> -                ASSERT(ptpg->linear_pt_count > 0);
>> +                LPT_ASSERT(page->linear_pt_count < 0);
>> +                LPT_ASSERT(ptpg->linear_pt_count > 0);
> 
> Other than Andrew has suggested, with these I don't think
> LPT_ASSERT() can go away, unless you played tricks and forced
> the function's ptpg to be NULL regardless of caller, or unless you
> put the entire if() into an #ifdef.

Actually, coming back to this -- if we disable linear pagetables, how
can it ever be the case that "PGT_type_equal(x,
ptpg->u.inuse.type_info)" evaluates to true?  The ASSERT()s should never
be executed.

OTOH, if we know the code will be completely unused (and that the
compiler won't know), it's probably a better idea to just block it out
entirely anyway (and maybe add a BUG_ON() the types being equal).

 -George
Jan Beulich Oct. 18, 2017, 1:21 p.m. UTC | #7
>>> On 18.10.17 at 12:22, <george.dunlap@citrix.com> wrote:
> On 10/18/2017 10:39 AM, Jan Beulich wrote:
>>>>> On 17.10.17 at 19:10, <george.dunlap@citrix.com> wrote:
>>> @@ -2330,8 +2360,8 @@ static int _put_page_type(struct page_info *page, bool preemptible,
>>>                   * necessary anymore for a dying domain.
>>>                   */
>>>                  ASSERT(page_get_owner(page)->is_dying);
>>> -                ASSERT(page->linear_pt_count < 0);
>>> -                ASSERT(ptpg->linear_pt_count > 0);
>>> +                LPT_ASSERT(page->linear_pt_count < 0);
>>> +                LPT_ASSERT(ptpg->linear_pt_count > 0);
>> 
>> Other than Andrew has suggested, with these I don't think
>> LPT_ASSERT() can go away, unless you played tricks and forced
>> the function's ptpg to be NULL regardless of caller, or unless you
>> put the entire if() into an #ifdef.
> 
> Actually, coming back to this -- if we disable linear pagetables, how
> can it ever be the case that "PGT_type_equal(x,
> ptpg->u.inuse.type_info)" evaluates to true?  The ASSERT()s should never
> be executed.

Oh, indeed. I.e. taking it from the other angle - the ASSERT()s
being reached, it would be correct for them to always trigger.

Jan
diff mbox

Patch

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index eb4995e68b..952368d3be 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1422,6 +1422,22 @@  The following resources are available:
     CDP, one COS will corespond two CBMs other than one with CAT, due to the
     sum of CBMs is fixed, that means actual `cos_max` in use will automatically
     reduce to half when CDP is enabled.
+	
+### pv-linear-pt
+> `= <boolean>`
+
+> Default: `false`
+
+Allow PV guests to have pagetable entries pointing to other pagetables
+of the same level (i.e., allowing L2 PTEs to point to other L2 pages).
+This technique is often called "linear pagetables", and is sometimes
+used to allow operating systems a simple way to consistently map the
+current process's pagetables into its own virtual address space.
+
+Linux and MiniOS don't use this technique.  NetBSD and Novell Netware
+do; there may be other custom operating systems which do.  If you're
+certain you don't plan on having PV guests which use this feature,
+turning it off can reduce the attack surface.
 
 ### rcu-idle-timer-period-ms
 > `= <integer>`
diff --git a/xen/arch/Kconfig b/xen/arch/Kconfig
index cf0acb7e89..47287a4985 100644
--- a/xen/arch/Kconfig
+++ b/xen/arch/Kconfig
@@ -6,3 +6,4 @@  config NR_CPUS
 	default "128" if ARM
 	---help---
 	  Specifies the maximum number of physical CPUs which Xen will support.
+
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 3c328e2df5..199155fcd8 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -42,6 +42,7 @@ 
 #include <xen/libfdt/libfdt.h>
 #include <asm/setup.h>
 
+
 struct domain *dom_xen, *dom_io, *dom_cow;
 
 /* Override macros from asm/page.h to make them work with mfn_t */
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 64955dc017..e2fcbaf5cc 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -97,6 +97,27 @@  config TBOOT
 	  Technology (TXT)
 
 	  If unsure, say Y.
+
+config PV_LINEAR_PT
+       bool "Support for PV linear pagetables"
+       depends on PV
+       default y
+       ---help---
+         Linear pagetables (also called "recursive pagetables") refers
+	 to the practice of a guest operating system having pagetable
+	 entries pointing to other pagetables of the same level (i.e.,
+	 allowing L2 PTEs to point to other L2 pages).  Some operating
+	 systems use it as a simple way to consisently map the current
+	 process's pagetables into its own virtual address space.
+
+	 Linux and MiniOS don't use this technique.  NetBSD and Novell
+	 Netware do; there may be other custom operating systems which
+	 do.  If you're certain you don't plan on having PV guests
+	 which use this feature, turning it off can reduce the attack
+	 surface.
+
+	 If unsure, say Y.
+
 endmenu
 
 source "common/Kconfig"
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 62d313e3f5..5881b64608 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -587,6 +587,12 @@  static void put_data_page(
         put_page(page);
 }
 
+#ifdef CONFIG_PV_LINEAR_PT
+static void zero_linear_entries(struct page_info *pg)
+{
+    pg->linear_pt_count = 0;
+}
+
 static bool inc_linear_entries(struct page_info *pg)
 {
     typeof(pg->linear_pt_count) nc = read_atomic(&pg->linear_pt_count), oc;
@@ -654,6 +660,9 @@  static void dec_linear_uses(struct page_info *pg)
  *     frame if it is mapped by a different root table. This is sufficient and
  *     also necessary to allow validation of a root table mapping itself.
  */
+static bool __read_mostly pv_linear_pt_enable = true;
+boolean_param("pv-linear-pt", pv_linear_pt_enable);
+
 #define define_get_linear_pagetable(level)                                  \
 static int                                                                  \
 get_##level##_linear_pagetable(                                             \
@@ -663,6 +672,13 @@  get_##level##_linear_pagetable(                                             \
     struct page_info *page;                                                 \
     unsigned long pfn;                                                      \
                                                                             \
+    if ( !pv_linear_pt_enable )                                             \
+    {                                                                       \
+        gdprintk(XENLOG_WARNING,                                            \
+                 "Attempt to create linear p.t. (feature disabled)\n");     \
+        return 0;                                                           \
+    }                                                                       \
+                                                                            \
     if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
     {                                                                       \
         gdprintk(XENLOG_WARNING,                                            \
@@ -719,6 +735,20 @@  get_##level##_linear_pagetable(                                             \
                                                                             \
     return 1;                                                               \
 }
+#define LPT_ASSERT ASSERT
+#else
+#define define_get_linear_pagetable(level)                              \
+static int                                                              \
+get_##level##_linear_pagetable(                                         \
+        level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d) \
+{                                                                       \
+        return 0;                                                       \
+}
+#define zero_linear_entries(pg)
+#define dec_linear_uses(pg)
+#define dec_linear_entries(pg)
+#define LPT_ASSERT(x)
+#endif
 
 
 bool is_iomem_page(mfn_t mfn)
@@ -2260,7 +2290,7 @@  static int _put_final_page_type(struct page_info *page, unsigned long type,
             dec_linear_uses(page);
             dec_linear_entries(ptpg);
         }
-        ASSERT(!page->linear_pt_count || page_get_owner(page)->is_dying);
+        LPT_ASSERT(!page->linear_pt_count || page_get_owner(page)->is_dying);
         set_tlbflush_timestamp(page);
         smp_wmb();
         page->u.inuse.type_info--;
@@ -2330,8 +2360,8 @@  static int _put_page_type(struct page_info *page, bool preemptible,
                  * necessary anymore for a dying domain.
                  */
                 ASSERT(page_get_owner(page)->is_dying);
-                ASSERT(page->linear_pt_count < 0);
-                ASSERT(ptpg->linear_pt_count > 0);
+                LPT_ASSERT(page->linear_pt_count < 0);
+                LPT_ASSERT(ptpg->linear_pt_count > 0);
                 ptpg = NULL;
             }
 
@@ -2508,7 +2538,7 @@  static int __get_page_type(struct page_info *page, unsigned long type,
             page->nr_validated_ptes = 0;
             page->partial_pte = 0;
         }
-        page->linear_pt_count = 0;
+        zero_linear_entries(page);
         rc = alloc_page_type(page, type, preemptible);
     }
 
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 26f0153164..7825f36316 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -177,10 +177,15 @@  struct page_info
          *   in use.
          */
         struct {
+#ifdef CONFIG_PV_LINEAR_PT
             u16 nr_validated_ptes:PAGETABLE_ORDER + 1;
             u16 :16 - PAGETABLE_ORDER - 1 - 2;
             s16 partial_pte:2;
             s16 linear_pt_count;
+#else
+            u16 nr_validated_ptes;
+            s8 partial_pte;
+#endif
         };
 
         /*