[01/10] xen/arm: vpl011: Add pl011 uart emulation in Xen

Message ID	1491212673-13476-2-git-send-email-bhupinder.thakur@linaro.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: Bhupinder Thakur <bhupinder.thakur@linaro.org> To: xen-devel@lists.xenproject.org Date: Mon, 3 Apr 2017 15:14:24 +0530 Message-Id: <1491212673-13476-2-git-send-email-bhupinder.thakur@linaro.org> In-Reply-To: <1491212673-13476-1-git-send-email-bhupinder.thakur@linaro.org> References: <1491212673-13476-1-git-send-email-bhupinder.thakur@linaro.org> Cc: Wei Liu <wei.liu2@citrix.com>, Julien Grall <julien.grall@arm.com>, Stefano Stabellini <sstabellini@kernel.org>, Ian Jackson <ian.jackson@eu.citrix.com> Subject: [Xen-devel] [PATCH 01/10] xen/arm: vpl011: Add pl011 uart emulation in Xen Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Bhupinder Thakur April 3, 2017, 9:44 a.m. UTC

Add emulation code to emulate read/write access to pl011 registers
and pl011 interrupts:

    - Emulate DR read/write by reading and writing from/to the IN
      and OUT ring buffers and raising an event to the backend when
      there is data in the OUT ring buffer and injecting an interrupt
      to the guest when there is data in the IN ring buffer

    - Other registers are related to interrupt management and
      essentially control when interrupts are delivered to the guest

The SBSA compliant pl011 uart is covered in Appendix B of
https://static.docs.arm.com/den0029/a/Server_Base_System_Architecture_v3_1_ARM_DEN_0029A.pdf

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
---
 xen/arch/arm/Kconfig             |   5 +
 xen/arch/arm/Makefile            |   1 +
 xen/arch/arm/vpl011.c            | 339 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/domain.h     |   5 +
 xen/include/asm-arm/pl011-uart.h |   2 +
 xen/include/public/arch-arm.h    |   8 +
 xen/include/xen/vpl011.h         |  67 ++++++++
 7 files changed, 427 insertions(+)
 create mode 100644 xen/arch/arm/vpl011.c
 create mode 100644 xen/include/xen/vpl011.h

Wei Liu April 12, 2017, 4:32 p.m. UTC | #1

On Mon, Apr 03, 2017 at 03:14:24PM +0530, Bhupinder Thakur wrote:
> +
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/errno.h>
> +#include <xen/guest_access.h>
> +#include <xen/sched.h>
> +#include <xen/event.h>
> +#include <public/io/console.h>
> +#include <xen/vpl011.h>
> +#include <asm-arm/pl011-uart.h>

Normally I would place all xen headers first, then public headers, then
asm headers.  And then please sort them alphabetically.

Wei.

Stefano Stabellini April 19, 2017, 12:15 a.m. UTC | #2

On Mon, 3 Apr 2017, Bhupinder Thakur wrote:
> Add emulation code to emulate read/write access to pl011 registers
> and pl011 interrupts:
> 
>     - Emulate DR read/write by reading and writing from/to the IN
>       and OUT ring buffers and raising an event to the backend when
>       there is data in the OUT ring buffer and injecting an interrupt
>       to the guest when there is data in the IN ring buffer
> 
>     - Other registers are related to interrupt management and
>       essentially control when interrupts are delivered to the guest
> 
> The SBSA compliant pl011 uart is covered in Appendix B of
> https://static.docs.arm.com/den0029/a/Server_Base_System_Architecture_v3_1_ARM_DEN_0029A.pdf
> 
> Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
> ---
>  xen/arch/arm/Kconfig             |   5 +
>  xen/arch/arm/Makefile            |   1 +
>  xen/arch/arm/vpl011.c            | 339 +++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/domain.h     |   5 +
>  xen/include/asm-arm/pl011-uart.h |   2 +
>  xen/include/public/arch-arm.h    |   8 +
>  xen/include/xen/vpl011.h         |  67 ++++++++
>  7 files changed, 427 insertions(+)
>  create mode 100644 xen/arch/arm/vpl011.c
>  create mode 100644 xen/include/xen/vpl011.h
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 2e023d1..bfa86c7 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -45,6 +45,11 @@ config ACPI
>  config HAS_GICV3
>  	bool
>  
> +config VPL011_CONSOLE
> +	bool "Emulated pl011 console support"
> +	default y
> +	---help---
> +	  Allows a guest to use pl011 UART as a console
>  endmenu
>  
>  menu "ARM errata workaround via the alternative framework"
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 7afb8a3..a94bdab 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -49,6 +49,7 @@ obj-y += vm_event.o
>  obj-y += vtimer.o
>  obj-y += vpsci.o
>  obj-y += vuart.o
> +obj-$(CONFIG_VPL011_CONSOLE) += vpl011.o
>  
>  #obj-bin-y += ....o
>  
> diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c
> new file mode 100644
> index 0000000..eeb1cbf
> --- /dev/null
> +++ b/xen/arch/arm/vpl011.c
> @@ -0,0 +1,339 @@
> +/*
> + * arch/arm/vpl011.c
> + *
> + * Virtual PL011 UART
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/errno.h>
> +#include <xen/guest_access.h>
> +#include <xen/sched.h>
> +#include <xen/event.h>
> +#include <public/io/console.h>
> +#include <xen/vpl011.h>
> +#include <asm-arm/pl011-uart.h>
> +
> +unsigned int vpl011_reg_mask[] = {0xff, 0xffff, 0xffffffff};
> +
> +static void vgic_inject_vpl011_spi(struct domain *d)
> +{
> +    struct vpl011_s *vpl011=&d->arch.vpl011;
> +
> +    if ( (vpl011->uartris & vpl011->uartimsc) )
> +        vgic_vcpu_inject_spi(d, GUEST_VPL011_SPI);
> +}
> +
> +static void vpl011_read_data(struct domain *d, uint8_t *data)
> +{
> +    unsigned long flags;
> +    struct vpl011_s *vpl011=&d->arch.vpl011;

The code style is:
    
    vpl011 = &d->arch.vpl011


> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;

Same here. Please fix it everywhere.


> +    /*
> +     * Initialize the data so that even if there is no data in ring buffer
> +     * 0 is returned.
> +     */
> +    *data = 0;
> +
> +    VPL011_LOCK(d, flags);
> +
> +    /*
> +     * It is expected that there will be data in the ring buffer when this 
> +     * function is called since the guest is expected to read the data register
> +     * only if the TXFE flag is not set.
> +     * If the guest still does read event when TXFE bit is set then 0 will
> +     * be returned.
> +     */
> +    if ( !VPL011_IN_RING_EMPTY(intf) )
> +    {
> +        *data = intf->in[MASK_XENCONS_IDX(intf->in_cons++, intf->in)];

For correctness, it is important to maintain the right ordering and use
barriers:

        in_cons = intf->in_cons;
        *data = intf->in[MASK_XENCONS_IDX(in_cons, intf->in)];
        smb_mb();
        intf->in_cons = in_cons + 1;


> +    }
> +
> +    if ( VPL011_IN_RING_EMPTY(intf) )
> +    {
> +        vpl011->uartfr |= (RXFE);
> +        vpl011->uartris &= ~(RXI);
> +    }
> +
> +    vpl011->uartfr &= ~(RXFF);
> +
> +    VPL011_UNLOCK(d, flags);

I think we need to notify xenconsoled here, in case the ring was
previosuly full? Now that we have read some data, xenconsoled can go
back and write more.


> +}
> +
> +static void vpl011_write_data(struct domain *d, uint8_t data)
> +{
> +    unsigned long flags;
> +    struct vpl011_s *vpl011=&d->arch.vpl011;
> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
> +
> +    VPL011_LOCK(d, flags);
> +
> +    /*
> +     * It is expected that the ring is not full when this function is called
> +     * as the guest is expected to write to the data register only when the
> +     * TXFF flag is not set.
> +     * In case the guest does write even when the TXFF flag is set then the
> +     * data will be silently dropped.
> +     */
> +    if ( !VPL011_OUT_RING_FULL(intf) )
> +    {
> +        intf->out[MASK_XENCONS_IDX(intf->out_prod++, intf->out)] = data;
> +        smp_wmb();

Similarly, the order should be:

    out_prod = intf->out_prod;
    intf->out[MASK_XENCONS_IDX(out_prod, intf->out)] = data;
    smb_wmb();
    intf->out_prod = out_prod + 1;


> +    }
> +
> +    if ( VPL011_OUT_RING_FULL(intf) )
> +    {
> +        vpl011->uartfr |= (TXFF);
> +        vpl011->uartris &= ~(TXI);
> +    }
> +
> +    vpl011->uartfr |= (BUSY);
> +
> +    vpl011->uartfr &= ~(TXFE);
> +
> +    VPL011_UNLOCK(d, flags);
> +
> +    /* raise an event to xenconsoled only if it is the first character in the buffer */
> +    if ( VPL011_RING_DEPTH(intf, out) == 1 )
> +    {
> +        notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);
> +    }

Why only in case is the first character? xenconsoled will miss updates
if we don't send a notification, right?


> +}
> +
> +static int vpl011_mmio_read(struct vcpu *v, mmio_info_t *info, register_t *r, void *priv)
> +{
> +    uint8_t ch;
> +    struct hsr_dabt dabt = info->dabt;
> +    int vpl011_reg = (int)(info->gpa - GUEST_PL011_BASE);
> +    struct vpl011_s *vpl011 = &v->domain->arch.vpl011;
> +
> +    switch ( vpl011_reg )
> +    {
> +    case DR:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +        vpl011_read_data(v->domain, &ch);
> +        *r = ch;
> +        break;
> +
> +    case RSR:
> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> +
> +        /* It always returns 0 as there are no physical errors. */
> +        *r = 0;
> +        break;
> +
> +    case FR:
> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> +        *r = (vpl011->uartfr & vpl011_reg_mask[dabt.size]);
> +        break;
> +
> +    case RIS:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +        *r = (vpl011->uartris & vpl011_reg_mask[dabt.size]);
> +        break;
> +
> +    case MIS:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +        *r = (vpl011->uartris &
> +                            vpl011->uartimsc & vpl011_reg_mask[dabt.size]);
> +        break;
> +
> +    case IMSC:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +        *r = (vpl011->uartimsc & vpl011_reg_mask[dabt.size]);
> +        break;
> +
> +    case ICR:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +
> +        /* Only write is valid. */
> +        return 0;
> +
> +    default:
> +        gprintk(XENLOG_ERR, "vpl011: unhandled read r%d offset %#08x\n",
> +                               dabt.reg, vpl011_reg);
> +        return 0;
> +    }
> +
> +    return 1;
> +
> +bad_width:
> +    gprintk(XENLOG_ERR, "vpl011: bad read width %d r%d offset %#08x\n",
> +                       dabt.size, dabt.reg, vpl011_reg);
> +    domain_crash_synchronous();
> +    return 0;
> +
> +}
> +
> +static int vpl011_mmio_write(struct vcpu *v, mmio_info_t *info, register_t r, void *priv)
> +{
> +    uint8_t ch = ((struct uartdr_reg *)&r)->data;
> +    struct hsr_dabt dabt = info->dabt;
> +    int vpl011_reg = (int)(info->gpa - GUEST_PL011_BASE);
> +    struct vpl011_s *vpl011 = &v->domain->arch.vpl011;
> +
> +    switch ( vpl011_reg )
> +    {
> +    case DR:
> +
> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> +        vpl011_write_data(v->domain, ch);
> +        break;
> +
> +    case RSR: /* Nothing to clear. */
> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> +        break;
> +
> +    case FR:
> +        goto write_ignore;
> +    case RIS:
> +    case MIS:
> +        goto word_write_ignore;
> +
> +    case IMSC:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +        vpl011->uartimsc = (r & vpl011_reg_mask[dabt.size]);
> +        vgic_inject_vpl011_spi(v->domain);
> +        break;
> +
> +    case ICR:
> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +        vpl011->uartris &= ~(r & vpl011_reg_mask[dabt.size]);
> +        vgic_inject_vpl011_spi(v->domain);
> +        break;
> +
> +    default:
> +        gprintk(XENLOG_ERR, "vpl011: unhandled write r%d offset %#08x\n",
> +                               dabt.reg, vpl011_reg);
> +        return 0;
> +    }
> +
> +    return 1;
> +
> +write_ignore:
> +    if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> +    return 1;
> +
> +word_write_ignore:
> +    if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> +    return 1;
> +
> +bad_width:
> +    gprintk(XENLOG_ERR, "vpl011: bad write width %d r%d offset %#08x\n",
> +                       dabt.size, dabt.reg, vpl011_reg);
> +    domain_crash_synchronous();
> +    return 0;
> +
> +}
> +
> +static const struct mmio_handler_ops vpl011_mmio_handler = {
> +    .read = vpl011_mmio_read,
> +    .write = vpl011_mmio_write,
> +};
> +
> +int vpl011_map_guest_page(struct domain *d)
> +{
> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> +
> +    /* Map the guest PFN to Xen address space. */
> +    return prepare_ring_for_helper(d,
> +                                   d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_PFN],
> +                                   &vpl011->ring_page,
> +                                   &vpl011->ring_buf);
> +}
> +
> +static void vpl011_data_avail(struct domain *d)
> +{
> +    unsigned long flags;
> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
> +
> +    VPL011_LOCK(d, flags);
> +
> +    /* Update the uart rx state if the buffer is not empty. */
> +    if ( !VPL011_IN_RING_EMPTY(intf) )
> +    {
> +        vpl011->uartfr &= ~(RXFE);
> +        if ( VPL011_IN_RING_FULL(intf) )
> +            vpl011->uartfr |= (RXFF);
> +        vpl011->uartris |= (RXI);
> +    }
> +
> +    /* Update the uart tx state if the buffer is not full. */
> +    if ( !VPL011_OUT_RING_FULL(intf) )
> +    {
> +        vpl011->uartfr &= ~(TXFF);
> +        vpl011->uartris |= (TXI);
> +        if ( VPL011_OUT_RING_EMPTY(intf) )
> +        {
> +            vpl011->uartfr &= ~(BUSY);
> +            vpl011->uartfr |= (TXFE);

It is not a good idea to check the intf values twice: xenconsoled might
be modifying them, so we can get into the situation where intf->out*
change in between the two checks. Please read out_cons and out_prod only
once in this function.


> +        }
> +    }
> +
> +    VPL011_UNLOCK(d, flags);
> +
> +    vgic_inject_vpl011_spi(d);
> +
> +    if ( !VPL011_OUT_RING_EMPTY(intf) )
> +    {
> +        ASSERT( d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] != 0 );
> +        notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);

Why are we sending notifications at this point? The out buffer hasn't
changed since xenconsoled sent the evtchn to Xen, so xenconsoled already
knows at this point that the out ring is not empty.


> +    }
> +}
> +
> +
> +static void vpl011_notification(struct vcpu *v, unsigned int port)
> +{
> +    vpl011_data_avail(v->domain);
> +}
> +
> +int domain_vpl011_init(struct domain *d, struct xen_arch_domainconfig *config)
> +{
> +    int rc;
> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> +
> +    rc = alloc_unbound_xen_event_channel(d, 0, config->console_domid,
> +                                         vpl011_notification);
> +    if (rc < 0)
> +    {
> +        return rc;
> +    }
> +    d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] = rc;
> +    rc = vgic_reserve_virq(d, GUEST_VPL011_SPI);
> +    if ( !rc )
> +        return rc;

We should free the allocated evtchn in case of error here


> +    register_mmio_handler(d, &vpl011_mmio_handler, GUEST_PL011_BASE, GUEST_PL011_SIZE, NULL);
> +    spin_lock_init(&vpl011->lock);
> +
> +    vpl011->intialized = true;
> +
> +    return 0;
> +}
> +
> +int domain_vpl011_deinit(struct domain *d)
> +{
> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> +
> +    if ( vpl011->intialized )
> +    {
> +        free_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);
> +        destroy_ring_for_helper(&vpl011->ring_buf, vpl011->ring_page);
> +    }
> +
> +    return 0;
> +}
> +
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 2d6fbb1..a122504 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -11,6 +11,7 @@
>  #include <asm/gic.h>
>  #include <public/hvm/params.h>
>  #include <xen/serial.h>
> +#include <xen/vpl011.h>
>  
>  struct hvm_domain
>  {
> @@ -131,6 +132,10 @@ struct arch_domain
>      struct {
>          uint8_t privileged_call_enabled : 1;
>      } monitor;
> +
> +#ifdef CONFIG_VPL011_CONSOLE
> +    struct vpl011_s vpl011;
> +#endif
>  }  __cacheline_aligned;
>  
>  struct arch_vcpu
> diff --git a/xen/include/asm-arm/pl011-uart.h b/xen/include/asm-arm/pl011-uart.h
> index 123f477..57e9ec7 100644
> --- a/xen/include/asm-arm/pl011-uart.h
> +++ b/xen/include/asm-arm/pl011-uart.h
> @@ -49,6 +49,8 @@
>  /* FR bits */
>  #define TXFE   (1<<7) /* TX FIFO empty */
>  #define RXFE   (1<<4) /* RX FIFO empty */
> +#define TXFF   (1<<5) /* TX FIFO full */
> +#define RXFF   (1<<6) /* RX FIFO full */
>  #define BUSY   (1<<3) /* Transmit is not complete */
>  
>  /* LCR_H bits */
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index bd974fb..5f91207 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -322,6 +322,8 @@ struct xen_arch_domainconfig {
>       *
>       */
>      uint32_t clock_frequency;
> +
> +    uint32_t console_domid;
>  };
>  #endif /* __XEN__ || __XEN_TOOLS__ */
>  
> @@ -410,6 +412,10 @@ typedef uint64_t xen_callback_t;
>  #define GUEST_ACPI_BASE 0x20000000ULL
>  #define GUEST_ACPI_SIZE 0x02000000ULL
>  
> +/* PL011 mappings */
> +#define GUEST_PL011_BASE    0x22000000ULL
> +#define GUEST_PL011_SIZE    0x00001000ULL
> +
>  /*
>   * 16MB == 4096 pages reserved for guest to use as a region to map its
>   * grant table in.
> @@ -444,6 +450,8 @@ typedef uint64_t xen_callback_t;
>  #define GUEST_TIMER_PHYS_NS_PPI 30
>  #define GUEST_EVTCHN_PPI        31
>  
> +#define GUEST_VPL011_SPI        32
> +
>  /* PSCI functions */
>  #define PSCI_cpu_suspend 0
>  #define PSCI_cpu_off     1
> diff --git a/xen/include/xen/vpl011.h b/xen/include/xen/vpl011.h
> new file mode 100644
> index 0000000..f9f2aba
> --- /dev/null
> +++ b/xen/include/xen/vpl011.h
> @@ -0,0 +1,67 @@
> +/*
> + * include/xen/vpl011.h
> + *
> + * Virtual PL011 UART
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _VPL011_H_
> +
> +#define _VPL011_H_
> +
> +/* helper macros */
> +#define VPL011_RING_DEPTH(intf,dir) (((intf)->dir ## _prod - (intf)->dir ## _cons))
> +
> +#define VPL011_RING_MAX_DEPTH(intf,dir) (sizeof((intf)->dir)-1)
> +
> +#define VPL011_IN_RING_EMPTY(intf) (VPL011_RING_DEPTH(intf, in) == 0)
> +
> +#define VPL011_OUT_RING_EMPTY(intf) (VPL011_RING_DEPTH(intf, out) == 0)
> +
> +#define VPL011_IN_RING_FULL(intf) (VPL011_RING_DEPTH(intf, in) == VPL011_RING_MAX_DEPTH(intf, in))
> +
> +#define VPL011_OUT_RING_FULL(intf) (VPL011_RING_DEPTH(intf, out) == VPL011_RING_MAX_DEPTH(intf,out))
> +
> +#define VPL011_LOCK(d,flags) spin_lock_irqsave(&(d)->arch.vpl011.lock, flags)
> +#define VPL011_UNLOCK(d,flags) spin_unlock_irqrestore(&(d)->arch.vpl011.lock, flags)
> +
> +#define VALID_BW_SIZE(size) ( size == DABT_BYTE || size == DABT_HALF_WORD || size == DABT_WORD )
> +#define VALID_W_SIZE(size)  ( size == DABT_HALF_WORD || size == DABT_WORD )
> +
> +struct uartdr_reg {
> +    uint8_t data;
> +    uint8_t error_status:4;
> +    uint8_t reserved1:4;
> +    uint16_t reserved2;
> +    uint32_t reserved3;
> +};
> +
> +struct vpl011_s {
> +    void *ring_buf;
> +    struct page_info *ring_page;
> +    uint32_t    uartfr;     /* flag register */
> +    uint32_t    uartcr;     /* control register */
> +    uint32_t    uartimsc;   /* interrupt mask register*/
> +    uint32_t    uarticr;    /* interrupt clear register */
> +    uint32_t    uartris;    /* raw interrupt status register */
> +    uint32_t    uartmis;    /* masked interrupt register */
> +    spinlock_t  lock;
> +    bool        intialized; /* flag which tells whether vpl011 is initialized */
> +};
> +
> +int domain_vpl011_init(struct domain *d, struct xen_arch_domainconfig *config);
> +int domain_vpl011_deinit(struct domain *d);
> +int vpl011_map_guest_page(struct domain *d);
> +
> +#endif
> -- 
> 2.7.4
>

Bhupinder Thakur April 19, 2017, 7:28 a.m. UTC | #3

Hi Stefano,

Thanks for your comments.

On 19 April 2017 at 05:45, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> +static void vpl011_read_data(struct domain *d, uint8_t *data)
>> +{
>> +    unsigned long flags;
>> +    struct vpl011_s *vpl011=&d->arch.vpl011;
>
> The code style is:
>
>     vpl011 = &d->arch.vpl011
>
>
>> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
>
> Same here. Please fix it everywhere.
>
I will fix the coding style here.

>
>> +    /*
>> +     * Initialize the data so that even if there is no data in ring buffer
>> +     * 0 is returned.
>> +     */
>> +    *data = 0;
>> +
>> +    VPL011_LOCK(d, flags);
>> +
>> +    /*
>> +     * It is expected that there will be data in the ring buffer when this
>> +     * function is called since the guest is expected to read the data register
>> +     * only if the TXFE flag is not set.
>> +     * If the guest still does read event when TXFE bit is set then 0 will
>> +     * be returned.
>> +     */
>> +    if ( !VPL011_IN_RING_EMPTY(intf) )
>> +    {
>> +        *data = intf->in[MASK_XENCONS_IDX(intf->in_cons++, intf->in)];
>
> For correctness, it is important to maintain the right ordering and use
> barriers:
>
>         in_cons = intf->in_cons;
>         *data = intf->in[MASK_XENCONS_IDX(in_cons, intf->in)];
>         smb_mb();
>         intf->in_cons = in_cons + 1;
>
>
>> +    }
>> +
>> +    if ( VPL011_IN_RING_EMPTY(intf) )
>> +    {
>> +        vpl011->uartfr |= (RXFE);
>> +        vpl011->uartris &= ~(RXI);
>> +    }
>> +
>> +    vpl011->uartfr &= ~(RXFF);
>> +
>> +    VPL011_UNLOCK(d, flags);
>
> I think we need to notify xenconsoled here, in case the ring was
> previosuly full? Now that we have read some data, xenconsoled can go
> back and write more.
>
Yes this is a valid point. I will send an event here.

>
>> +}
>> +
>> +static void vpl011_write_data(struct domain *d, uint8_t data)
>> +{
>> +    unsigned long flags;
>> +    struct vpl011_s *vpl011=&d->arch.vpl011;
>> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
>> +
>> +    VPL011_LOCK(d, flags);
>> +
>> +    /*
>> +     * It is expected that the ring is not full when this function is called
>> +     * as the guest is expected to write to the data register only when the
>> +     * TXFF flag is not set.
>> +     * In case the guest does write even when the TXFF flag is set then the
>> +     * data will be silently dropped.
>> +     */
>> +    if ( !VPL011_OUT_RING_FULL(intf) )
>> +    {
>> +        intf->out[MASK_XENCONS_IDX(intf->out_prod++, intf->out)] = data;
>> +        smp_wmb();
>
> Similarly, the order should be:
>
>     out_prod = intf->out_prod;
>     intf->out[MASK_XENCONS_IDX(out_prod, intf->out)] = data;
>     smb_wmb();
>     intf->out_prod = out_prod + 1;
>
I will use the suggested order.
>
>> +    }
>> +
>> +    if ( VPL011_OUT_RING_FULL(intf) )
>> +    {
>> +        vpl011->uartfr |= (TXFF);
>> +        vpl011->uartris &= ~(TXI);
>> +    }
>> +
>> +    vpl011->uartfr |= (BUSY);
>> +
>> +    vpl011->uartfr &= ~(TXFE);
>> +
>> +    VPL011_UNLOCK(d, flags);
>> +
>> +    /* raise an event to xenconsoled only if it is the first character in the buffer */
>> +    if ( VPL011_RING_DEPTH(intf, out) == 1 )
>> +    {
>> +        notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);
>> +    }
>
> Why only in case is the first character? xenconsoled will miss updates
> if we don't send a notification, right?
>
Whenever xenconsoled sends an event, the vpl011_data_avail function
will make sure that if there is data in the OUT ring buffer then it
again
sends an event to xenconsoled. Also it avoids the need of sending an
event for every character written in the OUT ring buffer.
>
>> +}
>> +
>> +static int vpl011_mmio_read(struct vcpu *v, mmio_info_t *info, register_t *r, void *priv)
>> +{
>> +    uint8_t ch;
>> +    struct hsr_dabt dabt = info->dabt;
>> +    int vpl011_reg = (int)(info->gpa - GUEST_PL011_BASE);
>> +    struct vpl011_s *vpl011 = &v->domain->arch.vpl011;
>> +
>> +    switch ( vpl011_reg )
>> +    {
>> +    case DR:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +        vpl011_read_data(v->domain, &ch);
>> +        *r = ch;
>> +        break;
>> +
>> +    case RSR:
>> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
>> +
>> +        /* It always returns 0 as there are no physical errors. */
>> +        *r = 0;
>> +        break;
>> +
>> +    case FR:
>> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
>> +        *r = (vpl011->uartfr & vpl011_reg_mask[dabt.size]);
>> +        break;
>> +
>> +    case RIS:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +        *r = (vpl011->uartris & vpl011_reg_mask[dabt.size]);
>> +        break;
>> +
>> +    case MIS:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +        *r = (vpl011->uartris &
>> +                            vpl011->uartimsc & vpl011_reg_mask[dabt.size]);
>> +        break;
>> +
>> +    case IMSC:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +        *r = (vpl011->uartimsc & vpl011_reg_mask[dabt.size]);
>> +        break;
>> +
>> +    case ICR:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +
>> +        /* Only write is valid. */
>> +        return 0;
>> +
>> +    default:
>> +        gprintk(XENLOG_ERR, "vpl011: unhandled read r%d offset %#08x\n",
>> +                               dabt.reg, vpl011_reg);
>> +        return 0;
>> +    }
>> +
>> +    return 1;
>> +
>> +bad_width:
>> +    gprintk(XENLOG_ERR, "vpl011: bad read width %d r%d offset %#08x\n",
>> +                       dabt.size, dabt.reg, vpl011_reg);
>> +    domain_crash_synchronous();
>> +    return 0;
>> +
>> +}
>> +
>> +static int vpl011_mmio_write(struct vcpu *v, mmio_info_t *info, register_t r, void *priv)
>> +{
>> +    uint8_t ch = ((struct uartdr_reg *)&r)->data;
>> +    struct hsr_dabt dabt = info->dabt;
>> +    int vpl011_reg = (int)(info->gpa - GUEST_PL011_BASE);
>> +    struct vpl011_s *vpl011 = &v->domain->arch.vpl011;
>> +
>> +    switch ( vpl011_reg )
>> +    {
>> +    case DR:
>> +
>> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
>> +        vpl011_write_data(v->domain, ch);
>> +        break;
>> +
>> +    case RSR: /* Nothing to clear. */
>> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
>> +        break;
>> +
>> +    case FR:
>> +        goto write_ignore;
>> +    case RIS:
>> +    case MIS:
>> +        goto word_write_ignore;
>> +
>> +    case IMSC:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +        vpl011->uartimsc = (r & vpl011_reg_mask[dabt.size]);
>> +        vgic_inject_vpl011_spi(v->domain);
>> +        break;
>> +
>> +    case ICR:
>> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +        vpl011->uartris &= ~(r & vpl011_reg_mask[dabt.size]);
>> +        vgic_inject_vpl011_spi(v->domain);
>> +        break;
>> +
>> +    default:
>> +        gprintk(XENLOG_ERR, "vpl011: unhandled write r%d offset %#08x\n",
>> +                               dabt.reg, vpl011_reg);
>> +        return 0;
>> +    }
>> +
>> +    return 1;
>> +
>> +write_ignore:
>> +    if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
>> +    return 1;
>> +
>> +word_write_ignore:
>> +    if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
>> +    return 1;
>> +
>> +bad_width:
>> +    gprintk(XENLOG_ERR, "vpl011: bad write width %d r%d offset %#08x\n",
>> +                       dabt.size, dabt.reg, vpl011_reg);
>> +    domain_crash_synchronous();
>> +    return 0;
>> +
>> +}
>> +
>> +static const struct mmio_handler_ops vpl011_mmio_handler = {
>> +    .read = vpl011_mmio_read,
>> +    .write = vpl011_mmio_write,
>> +};
>> +
>> +int vpl011_map_guest_page(struct domain *d)
>> +{
>> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
>> +
>> +    /* Map the guest PFN to Xen address space. */
>> +    return prepare_ring_for_helper(d,
>> +                                   d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_PFN],
>> +                                   &vpl011->ring_page,
>> +                                   &vpl011->ring_buf);
>> +}
>> +
>> +static void vpl011_data_avail(struct domain *d)
>> +{
>> +    unsigned long flags;
>> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
>> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
>> +
>> +    VPL011_LOCK(d, flags);
>> +
>> +    /* Update the uart rx state if the buffer is not empty. */
>> +    if ( !VPL011_IN_RING_EMPTY(intf) )
>> +    {
>> +        vpl011->uartfr &= ~(RXFE);
>> +        if ( VPL011_IN_RING_FULL(intf) )
>> +            vpl011->uartfr |= (RXFF);
>> +        vpl011->uartris |= (RXI);
>> +    }
>> +
>> +    /* Update the uart tx state if the buffer is not full. */
>> +    if ( !VPL011_OUT_RING_FULL(intf) )
>> +    {
>> +        vpl011->uartfr &= ~(TXFF);
>> +        vpl011->uartris |= (TXI);
>> +        if ( VPL011_OUT_RING_EMPTY(intf) )
>> +        {
>> +            vpl011->uartfr &= ~(BUSY);
>> +            vpl011->uartfr |= (TXFE);
>
> It is not a good idea to check the intf values twice: xenconsoled might
> be modifying them, so we can get into the situation where intf->out*
> change in between the two checks. Please read out_cons and out_prod only
> once in this function.

Since this code is under LOCK, the IN and OUT ring buffers will not be
updated by the guest. Specifically, the following transitions are
ruled out:

IN ring buffer
non-empty ----> empty (as the reader is blocked due to lock)

OUT ring buffer
not-full ----> full (as writer is blocked due to lock).

So the code inside the IF block remains valid even if the buffer state changes.
For the IN ring buffer it can go from non-empty to full. Similarly for
OUT ring buffer it can go from FULL to empty.

Also checking the latest buffer index (instead of checking buffer
index read as local variables) allows to update the pl011 state at the
earliest.
>
>
>> +        }
>> +    }
>> +
>> +    VPL011_UNLOCK(d, flags);
>> +
>> +    vgic_inject_vpl011_spi(d);
>> +
>> +    if ( !VPL011_OUT_RING_EMPTY(intf) )
>> +    {
>> +        ASSERT( d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] != 0 );
>> +        notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);
>
> Why are we sending notifications at this point? The out buffer hasn't
> changed since xenconsoled sent the evtchn to Xen, so xenconsoled already
> knows at this point that the out ring is not empty.

The OUT buffer may be empty when xenconsoled sends the event. By the
time the event is handled, the guest may write more data to OUT
buffer. This condition makes sure that if the OUT ring buffer is not
empty then it sends an event to xenconsoled.
>
>
>> +    }
>> +}
>> +
>> +
>> +static void vpl011_notification(struct vcpu *v, unsigned int port)
>> +{
>> +    vpl011_data_avail(v->domain);
>> +}
>> +
>> +int domain_vpl011_init(struct domain *d, struct xen_arch_domainconfig *config)
>> +{
>> +    int rc;
>> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
>> +
>> +    rc = alloc_unbound_xen_event_channel(d, 0, config->console_domid,
>> +                                         vpl011_notification);
>> +    if (rc < 0)
>> +    {
>> +        return rc;
>> +    }
>> +    d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] = rc;
>> +    rc = vgic_reserve_virq(d, GUEST_VPL011_SPI);
>> +    if ( !rc )
>> +        return rc;
>
> We should free the allocated evtchn in case of error here
I will free the event channel.

Regards,
Bhupinder

Julien Grall April 19, 2017, 8:36 a.m. UTC | #4

Hi Bhupinder,

On 04/19/2017 08:28 AM, Bhupinder Thakur wrote:
> On 19 April 2017 at 05:45, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>> +    }
>>> +}
>>> +
>>> +
>>> +static void vpl011_notification(struct vcpu *v, unsigned int port)
>>> +{
>>> +    vpl011_data_avail(v->domain);
>>> +}
>>> +
>>> +int domain_vpl011_init(struct domain *d, struct xen_arch_domainconfig *config)
>>> +{
>>> +    int rc;
>>> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
>>> +
>>> +    rc = alloc_unbound_xen_event_channel(d, 0, config->console_domid,
>>> +                                         vpl011_notification);
>>> +    if (rc < 0)
>>> +    {
>>> +        return rc;
>>> +    }
>>> +    d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] = rc;
>>> +    rc = vgic_reserve_virq(d, GUEST_VPL011_SPI);
>>> +    if ( !rc )
>>> +        return rc;
>>
>> We should free the allocated evtchn in case of error here
> I will free the event channel.

You will also want to reset d->arch.hvm_domain.params[...] to avoid the 
the other end reading garbage.

Cheers,

Stefano Stabellini April 19, 2017, 6:40 p.m. UTC | #5

On Wed, 19 Apr 2017, Bhupinder Thakur wrote:
> Hi Stefano,
> 
> Thanks for your comments.
> 
> On 19 April 2017 at 05:45, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >> +static void vpl011_read_data(struct domain *d, uint8_t *data)
> >> +{
> >> +    unsigned long flags;
> >> +    struct vpl011_s *vpl011=&d->arch.vpl011;
> >
> > The code style is:
> >
> >     vpl011 = &d->arch.vpl011
> >
> >
> >> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
> >
> > Same here. Please fix it everywhere.
> >
> I will fix the coding style here.
> 
> >
> >> +    /*
> >> +     * Initialize the data so that even if there is no data in ring buffer
> >> +     * 0 is returned.
> >> +     */
> >> +    *data = 0;
> >> +
> >> +    VPL011_LOCK(d, flags);
> >> +
> >> +    /*
> >> +     * It is expected that there will be data in the ring buffer when this
> >> +     * function is called since the guest is expected to read the data register
> >> +     * only if the TXFE flag is not set.
> >> +     * If the guest still does read event when TXFE bit is set then 0 will
> >> +     * be returned.
> >> +     */
> >> +    if ( !VPL011_IN_RING_EMPTY(intf) )
> >> +    {
> >> +        *data = intf->in[MASK_XENCONS_IDX(intf->in_cons++, intf->in)];
> >
> > For correctness, it is important to maintain the right ordering and use
> > barriers:
> >
> >         in_cons = intf->in_cons;
> >         *data = intf->in[MASK_XENCONS_IDX(in_cons, intf->in)];
> >         smb_mb();
> >         intf->in_cons = in_cons + 1;
> >
> >
> >> +    }
> >> +
> >> +    if ( VPL011_IN_RING_EMPTY(intf) )
> >> +    {
> >> +        vpl011->uartfr |= (RXFE);
> >> +        vpl011->uartris &= ~(RXI);
> >> +    }
> >> +
> >> +    vpl011->uartfr &= ~(RXFF);
> >> +
> >> +    VPL011_UNLOCK(d, flags);
> >
> > I think we need to notify xenconsoled here, in case the ring was
> > previosuly full? Now that we have read some data, xenconsoled can go
> > back and write more.
> >
> Yes this is a valid point. I will send an event here.
> 
> >
> >> +}
> >> +
> >> +static void vpl011_write_data(struct domain *d, uint8_t data)
> >> +{
> >> +    unsigned long flags;
> >> +    struct vpl011_s *vpl011=&d->arch.vpl011;
> >> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
> >> +
> >> +    VPL011_LOCK(d, flags);
> >> +
> >> +    /*
> >> +     * It is expected that the ring is not full when this function is called
> >> +     * as the guest is expected to write to the data register only when the
> >> +     * TXFF flag is not set.
> >> +     * In case the guest does write even when the TXFF flag is set then the
> >> +     * data will be silently dropped.
> >> +     */
> >> +    if ( !VPL011_OUT_RING_FULL(intf) )
> >> +    {
> >> +        intf->out[MASK_XENCONS_IDX(intf->out_prod++, intf->out)] = data;
> >> +        smp_wmb();
> >
> > Similarly, the order should be:
> >
> >     out_prod = intf->out_prod;
> >     intf->out[MASK_XENCONS_IDX(out_prod, intf->out)] = data;
> >     smb_wmb();
> >     intf->out_prod = out_prod + 1;
> >
> I will use the suggested order.
> >
> >> +    }
> >> +
> >> +    if ( VPL011_OUT_RING_FULL(intf) )
> >> +    {
> >> +        vpl011->uartfr |= (TXFF);
> >> +        vpl011->uartris &= ~(TXI);
> >> +    }
> >> +
> >> +    vpl011->uartfr |= (BUSY);
> >> +
> >> +    vpl011->uartfr &= ~(TXFE);
> >> +
> >> +    VPL011_UNLOCK(d, flags);
> >> +
> >> +    /* raise an event to xenconsoled only if it is the first character in the buffer */
> >> +    if ( VPL011_RING_DEPTH(intf, out) == 1 )
> >> +    {
> >> +        notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);
> >> +    }
> >
> > Why only in case is the first character? xenconsoled will miss updates
> > if we don't send a notification, right?
> >
> Whenever xenconsoled sends an event, the vpl011_data_avail function
> will make sure that if there is data in the OUT ring buffer then it
> again
> sends an event to xenconsoled. Also it avoids the need of sending an
> event for every character written in the OUT ring buffer.

We don't have a formal protocol spec for PV console, but if we had, it
would say that the frontend (Xen in this case) should send notifications
every time there is new data to write. Thus, once per character in this
case.

I would start with a dumb implementation like that, simply notify the
other end every time. Then, in a separate patch, we can consider
introducing optimizations, which need to be well explained.


Regarding the optimization you introduced in this patch, delaying write
notifications until we receive a notification from xenconsoled, how many
notifications from xen to xenconsoled does it actually save? xenconsoled
is going to send a notification for every read: we might end up sending
the same number of notifications, only delayed.


> >> +}
> >> +
> >> +static int vpl011_mmio_read(struct vcpu *v, mmio_info_t *info, register_t *r, void *priv)
> >> +{
> >> +    uint8_t ch;
> >> +    struct hsr_dabt dabt = info->dabt;
> >> +    int vpl011_reg = (int)(info->gpa - GUEST_PL011_BASE);
> >> +    struct vpl011_s *vpl011 = &v->domain->arch.vpl011;
> >> +
> >> +    switch ( vpl011_reg )
> >> +    {

> >> +    case DR:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +        vpl011_read_data(v->domain, &ch);
> >> +        *r = ch;
> >> +        break;
> >> +
> >> +    case RSR:
> >> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> >> +
> >> +        /* It always returns 0 as there are no physical errors. */
> >> +        *r = 0;
> >> +        break;
> >> +
> >> +    case FR:
> >> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> >> +        *r = (vpl011->uartfr & vpl011_reg_mask[dabt.size]);
> >> +        break;
> >> +
> >> +    case RIS:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +        *r = (vpl011->uartris & vpl011_reg_mask[dabt.size]);
> >> +        break;
> >> +
> >> +    case MIS:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +        *r = (vpl011->uartris &
> >> +                            vpl011->uartimsc & vpl011_reg_mask[dabt.size]);
> >> +        break;
> >> +
> >> +    case IMSC:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +        *r = (vpl011->uartimsc & vpl011_reg_mask[dabt.size]);
> >> +        break;
> >> +
> >> +    case ICR:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +
> >> +        /* Only write is valid. */
> >> +        return 0;
> >> +
> >> +    default:
> >> +        gprintk(XENLOG_ERR, "vpl011: unhandled read r%d offset %#08x\n",
> >> +                               dabt.reg, vpl011_reg);
> >> +        return 0;
> >> +    }
> >> +
> >> +    return 1;
> >> +
> >> +bad_width:
> >> +    gprintk(XENLOG_ERR, "vpl011: bad read width %d r%d offset %#08x\n",
> >> +                       dabt.size, dabt.reg, vpl011_reg);
> >> +    domain_crash_synchronous();
> >> +    return 0;
> >> +
> >> +}
> >> +
> >> +static int vpl011_mmio_write(struct vcpu *v, mmio_info_t *info, register_t r, void *priv)
> >> +{
> >> +    uint8_t ch = ((struct uartdr_reg *)&r)->data;
> >> +    struct hsr_dabt dabt = info->dabt;
> >> +    int vpl011_reg = (int)(info->gpa - GUEST_PL011_BASE);
> >> +    struct vpl011_s *vpl011 = &v->domain->arch.vpl011;
> >> +
> >> +    switch ( vpl011_reg )
> >> +    {
> >> +    case DR:
> >> +
> >> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> >> +        vpl011_write_data(v->domain, ch);
> >> +        break;
> >> +
> >> +    case RSR: /* Nothing to clear. */
> >> +        if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> >> +        break;
> >> +
> >> +    case FR:
> >> +        goto write_ignore;
> >> +    case RIS:
> >> +    case MIS:
> >> +        goto word_write_ignore;
> >> +
> >> +    case IMSC:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +        vpl011->uartimsc = (r & vpl011_reg_mask[dabt.size]);
> >> +        vgic_inject_vpl011_spi(v->domain);
> >> +        break;
> >> +
> >> +    case ICR:
> >> +        if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +        vpl011->uartris &= ~(r & vpl011_reg_mask[dabt.size]);
> >> +        vgic_inject_vpl011_spi(v->domain);
> >> +        break;
> >> +
> >> +    default:
> >> +        gprintk(XENLOG_ERR, "vpl011: unhandled write r%d offset %#08x\n",
> >> +                               dabt.reg, vpl011_reg);
> >> +        return 0;
> >> +    }
> >> +
> >> +    return 1;
> >> +
> >> +write_ignore:
> >> +    if ( !VALID_BW_SIZE(dabt.size) ) goto bad_width;
> >> +    return 1;
> >> +
> >> +word_write_ignore:
> >> +    if ( !VALID_W_SIZE(dabt.size) ) goto bad_width;
> >> +    return 1;
> >> +
> >> +bad_width:
> >> +    gprintk(XENLOG_ERR, "vpl011: bad write width %d r%d offset %#08x\n",
> >> +                       dabt.size, dabt.reg, vpl011_reg);
> >> +    domain_crash_synchronous();
> >> +    return 0;
> >> +
> >> +}
> >> +
> >> +static const struct mmio_handler_ops vpl011_mmio_handler = {
> >> +    .read = vpl011_mmio_read,
> >> +    .write = vpl011_mmio_write,
> >> +};
> >> +
> >> +int vpl011_map_guest_page(struct domain *d)
> >> +{
> >> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> >> +
> >> +    /* Map the guest PFN to Xen address space. */
> >> +    return prepare_ring_for_helper(d,
> >> +                                   d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_PFN],
> >> +                                   &vpl011->ring_page,
> >> +                                   &vpl011->ring_buf);
> >> +}
> >> +
> >> +static void vpl011_data_avail(struct domain *d)
> >> +{
> >> +    unsigned long flags;
> >> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> >> +    struct xencons_interface *intf=(struct xencons_interface *)vpl011->ring_buf;
> >> +
> >> +    VPL011_LOCK(d, flags);
> >> +
> >> +    /* Update the uart rx state if the buffer is not empty. */
> >> +    if ( !VPL011_IN_RING_EMPTY(intf) )
> >> +    {
> >> +        vpl011->uartfr &= ~(RXFE);
> >> +        if ( VPL011_IN_RING_FULL(intf) )
> >> +            vpl011->uartfr |= (RXFF);
> >> +        vpl011->uartris |= (RXI);
> >> +    }
> >> +
> >> +    /* Update the uart tx state if the buffer is not full. */
> >> +    if ( !VPL011_OUT_RING_FULL(intf) )
> >> +    {
> >> +        vpl011->uartfr &= ~(TXFF);
> >> +        vpl011->uartris |= (TXI);
> >> +        if ( VPL011_OUT_RING_EMPTY(intf) )
> >> +        {
> >> +            vpl011->uartfr &= ~(BUSY);
> >> +            vpl011->uartfr |= (TXFE);
> >
> > It is not a good idea to check the intf values twice: xenconsoled might
> > be modifying them, so we can get into the situation where intf->out*
> > change in between the two checks. Please read out_cons and out_prod only
> > once in this function.
> 
> Since this code is under LOCK, the IN and OUT ring buffers will not be
> updated by the guest. Specifically, the following transitions are
> ruled out:
> 
> IN ring buffer
> non-empty ----> empty (as the reader is blocked due to lock)
> 
> OUT ring buffer
> not-full ----> full (as writer is blocked due to lock).
> 
> So the code inside the IF block remains valid even if the buffer state changes.
> For the IN ring buffer it can go from non-empty to full. Similarly for
> OUT ring buffer it can go from FULL to empty.
> 
> Also checking the latest buffer index (instead of checking buffer
> index read as local variables) allows to update the pl011 state at the
> earliest.

I understand that the IN state shouldn't change. However, like you say,
the OUT state can change between the VPL011_OUT_RING_FULL and the
VPL011_OUT_RING_EMPTY checks.

It is a bad idea to try to write lockless code against a potentially
changing state. It is already hard enough without adding that
complexity; the other end can find ways to cause problems. Please take
a snapshot of the out indexes and use the local variables, rather than
"intf", to do the checks.


> >
> >
> >> +        }
> >> +    }
> >> +
> >> +    VPL011_UNLOCK(d, flags);
> >> +
> >> +    vgic_inject_vpl011_spi(d);
> >> +
> >> +    if ( !VPL011_OUT_RING_EMPTY(intf) )
> >> +    {
> >> +        ASSERT( d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] != 0 );
> >> +        notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN]);
> >
> > Why are we sending notifications at this point? The out buffer hasn't
> > changed since xenconsoled sent the evtchn to Xen, so xenconsoled already
> > knows at this point that the out ring is not empty.
> 
> The OUT buffer may be empty when xenconsoled sends the event. By the
> time the event is handled, the guest may write more data to OUT
> buffer. This condition makes sure that if the OUT ring buffer is not
> empty then it sends an event to xenconsoled.

OUT notifications from Xen to xenconsoled are best handled where writes
actually happen: in vpl011_write_data. I could accept an optimization
like the one you introduced, but please explain the design and the
benefits in details.


> >
> >
> >> +    }
> >> +}
> >> +
> >> +
> >> +static void vpl011_notification(struct vcpu *v, unsigned int port)
> >> +{
> >> +    vpl011_data_avail(v->domain);
> >> +}
> >> +
> >> +int domain_vpl011_init(struct domain *d, struct xen_arch_domainconfig *config)
> >> +{
> >> +    int rc;
> >> +    struct vpl011_s *vpl011 = &d->arch.vpl011;
> >> +
> >> +    rc = alloc_unbound_xen_event_channel(d, 0, config->console_domid,
> >> +                                         vpl011_notification);
> >> +    if (rc < 0)
> >> +    {
> >> +        return rc;
> >> +    }
> >> +    d->arch.hvm_domain.params[HVM_PARAM_VCONSOLE_EVTCHN] = rc;
> >> +    rc = vgic_reserve_virq(d, GUEST_VPL011_SPI);
> >> +    if ( !rc )
> >> +        return rc;
> >
> > We should free the allocated evtchn in case of error here
> I will free the event channel.

Bhupinder Thakur April 25, 2017, 7:31 a.m. UTC | #6

On 20 April 2017 at 00:10, Stefano Stabellini <sstabellini@kernel.org> wrote:
> We don't have a formal protocol spec for PV console, but if we had, it
> would say that the frontend (Xen in this case) should send notifications
> every time there is new data to write. Thus, once per character in this
> case.
>
> I would start with a dumb implementation like that, simply notify the
> other end every time. Then, in a separate patch, we can consider
> introducing optimizations, which need to be well explained.
>
Ok. I will add this optimisation as a separate patch later.

>
> Regarding the optimization you introduced in this patch, delaying write
> notifications until we receive a notification from xenconsoled, how many
> notifications from xen to xenconsoled does it actually save? xenconsoled
> is going to send a notification for every read: we might end up sending
> the same number of notifications, only delayed.
>
>
In the PV console design, the events from the guest are sent to
xenconsole for every chunk of data. Since in the pl011 emulation, the
data comes in bytes only, it would generate lot of events to
xenconsole. To reduce the flurry of events, this optimisation was
added.

Xenconsole sends an event in the following conditions:

1. There is data available for Xen to process
2. It has finished processing the data and Xen can send more data

In the 2nd case, xenconsole will keep reading the data from the ring
buffer until it goes empty. At that point, it would send an event to
Xen. Between sending of this event and processing of this event by
Xen, there could be more data added for the xenconsole to process.
While handling an event, the Xen will check for that condition and if
there is data to be processed by xenconsole, it would send an event.

Also sending delayed events helps with the rate limit check in
xenconsole. If there are too many events, they maybe masked by
xenconsole. I could test whether this rate limit check is really
getting hit with and without this optimisation.

>>
>> Since this code is under LOCK, the IN and OUT ring buffers will not be
>> updated by the guest. Specifically, the following transitions are
>> ruled out:
>>
>> IN ring buffer
>> non-empty ----> empty (as the reader is blocked due to lock)
>>
>> OUT ring buffer
>> not-full ----> full (as writer is blocked due to lock).
>>
>> So the code inside the IF block remains valid even if the buffer state changes.
>> For the IN ring buffer it can go from non-empty to full. Similarly for
>> OUT ring buffer it can go from FULL to empty.
>>
>> Also checking the latest buffer index (instead of checking buffer
>> index read as local variables) allows to update the pl011 state at the
>> earliest.
>
> I understand that the IN state shouldn't change. However, like you say,
> the OUT state can change between the VPL011_OUT_RING_FULL and the
> VPL011_OUT_RING_EMPTY checks.
>
> It is a bad idea to try to write lockless code against a potentially
> changing state. It is already hard enough without adding that
> complexity; the other end can find ways to cause problems. Please take
> a snapshot of the out indexes and use the local variables, rather than
> "intf", to do the checks.
>
I will use local variables to do these checks.


Regards,
Bhupinder

Stefano Stabellini April 25, 2017, 5:56 p.m. UTC | #7

On Tue, 25 Apr 2017, Bhupinder Thakur wrote:
> On 20 April 2017 at 00:10, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > We don't have a formal protocol spec for PV console, but if we had, it
> > would say that the frontend (Xen in this case) should send notifications
> > every time there is new data to write. Thus, once per character in this
> > case.
> >
> > I would start with a dumb implementation like that, simply notify the
> > other end every time. Then, in a separate patch, we can consider
> > introducing optimizations, which need to be well explained.
> >
> Ok. I will add this optimisation as a separate patch later.
> 
> >
> > Regarding the optimization you introduced in this patch, delaying write
> > notifications until we receive a notification from xenconsoled, how many
> > notifications from xen to xenconsoled does it actually save? xenconsoled
> > is going to send a notification for every read: we might end up sending
> > the same number of notifications, only delayed.
> >
> >
> In the PV console design, the events from the guest are sent to
> xenconsole for every chunk of data. Since in the pl011 emulation, the
> data comes in bytes only, it would generate lot of events to
> xenconsole. To reduce the flurry of events, this optimisation was
> added.
> 
> Xenconsole sends an event in the following conditions:
> 
> 1. There is data available for Xen to process
> 2. It has finished processing the data and Xen can send more data
> 
> In the 2nd case, xenconsole will keep reading the data from the ring
> buffer until it goes empty. At that point, it would send an event to
> Xen. Between sending of this event and processing of this event by
> Xen, there could be more data added for the xenconsole to process.
> While handling an event, the Xen will check for that condition and if
> there is data to be processed by xenconsole, it would send an event.
> 
> Also sending delayed events helps with the rate limit check in
> xenconsole. If there are too many events, they maybe masked by
> xenconsole. I could test whether this rate limit check is really
> getting hit with and without this optimisation.

I understand the idea behind, my question is whether this approach was
actually verified by any scientific measurements. Did you run a test
to count how many notifications were skipped thanks to this
optimization? If so, what was the scenario? How many notifications were
saved? If you didn't run a test, I suggest you do :-)


> >>
> >> Since this code is under LOCK, the IN and OUT ring buffers will not be
> >> updated by the guest. Specifically, the following transitions are
> >> ruled out:
> >>
> >> IN ring buffer
> >> non-empty ----> empty (as the reader is blocked due to lock)
> >>
> >> OUT ring buffer
> >> not-full ----> full (as writer is blocked due to lock).
> >>
> >> So the code inside the IF block remains valid even if the buffer state changes.
> >> For the IN ring buffer it can go from non-empty to full. Similarly for
> >> OUT ring buffer it can go from FULL to empty.
> >>
> >> Also checking the latest buffer index (instead of checking buffer
> >> index read as local variables) allows to update the pl011 state at the
> >> earliest.
> >
> > I understand that the IN state shouldn't change. However, like you say,
> > the OUT state can change between the VPL011_OUT_RING_FULL and the
> > VPL011_OUT_RING_EMPTY checks.
> >
> > It is a bad idea to try to write lockless code against a potentially
> > changing state. It is already hard enough without adding that
> > complexity; the other end can find ways to cause problems. Please take
> > a snapshot of the out indexes and use the local variables, rather than
> > "intf", to do the checks.
> >
> I will use local variables to do these checks.
> 
> 
> Regards,
> Bhupinder
>

Bhupinder Thakur April 26, 2017, 7:49 a.m. UTC | #8

Hi Stefano,


>> > Regarding the optimization you introduced in this patch, delaying write
>> > notifications until we receive a notification from xenconsoled, how many
>> > notifications from xen to xenconsoled does it actually save? xenconsoled
>> > is going to send a notification for every read: we might end up sending
>> > the same number of notifications, only delayed.
>> >
>> >
>> In the PV console design, the events from the guest are sent to
>> xenconsole for every chunk of data. Since in the pl011 emulation, the
>> data comes in bytes only, it would generate lot of events to
>> xenconsole. To reduce the flurry of events, this optimisation was
>> added.
>>
>> Xenconsole sends an event in the following conditions:
>>
>> 1. There is data available for Xen to process
>> 2. It has finished processing the data and Xen can send more data
>>
>> In the 2nd case, xenconsole will keep reading the data from the ring
>> buffer until it goes empty. At that point, it would send an event to
>> Xen. Between sending of this event and processing of this event by
>> Xen, there could be more data added for the xenconsole to process.
>> While handling an event, the Xen will check for that condition and if
>> there is data to be processed by xenconsole, it would send an event.
>>
>> Also sending delayed events helps with the rate limit check in
>> xenconsole. If there are too many events, they maybe masked by
>> xenconsole. I could test whether this rate limit check is really
>> getting hit with and without this optimisation.
>
> I understand the idea behind, my question is whether this approach was
> actually verified by any scientific measurements. Did you run a test
> to count how many notifications were skipped thanks to this
> optimization? If so, what was the scenario? How many notifications were
> saved? If you didn't run a test, I suggest you do :-)
>
Today I did some instrumentation and count the number of events sent
by Xen to xenconsole and how many are really processed by xenconsole.

I could not see any difference in the number of events processed by
xenconsole with or without the optimization. The total number of
events processed by xenconsole were about 500 for the complete guest
booting till the login prompt (for both optimised and non-optimised
case).

Although Xen calls notify_via_xen_event_channel() far more number of
times (about 12000 times until the guest loging prompt comes) without
the optimisation, it does not translate into sending those many events
to xenconsole though. With the optmization it just reduces the number
of times notify_via_xen_event_channel() is called which is about 500
times.

I believe the reason could be that if the event is still pending with
xenconsole when the next event comes via
notify_via_xen_event_channel() then all such events would be coalesced
and delivered to xenconsole as a single event.

So the optimization does not help with saving any processing on
xenconsole though it saves the overhead of calling
notify_via_xen_event_channel() very frequently in Xen.

Regards,
Bhupinder

Julien Grall April 26, 2017, 8:13 a.m. UTC | #9

Hi Bhupinder,

On 26/04/2017 08:49, Bhupinder Thakur wrote:
>>>> Regarding the optimization you introduced in this patch, delaying write
>>>> notifications until we receive a notification from xenconsoled, how many
>>>> notifications from xen to xenconsoled does it actually save? xenconsoled
>>>> is going to send a notification for every read: we might end up sending
>>>> the same number of notifications, only delayed.
>>>>
>>>>
>>> In the PV console design, the events from the guest are sent to
>>> xenconsole for every chunk of data. Since in the pl011 emulation, the
>>> data comes in bytes only, it would generate lot of events to
>>> xenconsole. To reduce the flurry of events, this optimisation was
>>> added.
>>>
>>> Xenconsole sends an event in the following conditions:
>>>
>>> 1. There is data available for Xen to process
>>> 2. It has finished processing the data and Xen can send more data
>>>
>>> In the 2nd case, xenconsole will keep reading the data from the ring
>>> buffer until it goes empty. At that point, it would send an event to
>>> Xen. Between sending of this event and processing of this event by
>>> Xen, there could be more data added for the xenconsole to process.
>>> While handling an event, the Xen will check for that condition and if
>>> there is data to be processed by xenconsole, it would send an event.
>>>
>>> Also sending delayed events helps with the rate limit check in
>>> xenconsole. If there are too many events, they maybe masked by
>>> xenconsole. I could test whether this rate limit check is really
>>> getting hit with and without this optimisation.
>>
>> I understand the idea behind, my question is whether this approach was
>> actually verified by any scientific measurements. Did you run a test
>> to count how many notifications were skipped thanks to this
>> optimization? If so, what was the scenario? How many notifications were
>> saved? If you didn't run a test, I suggest you do :-)
>>
> Today I did some instrumentation and count the number of events sent
> by Xen to xenconsole and how many are really processed by xenconsole.
>
> I could not see any difference in the number of events processed by
> xenconsole with or without the optimization. The total number of
> events processed by xenconsole were about 500 for the complete guest
> booting till the login prompt (for both optimised and non-optimised
> case).
>
> Although Xen calls notify_via_xen_event_channel() far more number of
> times (about 12000 times until the guest loging prompt comes) without
> the optimisation, it does not translate into sending those many events
> to xenconsole though. With the optmization it just reduces the number
> of times notify_via_xen_event_channel() is called which is about 500
> times.
>
> I believe the reason could be that if the event is still pending with
> xenconsole when the next event comes via
> notify_via_xen_event_channel() then all such events would be coalesced
> and delivered to xenconsole as a single event.
>
> So the optimization does not help with saving any processing on
> xenconsole though it saves the overhead of calling
> notify_via_xen_event_channel() very frequently in Xen.

I don't see any issue to call notify_via_xen_event_channel many time 
because you should do it for every batch sent.

Yes, the batch consists only of one character, but this is how an UART 
is designed.

You rely on the behavior of notify_via_xen_event_channel. What if in the 
future it changes? Then maybe, you will miss event and data.

So I would rather avoid this premature optimization and see how it is 
behaving. If it is too slow, then we can think about it.

Cheers,

Stefano Stabellini April 26, 2017, 5:03 p.m. UTC | #10

On Wed, 26 Apr 2017, Julien Grall wrote:
> Hi Bhupinder,
> 
> On 26/04/2017 08:49, Bhupinder Thakur wrote:
> > > > > Regarding the optimization you introduced in this patch, delaying
> > > > > write
> > > > > notifications until we receive a notification from xenconsoled, how
> > > > > many
> > > > > notifications from xen to xenconsoled does it actually save?
> > > > > xenconsoled
> > > > > is going to send a notification for every read: we might end up
> > > > > sending
> > > > > the same number of notifications, only delayed.
> > > > > 
> > > > > 
> > > > In the PV console design, the events from the guest are sent to
> > > > xenconsole for every chunk of data. Since in the pl011 emulation, the
> > > > data comes in bytes only, it would generate lot of events to
> > > > xenconsole. To reduce the flurry of events, this optimisation was
> > > > added.
> > > > 
> > > > Xenconsole sends an event in the following conditions:
> > > > 
> > > > 1. There is data available for Xen to process
> > > > 2. It has finished processing the data and Xen can send more data
> > > > 
> > > > In the 2nd case, xenconsole will keep reading the data from the ring
> > > > buffer until it goes empty. At that point, it would send an event to
> > > > Xen. Between sending of this event and processing of this event by
> > > > Xen, there could be more data added for the xenconsole to process.
> > > > While handling an event, the Xen will check for that condition and if
> > > > there is data to be processed by xenconsole, it would send an event.
> > > > 
> > > > Also sending delayed events helps with the rate limit check in
> > > > xenconsole. If there are too many events, they maybe masked by
> > > > xenconsole. I could test whether this rate limit check is really
> > > > getting hit with and without this optimisation.
> > > 
> > > I understand the idea behind, my question is whether this approach was
> > > actually verified by any scientific measurements. Did you run a test
> > > to count how many notifications were skipped thanks to this
> > > optimization? If so, what was the scenario? How many notifications were
> > > saved? If you didn't run a test, I suggest you do :-)
> > > 
> > Today I did some instrumentation and count the number of events sent
> > by Xen to xenconsole and how many are really processed by xenconsole.
> > 
> > I could not see any difference in the number of events processed by
> > xenconsole with or without the optimization. The total number of
> > events processed by xenconsole were about 500 for the complete guest
> > booting till the login prompt (for both optimised and non-optimised
> > case).
> > 
> > Although Xen calls notify_via_xen_event_channel() far more number of
> > times (about 12000 times until the guest loging prompt comes) without
> > the optimisation, it does not translate into sending those many events
> > to xenconsole though. With the optmization it just reduces the number
> > of times notify_via_xen_event_channel() is called which is about 500
> > times.
> > 
> > I believe the reason could be that if the event is still pending with
> > xenconsole when the next event comes via
> > notify_via_xen_event_channel() then all such events would be coalesced
> > and delivered to xenconsole as a single event.
> > 
> > So the optimization does not help with saving any processing on
> > xenconsole though it saves the overhead of calling
> > notify_via_xen_event_channel() very frequently in Xen.

Thank you Bhupinder for doing the test! This is why I always ask to do
performance measurements: changes that look obvious don't always
translate in the optimizations that we hope for.


> I don't see any issue to call notify_via_xen_event_channel many time because
> you should do it for every batch sent.
> 
> Yes, the batch consists only of one character, but this is how an UART is
> designed.
> 
> You rely on the behavior of notify_via_xen_event_channel. What if in the
> future it changes? Then maybe, you will miss event and data.
> 
> So I would rather avoid this premature optimization and see how it is
> behaving. If it is too slow, then we can think about it.

I agree with Julien. Just leave it without optimizations for now. We can
work on this separately.

[01/10] xen/arm: vpl011: Add pl011 uart emulation in Xen

Commit Message

Comments

Patch