Message ID | 1507891231-4386-1-git-send-email-bhupinder.thakur@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Bhupinder, first thing: As the bulk of the series has been merged now, please restart your patch and version numbering, so a (potential) next post should be prefixed [PATCH v3 1/2]. And please have a cover letter giving a brief overview what this series fixes. On 13/10/17 11:40, Bhupinder Thakur wrote: > The early console output uses pl011_early_write() to write data. This > function waits for BUSY bit to get cleared before writing the next byte. ... which is questionable given the actual definition of the BUSY bit in the PL011 TRM: ============ .... The BUSY signal goes HIGH as soon as data is written to the transmit FIFO (that is, the FIFO is non-empty) and remains asserted HIGH while data is being transmitted. BUSY is negated only when the transmit FIFO is empty, and the last character has been transmitted from the shift register, .... ============ (I take it you are talking about the Linux driver in a guest here). I think the early_write routine tries to (deliberately?) ignore the FIFO, possibly to make sure characters really get pushed out before a system crashes, maybe. > > In the SBSA UART emulation logic, the BUSY bit was set as soon one > byte was written in the FIFO and it remained set until the FIFO was > emptied. Which is correct behaviour, as this matches the PL011 TRM as quoted above. > This meant that the output was delayed as each character needed > the BUSY to get cleared. But this is true as well! > Since the SBSA UART is getting emulated in Xen using ring buffers, it > ensures that once the data is enqueued in the FIFO, it will be received > by xenconsole so it is safe to set the BUSY bit only when FIFO becomes > full. This will ensure that pl011_early_write() is not delayed unduly > to write the data. So I can confirm that this patch fixes the very slow earlycon output observed with the current staging HEAD. So while this is somewhat deviating from the spec, I can see the benefit for an emulation scenario. I believe that emulations in general might choose implementing things a bit differently, to cope with the fundamental differences in their environment, like the virtually endless "FIFO" and the lack of any timing restrictions on the emulated "wire". So unless someone comes up with a better solution, I would support taking this patch, as this fixes a real problem. Cheers, Andre > Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> > --- > CC: Julien Grall <julien.grall@arm.com> > CC: Andre Przywara <andre.przywara@arm.com> > CC: Stefano Stabellini <sstabellini@kernel.org> > > xen/arch/arm/vpl011.c | 21 ++++++++++++++++----- > 1 file changed, 16 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c > index f7ddccb..0b07436 100644 > --- a/xen/arch/arm/vpl011.c > +++ b/xen/arch/arm/vpl011.c > @@ -159,9 +159,15 @@ static void vpl011_write_data(struct domain *d, uint8_t data) > { > vpl011->uartfr |= TXFF; > vpl011->uartris &= ~TXI; > - } > > - vpl011->uartfr |= BUSY; > + /* > + * This bit is set only when FIFO becomes full. This ensures that > + * the SBSA UART driver can write the early console data as fast as > + * possible, without waiting for the BUSY bit to get cleared before > + * writing each byte. > + */ > + vpl011->uartfr |= BUSY; > + } > > vpl011->uartfr &= ~TXFE; > > @@ -371,11 +377,16 @@ static void vpl011_data_avail(struct domain *d) > { > vpl011->uartfr &= ~TXFF; > vpl011->uartris |= TXI; > + > + /* > + * Clear the BUSY bit as soon as space becomes available > + * so that the SBSA UART driver can start writing more data > + * without any further delay. > + */ > + vpl011->uartfr &= ~BUSY; > + > if ( out_ring_qsize == 0 ) > - { > - vpl011->uartfr &= ~BUSY; > vpl011->uartfr |= TXFE; > - } > } > > vpl011_update_interrupt_status(d); >
On Tue, Oct 17, 2017 at 10:51:07AM +0100, Andre Przywara wrote: > Hi Bhupinder, > > first thing: As the bulk of the series has been merged now, please > restart your patch and version numbering, so a (potential) next post > should be prefixed [PATCH v3 1/2]. And please have a cover letter giving > a brief overview what this series fixes. > > On 13/10/17 11:40, Bhupinder Thakur wrote: > > The early console output uses pl011_early_write() to write data. This > > function waits for BUSY bit to get cleared before writing the next byte. > > ... which is questionable given the actual definition of the BUSY bit in > the PL011 TRM: > > ============ > .... The BUSY signal goes HIGH as soon as data is written to the > transmit FIFO (that is, the FIFO is non-empty) and remains asserted > HIGH while data is being transmitted. BUSY is negated only when the > transmit FIFO is empty, and the last character has been transmitted from > the shift register, .... > ============ > > (I take it you are talking about the Linux driver in a guest here). > I think the early_write routine tries to (deliberately?) ignore the > FIFO, possibly to make sure characters really get pushed out before a > system crashes, maybe. > > > > > In the SBSA UART emulation logic, the BUSY bit was set as soon one > > byte was written in the FIFO and it remained set until the FIFO was > > emptied. > > Which is correct behaviour, as this matches the PL011 TRM as quoted above. > > > This meant that the output was delayed as each character needed > > the BUSY to get cleared. > > But this is true as well! > > > Since the SBSA UART is getting emulated in Xen using ring buffers, it > > ensures that once the data is enqueued in the FIFO, it will be received > > by xenconsole so it is safe to set the BUSY bit only when FIFO becomes > > full. This will ensure that pl011_early_write() is not delayed unduly > > to write the data. > > So I can confirm that this patch fixes the very slow earlycon output > observed with the current staging HEAD. > > So while this is somewhat deviating from the spec, I can see the benefit > for an emulation scenario. I believe that emulations in general might > choose implementing things a bit differently, to cope with the > fundamental differences in their environment, like the virtually endless > "FIFO" and the lack of any timing restrictions on the emulated "wire". > > So unless someone comes up with a better solution, I would support > taking this patch, as this fixes a real problem. I think you get away with this, but it does violate the spec in order to work around a feature of a correctly implemented driver. Software can now see this, for example: uart_write(ch, UARTDR); busy = uart_read(UARTFR) & UARTFR_BUSY; BUG_ON(!(uart_read(UARTFR) & UARTFR_TXFE) && !busy); which violates the spec, though I can't currently think of a good reason for software to rely on that. [+Rob, who wrote the original earlycon code in the amba-pl011 driver: 0d3c673e7881 ("tty/serial: pl011: add generic earlycon support") Is there any actualy reason why we poll for !BUSY after each char in pl011_putc()? pl011_putc() is not exposed at all: it's only called by pl011_console_write(). This will result in stuttering output even on hardware, though this doesn't typically matter. I think if the poll for !BUSY were moved to the end of pl011_console_write(), the effect would be much less bad.] For Xen vpl011: This is a software emulation, so characters really are transmitted instantaneously. Can you drain the TX FIFO into the ring buffer immediately in vpl011_write_data()? Then you could always set TXFE and clear BUSY on return to the guest, unless there is backpressure from the ring buffer. Note that this could break with Linux versions prior to v4.1, which used some trickery to assert the TX FIFO interrupt for the first time (see 734745caeb9f ("serial/amba-pl011: Activate TX IRQ passively")). However, there is a subtlety in the way a real PL011 asserts the TX/RX threshold interrupts which I think you don't implement in vpl011, so you probably won't hit this problem in practice. You could test with an older kernel to find out -- but if <v4.1 kernels don't work with current Xen for other reasons then this doesn't matter. Cheers ---Dave > > Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> > > --- > > CC: Julien Grall <julien.grall@arm.com> > > CC: Andre Przywara <andre.przywara@arm.com> > > CC: Stefano Stabellini <sstabellini@kernel.org> > > > > xen/arch/arm/vpl011.c | 21 ++++++++++++++++----- > > 1 file changed, 16 insertions(+), 5 deletions(-) > > > > diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c > > index f7ddccb..0b07436 100644 > > --- a/xen/arch/arm/vpl011.c > > +++ b/xen/arch/arm/vpl011.c > > @@ -159,9 +159,15 @@ static void vpl011_write_data(struct domain *d, uint8_t data) > > { > > vpl011->uartfr |= TXFF; > > vpl011->uartris &= ~TXI; > > - } > > > > - vpl011->uartfr |= BUSY; > > + /* > > + * This bit is set only when FIFO becomes full. This ensures that > > + * the SBSA UART driver can write the early console data as fast as > > + * possible, without waiting for the BUSY bit to get cleared before > > + * writing each byte. > > + */ > > + vpl011->uartfr |= BUSY; > > + } > > > > vpl011->uartfr &= ~TXFE; > > > > @@ -371,11 +377,16 @@ static void vpl011_data_avail(struct domain *d) > > { > > vpl011->uartfr &= ~TXFF; > > vpl011->uartris |= TXI; > > + > > + /* > > + * Clear the BUSY bit as soon as space becomes available > > + * so that the SBSA UART driver can start writing more data > > + * without any further delay. > > + */ > > + vpl011->uartfr &= ~BUSY; > > + > > if ( out_ring_qsize == 0 ) > > - { > > - vpl011->uartfr &= ~BUSY; > > vpl011->uartfr |= TXFE; > > - } > > } > > > > vpl011_update_interrupt_status(d); > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
On Tue, Oct 17, 2017 at 6:19 AM, Dave Martin <Dave.Martin@arm.com> wrote: > On Tue, Oct 17, 2017 at 10:51:07AM +0100, Andre Przywara wrote: >> Hi Bhupinder, >> >> first thing: As the bulk of the series has been merged now, please >> restart your patch and version numbering, so a (potential) next post >> should be prefixed [PATCH v3 1/2]. And please have a cover letter giving >> a brief overview what this series fixes. >> >> On 13/10/17 11:40, Bhupinder Thakur wrote: >> > The early console output uses pl011_early_write() to write data. This >> > function waits for BUSY bit to get cleared before writing the next byte. >> >> ... which is questionable given the actual definition of the BUSY bit in >> the PL011 TRM: >> >> ============ >> .... The BUSY signal goes HIGH as soon as data is written to the >> transmit FIFO (that is, the FIFO is non-empty) and remains asserted >> HIGH while data is being transmitted. BUSY is negated only when the >> transmit FIFO is empty, and the last character has been transmitted from >> the shift register, .... >> ============ >> >> (I take it you are talking about the Linux driver in a guest here). >> I think the early_write routine tries to (deliberately?) ignore the >> FIFO, possibly to make sure characters really get pushed out before a >> system crashes, maybe. >> >> > >> > In the SBSA UART emulation logic, the BUSY bit was set as soon one >> > byte was written in the FIFO and it remained set until the FIFO was >> > emptied. >> >> Which is correct behaviour, as this matches the PL011 TRM as quoted above. >> >> > This meant that the output was delayed as each character needed >> > the BUSY to get cleared. >> >> But this is true as well! >> >> > Since the SBSA UART is getting emulated in Xen using ring buffers, it >> > ensures that once the data is enqueued in the FIFO, it will be received >> > by xenconsole so it is safe to set the BUSY bit only when FIFO becomes >> > full. This will ensure that pl011_early_write() is not delayed unduly >> > to write the data. >> >> So I can confirm that this patch fixes the very slow earlycon output >> observed with the current staging HEAD. >> >> So while this is somewhat deviating from the spec, I can see the benefit >> for an emulation scenario. I believe that emulations in general might >> choose implementing things a bit differently, to cope with the >> fundamental differences in their environment, like the virtually endless >> "FIFO" and the lack of any timing restrictions on the emulated "wire". >> >> So unless someone comes up with a better solution, I would support >> taking this patch, as this fixes a real problem. > > I think you get away with this, but it does violate the spec in order > to work around a feature of a correctly implemented driver. > > Software can now see this, for example: > > uart_write(ch, UARTDR); > busy = uart_read(UARTFR) & UARTFR_BUSY; > BUG_ON(!(uart_read(UARTFR) & UARTFR_TXFE) && !busy); > > which violates the spec, though I can't currently think of a good reason > for software to rely on that. > > > [+Rob, who wrote the original earlycon code in the amba-pl011 driver: > 0d3c673e7881 ("tty/serial: pl011: add generic earlycon support") > > Is there any actualy reason why we poll for !BUSY after each char in > pl011_putc()? pl011_putc() is not exposed at all: it's only called by > pl011_console_write(). > > This will result in stuttering output even on hardware, though this > doesn't typically matter. > > I think if the poll for !BUSY were moved to the end of > pl011_console_write(), the effect would be much less bad.] I just copied the code from the arm64 earlyprintk code... Maybe it was because on simulation (which was the main platform at the time) folks wanted the character "on the wire". It seems to be that you could just drop it. Rob
On Tue, Oct 17, 2017 at 07:53:36AM -0500, Rob Herring wrote: > On Tue, Oct 17, 2017 at 6:19 AM, Dave Martin <Dave.Martin@arm.com> wrote: > > On Tue, Oct 17, 2017 at 10:51:07AM +0100, Andre Przywara wrote: > >> Hi Bhupinder, > >> > >> first thing: As the bulk of the series has been merged now, please > >> restart your patch and version numbering, so a (potential) next post > >> should be prefixed [PATCH v3 1/2]. And please have a cover letter giving > >> a brief overview what this series fixes. > >> > >> On 13/10/17 11:40, Bhupinder Thakur wrote: > >> > The early console output uses pl011_early_write() to write data. This > >> > function waits for BUSY bit to get cleared before writing the next byte. > >> > >> ... which is questionable given the actual definition of the BUSY bit in > >> the PL011 TRM: > >> > >> ============ > >> .... The BUSY signal goes HIGH as soon as data is written to the > >> transmit FIFO (that is, the FIFO is non-empty) and remains asserted > >> HIGH while data is being transmitted. BUSY is negated only when the > >> transmit FIFO is empty, and the last character has been transmitted from > >> the shift register, .... > >> ============ > >> > >> (I take it you are talking about the Linux driver in a guest here). > >> I think the early_write routine tries to (deliberately?) ignore the > >> FIFO, possibly to make sure characters really get pushed out before a > >> system crashes, maybe. > >> > >> > > >> > In the SBSA UART emulation logic, the BUSY bit was set as soon one > >> > byte was written in the FIFO and it remained set until the FIFO was > >> > emptied. > >> > >> Which is correct behaviour, as this matches the PL011 TRM as quoted above. > >> > >> > This meant that the output was delayed as each character needed > >> > the BUSY to get cleared. > >> > >> But this is true as well! > >> > >> > Since the SBSA UART is getting emulated in Xen using ring buffers, it > >> > ensures that once the data is enqueued in the FIFO, it will be received > >> > by xenconsole so it is safe to set the BUSY bit only when FIFO becomes > >> > full. This will ensure that pl011_early_write() is not delayed unduly > >> > to write the data. > >> > >> So I can confirm that this patch fixes the very slow earlycon output > >> observed with the current staging HEAD. > >> > >> So while this is somewhat deviating from the spec, I can see the benefit > >> for an emulation scenario. I believe that emulations in general might > >> choose implementing things a bit differently, to cope with the > >> fundamental differences in their environment, like the virtually endless > >> "FIFO" and the lack of any timing restrictions on the emulated "wire". > >> > >> So unless someone comes up with a better solution, I would support > >> taking this patch, as this fixes a real problem. > > > > I think you get away with this, but it does violate the spec in order > > to work around a feature of a correctly implemented driver. > > > > Software can now see this, for example: > > > > uart_write(ch, UARTDR); > > busy = uart_read(UARTFR) & UARTFR_BUSY; > > BUG_ON(!(uart_read(UARTFR) & UARTFR_TXFE) && !busy); > > > > which violates the spec, though I can't currently think of a good reason > > for software to rely on that. > > > > > > [+Rob, who wrote the original earlycon code in the amba-pl011 driver: > > 0d3c673e7881 ("tty/serial: pl011: add generic earlycon support") > > > > Is there any actualy reason why we poll for !BUSY after each char in > > pl011_putc()? pl011_putc() is not exposed at all: it's only called by > > pl011_console_write(). > > > > This will result in stuttering output even on hardware, though this > > doesn't typically matter. > > > > I think if the poll for !BUSY were moved to the end of > > pl011_console_write(), the effect would be much less bad.] > > I just copied the code from the arm64 earlyprintk code... Maybe it was > because on simulation (which was the main platform at the time) folks > wanted the character "on the wire". It seems to be that you could just > drop it. Hmmm, the arm64 earlyprintk code 2475ff9d2c6e ("arm64: Add simple earlyprintk support") looks to have been derived by Catalin from arm's assembly printch/ printascii implementation, which predates git AFACT: (Catalin, pleaes put me right if I misunderstood the history.) arch/arm/kernel/debug.S: ENTRY(printascii) addruart_current r3, r1, r2 b 2f 1: waituart r2, r3 senduart r1, r3 busyuart r2, r3 teq r1, #'\n' moveq r1, #'\r' beq 1b 2: teq r0, #0 ldrneb r1, [r0], #1 teqne r1, #0 bne 1b ret lr ENDPROC(printascii) ENTRY(printch) addruart_current r3, r1, r2 mov r1, r0 mov r0, #0 b 1b ENDPROC(printch) Russell, do you know why we wait for the UART transmitter to go completely idle before queueing a new char? For an individual printch this can makes sense, but it also introduces delay for every char in printascii. This seems to interact interestingly with virtualised UARTs, because we may thrash between the guest and hypervisor per-char, though there may be a way to reduce the impact of this on the emulation side. (See above for some context) In the pl011 earlycon code that was derived from arm64 earlycon (the latter now deceased), pl011_putc() is not exposed at all and polling for !BUSY other than at the end of pl011_early_write() seems unnecessary... Crashing the platform so hard that the PL011 stops transmitting is likely to be challenging -- e.g., turning off some clock or regulator, making the bus lock up etc. None of these is likely to be triggered by pl011_early_write() itself. Cheers ---Dave
On Tue, Oct 17, 2017 at 02:44:29PM +0100, Dave Martin wrote: > arch/arm/kernel/debug.S: > > ENTRY(printascii) > addruart_current r3, r1, r2 > b 2f > 1: waituart r2, r3 > senduart r1, r3 > busyuart r2, r3 > teq r1, #'\n' > moveq r1, #'\r' > beq 1b > 2: teq r0, #0 > ldrneb r1, [r0], #1 > teqne r1, #0 > bne 1b > ret lr > ENDPROC(printascii) > > ENTRY(printch) > addruart_current r3, r1, r2 > mov r1, r0 > mov r0, #0 > b 1b > ENDPROC(printch) > > > > Russell, do you know why we wait for the UART transmitter to go > completely idle before queueing a new char? It's the only way the /debug/ (and I stress /debug/) functions know that the character has actually been sent before allowing control to continue. This is important for early-crashy-debug situations, but probably less so for early_printk. > For an individual printch this can makes sense, but it also introduces > delay for every char in printascii. > > This seems to interact interestingly with virtualised UARTs, because we > may thrash between the guest and hypervisor per-char, though there > may be a way to reduce the impact of this on the emulation side. Well, I guess re-using the early /debugging/ code for early printk is showing more and more issues - and reusing this code in this way is something that I've never really been a fan of. I'd personally like to see the coupling between the two things gone - I never really wanted that coupling in the first place.
On Tue, Oct 17, 2017 at 03:03:02PM +0100, Russell King - ARM Linux wrote: > On Tue, Oct 17, 2017 at 02:44:29PM +0100, Dave Martin wrote: > > arch/arm/kernel/debug.S: > > > > ENTRY(printascii) > > addruart_current r3, r1, r2 > > b 2f > > 1: waituart r2, r3 > > senduart r1, r3 > > busyuart r2, r3 > > teq r1, #'\n' > > moveq r1, #'\r' > > beq 1b > > 2: teq r0, #0 > > ldrneb r1, [r0], #1 > > teqne r1, #0 > > bne 1b > > ret lr > > ENDPROC(printascii) > > > > ENTRY(printch) > > addruart_current r3, r1, r2 > > mov r1, r0 > > mov r0, #0 > > b 1b > > ENDPROC(printch) > > > > > > > > Russell, do you know why we wait for the UART transmitter to go > > completely idle before queueing a new char? > > It's the only way the /debug/ (and I stress /debug/) functions know > that the character has actually been sent before allowing control to > continue. This is important for early-crashy-debug situations, but > probably less so for early_printk. > > > For an individual printch this can makes sense, but it also introduces > > delay for every char in printascii. > > > > This seems to interact interestingly with virtualised UARTs, because we > > may thrash between the guest and hypervisor per-char, though there > > may be a way to reduce the impact of this on the emulation side. > > Well, I guess re-using the early /debugging/ code for early printk is > showing more and more issues - and reusing this code in this way is > something that I've never really been a fan of. > > I'd personally like to see the coupling between the two things gone - > I never really wanted that coupling in the first place. Agreed. I'll propose a patch for the amba-pl011 earlycon code so that the flush is only per each write() call, which should at least mitigate the impact. For low-level debug, it makes more sense to be as conservative as possible though: I don't see a need to change arm printch/printascii: as you point out, these two bits of code have different purposes, even if they have common ancestry. Cheers ---Dave IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Andre, On 17 October 2017 at 15:21, Andre Przywara <andre.przywara@arm.com> wrote: > Hi Bhupinder, > > first thing: As the bulk of the series has been merged now, please > restart your patch and version numbering, so a (potential) next post > should be prefixed [PATCH v3 1/2]. And please have a cover letter giving > a brief overview what this series fixes. > Should I resend the patch series with a cover letter? I will also add a reported-by tag. > On 13/10/17 11:40, Bhupinder Thakur wrote: >> The early console output uses pl011_early_write() to write data. This >> function waits for BUSY bit to get cleared before writing the next byte. > > ... which is questionable given the actual definition of the BUSY bit in > the PL011 TRM: > > ============ > .... The BUSY signal goes HIGH as soon as data is written to the > transmit FIFO (that is, the FIFO is non-empty) and remains asserted > HIGH while data is being transmitted. BUSY is negated only when the > transmit FIFO is empty, and the last character has been transmitted from > the shift register, .... > ============ > > (I take it you are talking about the Linux driver in a guest here). > I think the early_write routine tries to (deliberately?) ignore the > FIFO, possibly to make sure characters really get pushed out before a > system crashes, maybe. > >> >> In the SBSA UART emulation logic, the BUSY bit was set as soon one >> byte was written in the FIFO and it remained set until the FIFO was >> emptied. > > Which is correct behaviour, as this matches the PL011 TRM as quoted above. > >> This meant that the output was delayed as each character needed >> the BUSY to get cleared. > > But this is true as well! > >> Since the SBSA UART is getting emulated in Xen using ring buffers, it >> ensures that once the data is enqueued in the FIFO, it will be received >> by xenconsole so it is safe to set the BUSY bit only when FIFO becomes >> full. This will ensure that pl011_early_write() is not delayed unduly >> to write the data. > > So I can confirm that this patch fixes the very slow earlycon output > observed with the current staging HEAD. > > So while this is somewhat deviating from the spec, I can see the benefit > for an emulation scenario. I believe that emulations in general might > choose implementing things a bit differently, to cope with the > fundamental differences in their environment, like the virtually endless > "FIFO" and the lack of any timing restrictions on the emulated "wire". > > So unless someone comes up with a better solution, I would support > taking this patch, as this fixes a real problem. > > Cheers, > Andre > >> Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> >> --- >> CC: Julien Grall <julien.grall@arm.com> >> CC: Andre Przywara <andre.przywara@arm.com> >> CC: Stefano Stabellini <sstabellini@kernel.org> >> >> xen/arch/arm/vpl011.c | 21 ++++++++++++++++----- >> 1 file changed, 16 insertions(+), 5 deletions(-) >> >> diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c >> index f7ddccb..0b07436 100644 >> --- a/xen/arch/arm/vpl011.c >> +++ b/xen/arch/arm/vpl011.c >> @@ -159,9 +159,15 @@ static void vpl011_write_data(struct domain *d, uint8_t data) >> { >> vpl011->uartfr |= TXFF; >> vpl011->uartris &= ~TXI; >> - } >> >> - vpl011->uartfr |= BUSY; >> + /* >> + * This bit is set only when FIFO becomes full. This ensures that >> + * the SBSA UART driver can write the early console data as fast as >> + * possible, without waiting for the BUSY bit to get cleared before >> + * writing each byte. >> + */ >> + vpl011->uartfr |= BUSY; >> + } >> >> vpl011->uartfr &= ~TXFE; >> >> @@ -371,11 +377,16 @@ static void vpl011_data_avail(struct domain *d) >> { >> vpl011->uartfr &= ~TXFF; >> vpl011->uartris |= TXI; >> + >> + /* >> + * Clear the BUSY bit as soon as space becomes available >> + * so that the SBSA UART driver can start writing more data >> + * without any further delay. >> + */ >> + vpl011->uartfr &= ~BUSY; >> + >> if ( out_ring_qsize == 0 ) >> - { >> - vpl011->uartfr &= ~BUSY; >> vpl011->uartfr |= TXFE; >> - } >> } >> >> vpl011_update_interrupt_status(d); >> Regards, Bhupinder
Hi, On 18/10/17 11:17, Bhupinder Thakur wrote: > Hi Andre, > > On 17 October 2017 at 15:21, Andre Przywara <andre.przywara@arm.com> wrote: >> Hi Bhupinder, >> >> first thing: As the bulk of the series has been merged now, please >> restart your patch and version numbering, so a (potential) next post >> should be prefixed [PATCH v3 1/2]. And please have a cover letter giving >> a brief overview what this series fixes. >> > Should I resend the patch series with a cover letter? I will also add > a reported-by tag. Please wait until we have settled upon a solution, especially for that other patch. We can talk about this in our meeting later today. Cheers, Andre. >> On 13/10/17 11:40, Bhupinder Thakur wrote: >>> The early console output uses pl011_early_write() to write data. This >>> function waits for BUSY bit to get cleared before writing the next byte. >> >> ... which is questionable given the actual definition of the BUSY bit in >> the PL011 TRM: >> >> ============ >> .... The BUSY signal goes HIGH as soon as data is written to the >> transmit FIFO (that is, the FIFO is non-empty) and remains asserted >> HIGH while data is being transmitted. BUSY is negated only when the >> transmit FIFO is empty, and the last character has been transmitted from >> the shift register, .... >> ============ >> >> (I take it you are talking about the Linux driver in a guest here). >> I think the early_write routine tries to (deliberately?) ignore the >> FIFO, possibly to make sure characters really get pushed out before a >> system crashes, maybe. >> >>> >>> In the SBSA UART emulation logic, the BUSY bit was set as soon one >>> byte was written in the FIFO and it remained set until the FIFO was >>> emptied. >> >> Which is correct behaviour, as this matches the PL011 TRM as quoted above. >> >>> This meant that the output was delayed as each character needed >>> the BUSY to get cleared. >> >> But this is true as well! >> >>> Since the SBSA UART is getting emulated in Xen using ring buffers, it >>> ensures that once the data is enqueued in the FIFO, it will be received >>> by xenconsole so it is safe to set the BUSY bit only when FIFO becomes >>> full. This will ensure that pl011_early_write() is not delayed unduly >>> to write the data. >> >> So I can confirm that this patch fixes the very slow earlycon output >> observed with the current staging HEAD. >> >> So while this is somewhat deviating from the spec, I can see the benefit >> for an emulation scenario. I believe that emulations in general might >> choose implementing things a bit differently, to cope with the >> fundamental differences in their environment, like the virtually endless >> "FIFO" and the lack of any timing restrictions on the emulated "wire". >> >> So unless someone comes up with a better solution, I would support >> taking this patch, as this fixes a real problem. >> >> Cheers, >> Andre >> >>> Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> >>> --- >>> CC: Julien Grall <julien.grall@arm.com> >>> CC: Andre Przywara <andre.przywara@arm.com> >>> CC: Stefano Stabellini <sstabellini@kernel.org> >>> >>> xen/arch/arm/vpl011.c | 21 ++++++++++++++++----- >>> 1 file changed, 16 insertions(+), 5 deletions(-) >>> >>> diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c >>> index f7ddccb..0b07436 100644 >>> --- a/xen/arch/arm/vpl011.c >>> +++ b/xen/arch/arm/vpl011.c >>> @@ -159,9 +159,15 @@ static void vpl011_write_data(struct domain *d, uint8_t data) >>> { >>> vpl011->uartfr |= TXFF; >>> vpl011->uartris &= ~TXI; >>> - } >>> >>> - vpl011->uartfr |= BUSY; >>> + /* >>> + * This bit is set only when FIFO becomes full. This ensures that >>> + * the SBSA UART driver can write the early console data as fast as >>> + * possible, without waiting for the BUSY bit to get cleared before >>> + * writing each byte. >>> + */ >>> + vpl011->uartfr |= BUSY; >>> + } >>> >>> vpl011->uartfr &= ~TXFE; >>> >>> @@ -371,11 +377,16 @@ static void vpl011_data_avail(struct domain *d) >>> { >>> vpl011->uartfr &= ~TXFF; >>> vpl011->uartris |= TXI; >>> + >>> + /* >>> + * Clear the BUSY bit as soon as space becomes available >>> + * so that the SBSA UART driver can start writing more data >>> + * without any further delay. >>> + */ >>> + vpl011->uartfr &= ~BUSY; >>> + >>> if ( out_ring_qsize == 0 ) >>> - { >>> - vpl011->uartfr &= ~BUSY; >>> vpl011->uartfr |= TXFE; >>> - } >>> } >>> >>> vpl011_update_interrupt_status(d); >>> > > Regards, > Bhupinder >
diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c index f7ddccb..0b07436 100644 --- a/xen/arch/arm/vpl011.c +++ b/xen/arch/arm/vpl011.c @@ -159,9 +159,15 @@ static void vpl011_write_data(struct domain *d, uint8_t data) { vpl011->uartfr |= TXFF; vpl011->uartris &= ~TXI; - } - vpl011->uartfr |= BUSY; + /* + * This bit is set only when FIFO becomes full. This ensures that + * the SBSA UART driver can write the early console data as fast as + * possible, without waiting for the BUSY bit to get cleared before + * writing each byte. + */ + vpl011->uartfr |= BUSY; + } vpl011->uartfr &= ~TXFE; @@ -371,11 +377,16 @@ static void vpl011_data_avail(struct domain *d) { vpl011->uartfr &= ~TXFF; vpl011->uartris |= TXI; + + /* + * Clear the BUSY bit as soon as space becomes available + * so that the SBSA UART driver can start writing more data + * without any further delay. + */ + vpl011->uartfr &= ~BUSY; + if ( out_ring_qsize == 0 ) - { - vpl011->uartfr &= ~BUSY; vpl011->uartfr |= TXFE; - } } vpl011_update_interrupt_status(d);
The early console output uses pl011_early_write() to write data. This function waits for BUSY bit to get cleared before writing the next byte. In the SBSA UART emulation logic, the BUSY bit was set as soon one byte was written in the FIFO and it remained set until the FIFO was emptied. This meant that the output was delayed as each character needed the BUSY to get cleared. Since the SBSA UART is getting emulated in Xen using ring buffers, it ensures that once the data is enqueued in the FIFO, it will be received by xenconsole so it is safe to set the BUSY bit only when FIFO becomes full. This will ensure that pl011_early_write() is not delayed unduly to write the data. Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> --- CC: Julien Grall <julien.grall@arm.com> CC: Andre Przywara <andre.przywara@arm.com> CC: Stefano Stabellini <sstabellini@kernel.org> xen/arch/arm/vpl011.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)