vt_buffer: drop console buffer copying optimisations

Message ID 1422504685-7864-1-git-send-email-airlied@redhat.com
State New, archived

Commit Message

David Airlie Jan. 29, 2015, 4:11 a.m. UTC
These two macros copy to/from VGA memory; however, on the Silicon
Motion SMI750 VGA card on a 64-bit system they cause console corruption.

This is due to the hw being buggy and not handling a 64-bit transaction
correctly.

We could try and create a 32-bit version of these routines,
but I'm not sure the optimisation is worth much today.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826

Tested-by: Huawei engineering.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---

Linus, this came up a while back; I finally got some confirmation
that it fixes those servers.

 include/linux/vt_buffer.h | 4 ----
 1 file changed, 4 deletions(-)

Comments

Peter Hurley Jan. 29, 2015, 12:06 p.m. UTC | #1
On 01/28/2015 11:11 PM, Dave Airlie wrote:
> These two copy to/from VGA memory, however on the Silicon
> Motion SMI750 VGA card on a 64-bit system cause console corruption.
> 
> This is due to the hw being buggy and not handling a 64-bit transaction
> correctly.
> 
> We could try and create a 32-bit version of these routines,
> but I'm not sure the optimisation is worth much today.
> 
> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826

Restricted link.


> Tested-by: Huawei engineering.
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
> 
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.
> 
>  include/linux/vt_buffer.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
> index 057db7d..f38c10b 100644
> --- a/include/linux/vt_buffer.h
> +++ b/include/linux/vt_buffer.h
> @@ -21,10 +21,6 @@
>  #ifndef VT_BUF_HAVE_RW
>  #define scr_writew(val, addr) (*(addr) = (val))
>  #define scr_readw(addr) (*(addr))
> -#define scr_memcpyw(d, s, c) memcpy(d, s, c)
> -#define scr_memmovew(d, s, c) memmove(d, s, c)
> -#define VT_BUF_HAVE_MEMCPYW
> -#define VT_BUF_HAVE_MEMMOVEW
>  #endif
>  
>  #ifndef VT_BUF_HAVE_MEMSETW
> 


------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
--
Linus Torvalds Jan. 29, 2015, 11:40 p.m. UTC | #2
On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.

I'm certainly ok with this. Which way should it go in? The users are:

 - drivers/tty/vt/vt.c (Greg KH, "tty layer")

 - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)

and it might make sense to have *some* indication of how much worse
this makes fbcon performance in particular.

Greg/Tomi - the patch is removing this:

  #define scr_memcpyw(d, s, c) memcpy(d, s, c)
  #define scr_memmovew(d, s, c) memmove(d, s, c)
  #define VT_BUF_HAVE_MEMCPYW
  #define VT_BUF_HAVE_MEMMOVEW

from <linux/vt_buffer.h>, because some stupid graphics cards
apparently cannot handle 64-bit accesses of regular memcpy/memmove.

And on other setups, this will be the reverse: 8-bit accesses due to
using "rep movsb", which is the fast way to move/clear memory on
modern Intel CPUs, but is really wrong for MMIO where it will be slow
as hell.

So just getting rid of the memcpy/memmove is likely the right thing in
general, since the fallbacks do this the traditional 16-bit-at-a-time
way. And getting rid of the memcpy _may_ speed things up.
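For reference, the "traditional 16-bit-at-a-time" fallbacks copy one screen cell per access. The sketch below is a userspace approximation modelled on the generic scr_memcpyw()/scr_memmovew() fallbacks in <linux/vt_buffer.h>. It uses uint16_t in place of the kernel's u16, and the accessor macros are plain dereferences, as in the generic header. As with the macros the patch removes, `count` is a byte count.

```c
#include <stdint.h>

/* Stand-ins for the per-cell accessors (plain dereferences, as in the
 * generic header when no architecture overrides them). */
#define scr_writew(val, addr) (*(addr) = (val))
#define scr_readw(addr) (*(addr))

/* Copy count BYTES of non-overlapping cells, one 16-bit access at a time. */
static inline void scr_memcpyw(uint16_t *d, const uint16_t *s, unsigned int count)
{
	count /= 2; /* bytes -> 16-bit cells */
	while (count--)
		scr_writew(scr_readw(s++), d++);
}

/* Overlap-safe variant: copy forwards if d < s, otherwise backwards. */
static inline void scr_memmovew(uint16_t *d, const uint16_t *s, unsigned int count)
{
	if (d < s) {
		scr_memcpyw(d, s, count);
	} else {
		count /= 2;
		d += count;
		s += count;
		while (count--)
			scr_writew(scr_readw(--s), --d);
	}
}
```

For example, calling scr_memmovew(buf + 1, buf, 8) on {1, 2, 3, 4, 5, 6} shifts four cells right by one and leaves {1, 1, 2, 3, 4, 6}.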

But if it slows things down, we might have to try something else. Like
saying "all cards we've ever seen have been ok with aligned 32-bit
accesses", and extend the open-coded scr_memcpyw/scr_memmovew functions to
do that.
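Here is a minimal sketch of that 32-bit idea. It assumes, as Linus conjectures but nobody has verified, that aligned 32-bit accesses are safe on every card. The name scr_memcpyw32 is hypothetical.

The function moves pairs of cells with 32-bit accesses only when source and destination share the same 4-byte alignment. It falls back to 16-bit accesses for any leading or trailing cell, and for the whole copy if the alignments differ.

The fixed-size memcpy() calls are a portable way to express one 32-bit load and one 32-bit store without breaking C aliasing rules. Compilers normally lower them to single 32-bit accesses, but that is not guaranteed. Real MMIO code would more likely use explicit accessors such as readl()/writel().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: copy count BYTES of screen cells, using aligned 32-bit
 * accesses where possible and 16-bit accesses otherwise. */
static void scr_memcpyw32(uint16_t *d, const uint16_t *s, unsigned int count)
{
	count /= 2; /* bytes -> 16-bit cells */

	/* Wide accesses are only possible if both pointers can be 4-byte aligned together. */
	if (((uintptr_t)d & 3) == ((uintptr_t)s & 3)) {
		/* One leading 16-bit cell brings both pointers to 4-byte alignment. */
		if (((uintptr_t)d & 3) && count) {
			*d++ = *s++;
			count--;
		}
		/* Two cells per aligned 32-bit load/store. */
		while (count >= 2) {
			uint32_t pair;
			memcpy(&pair, s, sizeof pair);
			memcpy(d, &pair, sizeof pair);
			d += 2;
			s += 2;
			count -= 2;
		}
	}

	/* Trailing cell, or everything if alignments differ: 16 bits at a time. */
	while (count--)
		*d++ = *s++;
}
```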

Hmm?

                           Linus

Greg Kroah-Hartman Jan. 29, 2015, 11:57 p.m. UTC | #3
On Thu, Jan 29, 2015 at 03:40:33PM -0800, Linus Torvalds wrote:
> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
> >
> > Linus, this came up a while back I finally got some confirmation
> > that it fixes those servers.
> 
> I'm certainly ok with this. which way should it go in? The users are:
> 
>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
> 
>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
> 
> and it might make sense to have *some* indication of how much worse
> this makes fbcon performance in particular..
> 
> Greg/Tomi - the patch is removing this:
> 
>   #define scr_memcpyw(d, s, c) memcpy(d, s, c)
>   #define scr_memmovew(d, s, c) memmove(d, s, c)
>   #define VT_BUF_HAVE_MEMCPYW
>   #define VT_BUF_HAVE_MEMMOVEW
> 
> from <linux/vt_buffer.h>, because some stupid graphics cards
> apparently cannot handle 64-bit accesses of regular memcpy/memmove.
> 
> And on other setups, this will be the reverse: 8-bit accesses due to
> using "rep movsb", which is the fast way to move/clear memory on
> modern Intel CPU's, but is really wrong for MMIO where it will be slow
> as hell.
> 
> So just getting rid of the memcpy/memmove is likely the right thing in
> general, since the fallbacks go this the traditional 16-bit-at-a-time
> way. And getting rid of the memcpy _may_ speed things up.
> 
> But if it slows things down, we might have to try something else. Like
> saying "all cards we've ever seen have been ok with aligned 32-bit
> accesses", and extend the open-coded scr_memcpy/memmove functions to
> do that.
> 
> Hmm?

I can take this through the tty tree, but can I put it in linux-next and
wait for the 3.20 merge window to give people who might notice a
slow-down a chance to object?

thanks,

greg k-h

Linus Torvalds Jan. 30, 2015, 12:03 a.m. UTC | #4
On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> I can take this through the tty tree, but can I put it in linux-next and
> wait for the 3.20 merge window to give people who might notice a
> slow-down a chance to object?

Yes. The problem only affects one (or a couple of) truly outrageously
bad graphics cards that are only used in servers (because they are
such crap that they wouldn't be acceptable anywhere else anyway), and
they have afaik never worked with 64-bit kernels, so it's not even a
regression.

So it's worth fixing because it's a real - albeit very rare - problem
(especially since the enhanced rep instruction model of memcpy could
easily be *worse* than the 16-bit-at-a-time manual version), but I
wouldn't consider it anywhere near high priority.

                         Linus

Dave Airlie Jan. 30, 2015, 12:14 a.m. UTC | #5
On 30 January 2015 at 10:03, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>>
>> I can take this through the tty tree, but can I put it in linux-next and
>> wait for the 3.20 merge window to give people who might notice a
>> slow-down a chance to object?
>
> Yes. The problem only affects one (or a couple of) truly outrageously
> bad graphics cards that are only used in servers (because they are
> such crap that they wouldn't be acceptable anywhere else anyway), and
> they have afaik never worked with 64-bit kernels, so it's not even a
> regression.
>
> So it's worth fixing because it's a real - albeit very rare - problem
> (especially since the enhanched rep instruction model of memcpy could
> easily be *worse* than the 16-bit-at-a-time manual version), but I
> wouldn't consider it anywhere near high priority.
>
Totally not a priority; it just finally got tested for RHEL, so I wanted to
make sure I posted it upstream before I forgot about it for months.

I also filed:
https://bugzilla.kernel.org/show_bug.cgi?id=92311

since the RH bug is private and full of crap. That bug contains
a screenshot of the remote console showing what sort of crap it produces.

Dave.

Alan Cox Feb. 3, 2015, 3:54 p.m. UTC | #6
On Thu, 29 Jan 2015 15:40:33 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
> >
> > Linus, this came up a while back I finally got some confirmation
> > that it fixes those servers.
> 
> I'm certainly ok with this. which way should it go in? The users are:
> 
>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
> 
>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
> 
> and it might make sense to have *some* indication of how much worse
> this makes fbcon performance in particular..

For devices that have no hardware scrolling, there used to be a double-digit
percentage difference between 32-bit and 64-bit reads from the fb,
because the reads are not posted and the latency killed you. Writes are
not so big a deal, and the bridge should combine them anyway. I imagine
16-bit reads would be unprintably bad.

Is it reads or writes that kill the card?

Also note that switching to lots of small writes may break the 3Dfx
driver for the early 3Dfx PCI cards - they are really quite touchy about
how they are fed.

Unfortunately fbcon still matters for dumb EFI framebuffer fallbacks.

For vgacon it doesn't matter (if it were too slow, you could make vgacon as
fast as you want by only updating the off-screen characters once per
vertical blank). For fbcon that is a bit harder, as you are allowed to
scribble on the display as well. You can't even check whether it is
open/mmapped, as you can open, scribble and close.
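Here is a minimal sketch of one way to read the vgacon suggestion: batch updates and push them to video memory at most once per vertical blank. All scrolling happens in a RAM shadow buffer, where wide memmove() accesses are harmless. A flush routine, which a hypothetical vertical-blank hook would call, copies the shadow to video memory with 16-bit stores.

The 80x25 geometry, the struct, and every function name are illustrative assumptions, not kernel APIs. The vram pointer stands in for slow, uncached video memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define COLS 80
#define ROWS 25

/* Hypothetical console state: writes and scrolls go to the RAM shadow;
 * vram stands in for slow video memory. */
struct shadow_console {
	uint16_t shadow[ROWS * COLS];
	volatile uint16_t *vram;
	bool dirty;
	unsigned long vram_writes; /* number of 16-bit stores issued to vram */
};

/* Scroll up one line entirely in RAM, where a wide memmove() is cheap and safe. */
static void shadow_scroll_up(struct shadow_console *c, uint16_t blank)
{
	memmove(c->shadow, c->shadow + COLS, (ROWS - 1) * COLS * sizeof(uint16_t));
	for (int i = 0; i < COLS; i++)
		c->shadow[(ROWS - 1) * COLS + i] = blank;
	c->dirty = true;
}

/* Intended to run once per vertical blank: however many scrolls happened
 * since the last flush, the card sees a single pass of 16-bit writes. */
static void shadow_vblank_flush(struct shadow_console *c)
{
	if (!c->dirty)
		return;
	for (int i = 0; i < ROWS * COLS; i++) {
		c->vram[i] = c->shadow[i];
		c->vram_writes++;
	}
	c->dirty = false;
}
```

The trade-off is latency: output can reach the screen up to one frame late, in exchange for never issuing more than one screenful of narrow writes per frame.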

Alan

Geert Uytterhoeven Feb. 5, 2015, 9:01 a.m. UTC | #7
On Tue, Feb 3, 2015 at 4:54 PM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
> On Thu, 29 Jan 2015 15:40:33 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
>> >
>> > Linus, this came up a while back I finally got some confirmation
>> > that it fixes those servers.
>>
>> I'm certainly ok with this. which way should it go in? The users are:
>>
>>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
>>
>>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
>>
>> and it might make sense to have *some* indication of how much worse
>> this makes fbcon performance in particular..
>
> For devices that have no hardware scrolling it used to be double digit
> percentages difference between 32 and 64bit when reading from the fb
> because the reads are not posted and the latency killed you. Writes - not
> so big a deal - but the bridge should combine them anyway. I imagine
> 16bit read would be unprintably bad.

Fbcon uses scr_mem{cpy,move}w() for the VT buffer (characters + attributes)
only, not for the frame buffer data.
So the performance degradation should be minimal.

However, as this affects real VGA on x86 only, perhaps it can be fixed
in arch/x86/include/asm/vga.h instead of include/linux/vt_buffer.h, so
platforms not having VGA are not affected? We have these VT_BUF_*
and scr_*() abstractions for a reason...

If I'm not mistaken, that would be as simple as adding

    #define VT_BUF_HAVE_RW
    #define scr_writew(val, addr) (*(addr) = (val))
    #define scr_readw(addr) (*(addr))

to arch/x86/include/asm/vga.h.
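The single-file userspace model below shows how the suggestion takes effect through the preprocessor. The first block stands in for a hypothetical arch/x86/include/asm/vga.h carrying the three lines above. The second is a simplified copy of the generic section of <linux/vt_buffer.h> as it stood before the patch, with the memmove path omitted for brevity.

Because the arch header defines VT_BUF_HAVE_RW, the generic memcpy() mapping is skipped, VT_BUF_HAVE_MEMCPYW stays undefined, and the 16-bit fallback is compiled instead. VT_USES_16BIT_FALLBACK is a hypothetical marker added only so the selection can be observed.

```c
#include <stdint.h>
#include <string.h>

/* --- stand-in for arch/x86/include/asm/vga.h with the suggested lines --- */
#define VT_BUF_HAVE_RW
#define scr_writew(val, addr) (*(addr) = (val))
#define scr_readw(addr) (*(addr))

/* --- simplified generic part of <linux/vt_buffer.h> before the patch --- */
#ifndef VT_BUF_HAVE_RW
#define scr_writew(val, addr) (*(addr) = (val))
#define scr_readw(addr) (*(addr))
#define scr_memcpyw(d, s, c) memcpy(d, s, c)
#define VT_BUF_HAVE_MEMCPYW
#endif

#ifndef VT_BUF_HAVE_MEMCPYW
#define VT_USES_16BIT_FALLBACK 1 /* hypothetical marker, not in the kernel */
/* Open-coded fallback: one 16-bit read and write per cell; count is in bytes. */
static inline void scr_memcpyw(uint16_t *d, const uint16_t *s, unsigned int count)
{
	count /= 2;
	while (count--)
		scr_writew(scr_readw(s++), d++);
}
#else
#define VT_USES_16BIT_FALLBACK 0
#endif
```

Unlike removing the memcpy mapping in the generic header, this leaves architectures that do not define VT_BUF_HAVE_RW on the memcpy()/memmove() path.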

If someone wants to put one of the "bad" VGA cards in a non-x86 PCI slot,
perhaps a few more architecture-specific asm/vga.h have to be updated:

$ git grep -w VT_BUF_HAVE_RW -- arch
arch/alpha/include/asm/vga.h:#define VT_BUF_HAVE_RW
arch/mips/include/asm/vga.h:#define VT_BUF_HAVE_RW
arch/powerpc/include/asm/vga.h:#define VT_BUF_HAVE_RW
arch/sparc/include/asm/vga.h:#define VT_BUF_HAVE_RW
arch/tile/include/asm/vga.h:#define VT_BUF_HAVE_RW

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Alan Cox Feb. 5, 2015, 11:35 a.m. UTC | #8
> If I'm not mistaken, that would be as simple as adding
> 
>     #define VT_BUF_HAVE_RW.
>     #define scr_writew(val, addr) (*(addr) = (val))
>     #define scr_readw(addr) (*(addr))
> 
> to arch/x86/include/asm/vga.h.

and stick an

#if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)

#endif

around that and it's sorted as an option everyone can leave off but the
afflicted.

Alan

Daniel Stone Feb. 9, 2015, 10:35 a.m. UTC | #9
On 5 February 2015 at 11:35, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
>> If I'm not mistaken, that would be as simple as adding
>>
>>     #define VT_BUF_HAVE_RW.
>>     #define scr_writew(val, addr) (*(addr) = (val))
>>     #define scr_readw(addr) (*(addr))
>>
>> to arch/x86/include/asm/vga.h.
>
> and stick an
>
> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
>
> #endif
>
> around that and its sorted as an option everyone can leave off but the
> afflicted.

Well, given all the distros will enable that, might as well be #if
!defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).

Geert Uytterhoeven Feb. 9, 2015, 10:49 a.m. UTC | #10
On Mon, Feb 9, 2015 at 11:35 AM, Daniel Stone <daniel@fooishbar.org> wrote:
> On 5 February 2015 at 11:35, One Thousand Gnomes
> <gnomes@lxorguk.ukuu.org.uk> wrote:
>>> If I'm not mistaken, that would be as simple as adding
>>>
>>>     #define VT_BUF_HAVE_RW.
>>>     #define scr_writew(val, addr) (*(addr) = (val))
>>>     #define scr_readw(addr) (*(addr))
>>>
>>> to arch/x86/include/asm/vga.h.
>>
>> and stick an
>>
>> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
>>
>> #endif
>>
>> around that and its sorted as an option everyone can leave off but the
>> afflicted.
>
> Well, given all the distros will enable that, might as well be #if
> !defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).

All distros on 1 out of 29 architectures?

Gr{oetje,eeting}s,

                        Geert

Daniel Stone Feb. 9, 2015, 11 a.m. UTC | #11
On 9 February 2015 at 10:49, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Feb 9, 2015 at 11:35 AM, Daniel Stone <daniel@fooishbar.org> wrote:
>> On 5 February 2015 at 11:35, One Thousand Gnomes
>> <gnomes@lxorguk.ukuu.org.uk> wrote:
>>> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
>>>
>>> #endif
>>>
>>> around that and its sorted as an option everyone can leave off but the
>>> afflicted.
>>
>> Well, given all the distros will enable that, might as well be #if
>> !defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).
>
> All distros on 1 out of 29 architectures?

It's a fairly popular architecture.

Alan Cox Feb. 9, 2015, 8:17 p.m. UTC | #12
On Mon, 9 Feb 2015 11:00:55 +0000
Daniel Stone <daniel@fooishbar.org> wrote:

> On 9 February 2015 at 10:49, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Mon, Feb 9, 2015 at 11:35 AM, Daniel Stone <daniel@fooishbar.org> wrote:
> >> On 5 February 2015 at 11:35, One Thousand Gnomes
> >> <gnomes@lxorguk.ukuu.org.uk> wrote:
> >>> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
> >>>
> >>> #endif
> >>>
> >>> around that and its sorted as an option everyone can leave off but the
> >>> afflicted.
> >>
> >> Well, given all the distros will enable that, might as well be #if
> >> !defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).
> >
> > All distros on 1 out of 29 architectures?
> 
> It's a fairly popular architecture.

I imagine most distros wouldn't enable it even on x86. It's an incredibly
obscure setup from the evidence of how long it took to get reported.

Most distributions don't support non-PAE processors and other far more
common things 8)

Alan


Pavel Machek Feb. 24, 2015, 4:49 p.m. UTC | #13
On Thu 2015-01-29 14:11:25, Dave Airlie wrote:
> These two copy to/from VGA memory, however on the Silicon
> Motion SMI750 VGA card on a 64-bit system cause console corruption.
> 
> This is due to the hw being buggy and not handling a 64-bit transaction
> correctly.
> 
> We could try and create a 32-bit version of these routines,
> but I'm not sure the optimisation is worth much today.
> 
> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826
> 
> Tested-by: Huawei engineering.
> Signed-off-by: Dave Airlie <airlied@redhat.com>

Actually... are you sure this is the right fix?

IOW, can gcc do the optimisation behind your back and still break the
buggy card?
								Pavel
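The concern is plausible in principle. At higher optimisation levels GCC can recognise a simple element-by-element copy loop and either replace it with a call to memcpy() or vectorise it into wider loads and stores.

One common way to rule that out is to make each access volatile, sketched below. It assumes the screen memory is reached through ordinary pointers, as with the generic vt_buffer.h accessors. The compiler must then perform every volatile access as written. For a naturally aligned 16-bit object, in practice, it does not merge or widen those accesses.

The function name is hypothetical. In the kernel the same intent would more typically be expressed with MMIO accessors such as readw()/writew().

```c
#include <stdint.h>

/* Hypothetical: copy count BYTES of screen cells with one volatile 16-bit
 * load and one volatile 16-bit store per cell, so the loop cannot be turned
 * into memcpy() or merged into wider accesses. */
static void scr_memcpyw_volatile(volatile uint16_t *d, const volatile uint16_t *s,
				 unsigned int count)
{
	count /= 2; /* bytes -> 16-bit cells */
	while (count--)
		*d++ = *s++;
}
```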

> diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
> index 057db7d..f38c10b 100644
> --- a/include/linux/vt_buffer.h
> +++ b/include/linux/vt_buffer.h
> @@ -21,10 +21,6 @@
>  #ifndef VT_BUF_HAVE_RW
>  #define scr_writew(val, addr) (*(addr) = (val))
>  #define scr_readw(addr) (*(addr))
> -#define scr_memcpyw(d, s, c) memcpy(d, s, c)
> -#define scr_memmovew(d, s, c) memmove(d, s, c)
> -#define VT_BUF_HAVE_MEMCPYW
> -#define VT_BUF_HAVE_MEMMOVEW
>  #endif
>  
>  #ifndef VT_BUF_HAVE_MEMSETW

Patch

diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
index 057db7d..f38c10b 100644
--- a/include/linux/vt_buffer.h
+++ b/include/linux/vt_buffer.h
@@ -21,10 +21,6 @@ 
 #ifndef VT_BUF_HAVE_RW
 #define scr_writew(val, addr) (*(addr) = (val))
 #define scr_readw(addr) (*(addr))
-#define scr_memcpyw(d, s, c) memcpy(d, s, c)
-#define scr_memmovew(d, s, c) memmove(d, s, c)
-#define VT_BUF_HAVE_MEMCPYW
-#define VT_BUF_HAVE_MEMMOVEW
 #endif
 
 #ifndef VT_BUF_HAVE_MEMSETW