Message ID | 1422504685-7864-1-git-send-email-airlied@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On 01/28/2015 11:11 PM, Dave Airlie wrote:
> These two copy to/from VGA memory, however on the Silicon
> Motion SMI750 VGA card on a 64-bit system cause console corruption.
>
> This is due to the hw being buggy and not handling a 64-bit transaction
> correctly.
>
> We could try and create a 32-bit version of these routines,
> but I'm not sure the optimisation is worth much today.
>
> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826
>
> Tested-by: Huawei engineering.
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.
>
>  include/linux/vt_buffer.h | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
> index 057db7d..f38c10b 100644
> --- a/include/linux/vt_buffer.h
> +++ b/include/linux/vt_buffer.h
> @@ -21,10 +21,6 @@
>  #ifndef VT_BUF_HAVE_RW
>  #define scr_writew(val, addr) (*(addr) = (val))
>  #define scr_readw(addr) (*(addr))
> -#define scr_memcpyw(d, s, c) memcpy(d, s, c)
> -#define scr_memmovew(d, s, c) memmove(d, s, c)
> -#define VT_BUF_HAVE_MEMCPYW
> -#define VT_BUF_HAVE_MEMMOVEW
>  #endif
>
>  #ifndef VT_BUF_HAVE_MEMSETW

------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/

On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.

I'm certainly ok with this. Which way should it go in? The users are:

 - drivers/tty/vt/vt.c (Greg KH, "tty layer")

 - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)

and it might make sense to have *some* indication of how much worse
this makes fbcon performance in particular..

Greg/Tomi - the patch is removing this:

    #define scr_memcpyw(d, s, c) memcpy(d, s, c)
    #define scr_memmovew(d, s, c) memmove(d, s, c)
    #define VT_BUF_HAVE_MEMCPYW
    #define VT_BUF_HAVE_MEMMOVEW

from <linux/vt_buffer.h>, because some stupid graphics cards
apparently cannot handle 64-bit accesses of regular memcpy/memmove.

And on other setups, this will be the reverse: 8-bit accesses due to
using "rep movsb", which is the fast way to move/clear memory on
modern Intel CPU's, but is really wrong for MMIO where it will be slow
as hell.

So just getting rid of the memcpy/memmove is likely the right thing in
general, since the fallbacks do this the traditional 16-bit-at-a-time
way. And getting rid of the memcpy _may_ speed things up.

But if it slows things down, we might have to try something else. Like
saying "all cards we've ever seen have been ok with aligned 32-bit
accesses", and extend the open-coded scr_memcpy/memmove functions to
do that.

Hmm?

                 Linus

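For readers following along, the "traditional 16-bit-at-a-time way" mentioned above can be sketched roughly as below. This is a userspace illustration only, not the kernel's code: the sketch_* names are hypothetical, plain volatile pointers stand in for scr_readw()/scr_writew(), and the count is taken in bytes on the assumption that it mirrors the memcpy-style macros being removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for scr_readw()/scr_writew(): one 16-bit access each. */
static inline uint16_t sketch_readw(const volatile uint16_t *addr)
{
	return *addr;
}

static inline void sketch_writew(uint16_t val, volatile uint16_t *addr)
{
	*addr = val;
}

/* Forward copy, one character cell (16 bits) per access; count is in bytes. */
static void sketch_memcpyw(volatile uint16_t *d, const volatile uint16_t *s,
			   size_t count)
{
	assert((count & 1) == 0);	/* whole cells only (sketch assumption) */
	count /= 2;
	while (count--)
		sketch_writew(sketch_readw(s++), d++);
}

/* Overlap-safe variant: copy backwards when the destination is above the source. */
static void sketch_memmovew(volatile uint16_t *d, const volatile uint16_t *s,
			    size_t count)
{
	if (d < s) {
		sketch_memcpyw(d, s, count);
		return;
	}
	assert((count & 1) == 0);
	count /= 2;
	d += count;
	s += count;
	while (count--)
		sketch_writew(sketch_readw(--s), --d);
}
```

Every access here is exactly 16 bits wide, which is the property the buggy card appears to need. A 32-bit variant along the lines suggested above would additionally have to handle pointers that are only 2-byte aligned and an odd trailing cell.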
On Thu, Jan 29, 2015 at 03:40:33PM -0800, Linus Torvalds wrote:
> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
> >
> > Linus, this came up a while back I finally got some confirmation
> > that it fixes those servers.
>
> I'm certainly ok with this. which way should it go in? The users are:
>
>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
>
>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
>
> and it might make sense to have *some* indication of how much worse
> this makes fbcon performance in particular..
>
> Greg/Tomi - the patch is removing this:
>
>     #define scr_memcpyw(d, s, c) memcpy(d, s, c)
>     #define scr_memmovew(d, s, c) memmove(d, s, c)
>     #define VT_BUF_HAVE_MEMCPYW
>     #define VT_BUF_HAVE_MEMMOVEW
>
> from <linux/vt_buffer.h>, because some stupid graphics cards
> apparently cannot handle 64-bit accesses of regular memcpy/memmove.
>
> And on other setups, this will be the reverse: 8-bit accesses due to
> using "rep movsb", which is the fast way to move/clear memory on
> modern Intel CPU's, but is really wrong for MMIO where it will be slow
> as hell.
>
> So just getting rid of the memcpy/memmove is likely the right thing in
> general, since the fallbacks do this the traditional 16-bit-at-a-time
> way. And getting rid of the memcpy _may_ speed things up.
>
> But if it slows things down, we might have to try something else. Like
> saying "all cards we've ever seen have been ok with aligned 32-bit
> accesses", and extend the open-coded scr_memcpy/memmove functions to
> do that.
>
> Hmm?

I can take this through the tty tree, but can I put it in linux-next and
wait for the 3.20 merge window to give people who might notice a
slow-down a chance to object?

thanks,

greg k-h

On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>
> I can take this through the tty tree, but can I put it in linux-next and
> wait for the 3.20 merge window to give people who might notice a
> slow-down a chance to object?

Yes. The problem only affects one (or a couple of) truly outrageously
bad graphics cards that are only used in servers (because they are
such crap that they wouldn't be acceptable anywhere else anyway), and
they have afaik never worked with 64-bit kernels, so it's not even a
regression.

So it's worth fixing because it's a real - albeit very rare - problem
(especially since the enhanced rep instruction model of memcpy could
easily be *worse* than the 16-bit-at-a-time manual version), but I
wouldn't consider it anywhere near high priority.

                 Linus

On 30 January 2015 at 10:03, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>>
>> I can take this through the tty tree, but can I put it in linux-next and
>> wait for the 3.20 merge window to give people who might notice a
>> slow-down a chance to object?
>
> Yes. The problem only affects one (or a couple of) truly outrageously
> bad graphics cards that are only used in servers (because they are
> such crap that they wouldn't be acceptable anywhere else anyway), and
> they have afaik never worked with 64-bit kernels, so it's not even a
> regression.
>
> So it's worth fixing because it's a real - albeit very rare - problem
> (especially since the enhanced rep instruction model of memcpy could
> easily be *worse* than the 16-bit-at-a-time manual version), but I
> wouldn't consider it anywhere near high priority.
>

Totally not a priority, it just finally got tested for RHEL so I
wanted to make sure I posted it upstream before I forgot about it for
months.

I also filed:
https://bugzilla.kernel.org/show_bug.cgi?id=92311

since the RH bug is private and full of crap. That bug contains a
screenshot of the remote console to see what sort of crap it produces.

Dave.

On Thu, 29 Jan 2015 15:40:33 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
> >
> > Linus, this came up a while back I finally got some confirmation
> > that it fixes those servers.
>
> I'm certainly ok with this. which way should it go in? The users are:
>
>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
>
>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
>
> and it might make sense to have *some* indication of how much worse
> this makes fbcon performance in particular..

For devices that have no hardware scrolling it used to be double digit
percentages difference between 32 and 64bit when reading from the fb
because the reads are not posted and the latency killed you. Writes - not
so big a deal - but the bridge should combine them anyway. I imagine
16bit read would be unprintably bad.

Is it reads or writes that kill the card?

Also note that switching to lots of small writes may break the 3Dfx
driver for the early 3Dfx PCI cards - they are really quite touchy about
how they are fed.

Unfortunately fbcon still matters for dumb EFI framebuffer fallbacks.
vgacon it doesn't matter (if it was too slow you could make vgacon as
fast as you want by only updating the off screen characters once per
vertical blank). fbcon that is a bit harder as you are allowed to
scribble on the display as well. You can't even check open/mmapped as you
can open, scribble and close.

Alan

On Tue, Feb 3, 2015 at 4:54 PM, One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote:
> On Thu, 29 Jan 2015 15:40:33 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie <airlied@redhat.com> wrote:
>> >
>> > Linus, this came up a while back I finally got some confirmation
>> > that it fixes those servers.
>>
>> I'm certainly ok with this. which way should it go in? The users are:
>>
>>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
>>
>>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
>>
>> and it might make sense to have *some* indication of how much worse
>> this makes fbcon performance in particular..
>
> For devices that have no hardware scrolling it used to be double digit
> percentages difference between 32 and 64bit when reading from the fb
> because the reads are not posted and the latency killed you. Writes - not
> so big a deal - but the bridge should combine them anyway. I imagine
> 16bit read would be unprintably bad.

Fbcon uses scr_mem{cpy,move}w() for the VT buffer (characters + attributes)
only, not for the frame buffer data. So the performance degradation should be
minimal.

However, as this affects real VGA on x86 only, perhaps it can be fixed in
arch/x86/include/asm/vga.h instead of include/linux/vt_buffer.h, so platforms
not having VGA are not affected? We have these VT_BUF_* and scr_*()
abstractions for a reason...

If I'm not mistaken, that would be as simple as adding

    #define VT_BUF_HAVE_RW
    #define scr_writew(val, addr) (*(addr) = (val))
    #define scr_readw(addr) (*(addr))

to arch/x86/include/asm/vga.h.

If someone wants to put one of the "bad" VGA cards in a non-x86 PCI slot,
perhaps a few more architecture-specific asm/vga.h have to be updated:

    $ git grep -w VT_BUF_HAVE_RW -- arch
    arch/alpha/include/asm/vga.h:#define VT_BUF_HAVE_RW
    arch/mips/include/asm/vga.h:#define VT_BUF_HAVE_RW
    arch/powerpc/include/asm/vga.h:#define VT_BUF_HAVE_RW
    arch/sparc/include/asm/vga.h:#define VT_BUF_HAVE_RW
    arch/tile/include/asm/vga.h:#define VT_BUF_HAVE_RW

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

> If I'm not mistaken, that would be as simple as adding
>
>     #define VT_BUF_HAVE_RW
>     #define scr_writew(val, addr) (*(addr) = (val))
>     #define scr_readw(addr) (*(addr))
>
> to arch/x86/include/asm/vga.h.

and stick an

#if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)

#endif

around that and it's sorted as an option everyone can leave off but the
afflicted.

Alan

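To make the guard-macro interplay in the two proposals above concrete, here is a small self-contained sketch. It assumes the pre-patch shape of include/linux/vt_buffer.h shown in the diff, collapses the arch header and the generic header into one file, and uses a hypothetical SKETCH_SUPPORT_BUGGY_VGA switch in place of a real Kconfig symbol. SKETCH_USES_WORD_LOOP and sketch_copy_row() are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical switch standing in for the Kconfig option discussed above. */
#define SKETCH_SUPPORT_BUGGY_VGA 1

/* ---- roughly what the arch header (asm/vga.h) would gain ---- */
#if SKETCH_SUPPORT_BUGGY_VGA
#define VT_BUF_HAVE_RW
#define scr_writew(val, addr) (*(addr) = (val))
#define scr_readw(addr) (*(addr))
#endif

/* ---- generic header in its pre-patch shape: skipped if the arch claimed RW ---- */
#ifndef VT_BUF_HAVE_RW
#define scr_writew(val, addr) (*(addr) = (val))
#define scr_readw(addr) (*(addr))
#define scr_memcpyw(d, s, c) memcpy(d, s, c)
#define VT_BUF_HAVE_MEMCPYW
#endif

#ifndef VT_BUF_HAVE_MEMCPYW
/* Fallback: one 16-bit access per character cell; count is in bytes. */
static inline void scr_memcpyw(uint16_t *d, const uint16_t *s, size_t count)
{
	count /= 2;
	while (count--)
		scr_writew(scr_readw(s++), d++);
}
#define SKETCH_USES_WORD_LOOP 1
#else
#define SKETCH_USES_WORD_LOOP 0
#endif

/* Hypothetical caller: copies one row of character cells. */
static void sketch_copy_row(uint16_t *dst, const uint16_t *src, size_t cells)
{
	assert(dst != NULL && src != NULL);
	scr_memcpyw(dst, src, cells * sizeof(uint16_t));
}
```

With the switch set to 1 the arch block defines VT_BUF_HAVE_RW, the generic memcpy() macros are never defined, and the word loop is selected. Setting it to 0 restores the memcpy() path, which is the trade-off being argued about in the rest of the thread.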
On 5 February 2015 at 11:35, One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote:
>> If I'm not mistaken, that would be as simple as adding
>>
>>     #define VT_BUF_HAVE_RW
>>     #define scr_writew(val, addr) (*(addr) = (val))
>>     #define scr_readw(addr) (*(addr))
>>
>> to arch/x86/include/asm/vga.h.
>
> and stick an
>
> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
>
> #endif
>
> around that and its sorted as an option everyone can leave off but the
> afflicted.

Well, given all the distros will enable that, might as well be #if
!defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).

On Mon, Feb 9, 2015 at 11:35 AM, Daniel Stone <daniel@fooishbar.org> wrote:
> On 5 February 2015 at 11:35, One Thousand Gnomes
> <gnomes@lxorguk.ukuu.org.uk> wrote:
>>> If I'm not mistaken, that would be as simple as adding
>>>
>>>     #define VT_BUF_HAVE_RW
>>>     #define scr_writew(val, addr) (*(addr) = (val))
>>>     #define scr_readw(addr) (*(addr))
>>>
>>> to arch/x86/include/asm/vga.h.
>>
>> and stick an
>>
>> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
>>
>> #endif
>>
>> around that and its sorted as an option everyone can leave off but the
>> afflicted.
>
> Well, given all the distros will enable that, might as well be #if
> !defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).

All distros on 1 out of 29 architectures?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

On 9 February 2015 at 10:49, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Feb 9, 2015 at 11:35 AM, Daniel Stone <daniel@fooishbar.org> wrote:
>> On 5 February 2015 at 11:35, One Thousand Gnomes
>> <gnomes@lxorguk.ukuu.org.uk> wrote:
>>> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
>>>
>>> #endif
>>>
>>> around that and its sorted as an option everyone can leave off but the
>>> afflicted.
>>
>> Well, given all the distros will enable that, might as well be #if
>> !defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).
>
> All distros on 1 out of 29 architectures?

It's a fairly popular architecture.

On Mon, 9 Feb 2015 11:00:55 +0000 Daniel Stone <daniel@fooishbar.org> wrote:
> On 9 February 2015 at 10:49, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Mon, Feb 9, 2015 at 11:35 AM, Daniel Stone <daniel@fooishbar.org> wrote:
> >> On 5 February 2015 at 11:35, One Thousand Gnomes
> >> <gnomes@lxorguk.ukuu.org.uk> wrote:
> >>> #if defined (CONFIG_SUPPORT_SHITE_VGA_ADAPTERS)
> >>>
> >>> #endif
> >>>
> >>> around that and its sorted as an option everyone can leave off but the
> >>> afflicted.
> >>
> >> Well, given all the distros will enable that, might as well be #if
> >> !defined(CONFIG_BREAK_SOME_HARDWARE_BUT_VGA_SCROLLING_WILL_BE_IMMEASURABLY_FASTER).
> >
> > All distros on 1 out of 29 architectures?
>
> It's a fairly popular architecture.

I imagine most distros wouldn't enable it even on x86. It's an incredibly
obscure setup from the evidence of how long it took to get reported. Most
distributions don't support non PAE processors and other far more common
things 8)

Alan

On Thu 2015-01-29 14:11:25, Dave Airlie wrote:
> These two copy to/from VGA memory, however on the Silicon
> Motion SMI750 VGA card on a 64-bit system cause console corruption.
>
> This is due to the hw being buggy and not handling a 64-bit transaction
> correctly.
>
> We could try and create a 32-bit version of these routines,
> but I'm not sure the optimisation is worth much today.
>
> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826
>
> Tested-by: Huawei engineering.
> Signed-off-by: Dave Airlie <airlied@redhat.com>

Actually... are you sure this is the right fix? IOW can gcc do the
optimization behind your back and still break the buggy card?

                                                                Pavel

> diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
> index 057db7d..f38c10b 100644
> --- a/include/linux/vt_buffer.h
> +++ b/include/linux/vt_buffer.h
> @@ -21,10 +21,6 @@
>  #ifndef VT_BUF_HAVE_RW
>  #define scr_writew(val, addr) (*(addr) = (val))
>  #define scr_readw(addr) (*(addr))
> -#define scr_memcpyw(d, s, c) memcpy(d, s, c)
> -#define scr_memmovew(d, s, c) memmove(d, s, c)
> -#define VT_BUF_HAVE_MEMCPYW
> -#define VT_BUF_HAVE_MEMMOVEW
>  #endif
>
>  #ifndef VT_BUF_HAVE_MEMSETW

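The concern raised above can be illustrated with two copy loops that differ only in qualification. This is a hedged userspace sketch with hypothetical sketch_* names, not a statement about what the kernel build actually emits. For the plain loop, an optimizing compiler is permitted to widen or merge the accesses, or to recognize the loop as a block copy and call a library routine. In practice GCC and Clang perform volatile-qualified accesses one at a time at the written width, which is the usual way to pin the access size, although the C standard leaves the exact definition of a volatile access to the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Plain loop: the compiler is free to widen or merge these accesses, or
 * to turn the whole loop into a block copy, depending on version and flags.
 */
static void sketch_copy_plain(uint16_t *d, const uint16_t *s, size_t cells)
{
	assert(d != NULL && s != NULL);
	while (cells--)
		*d++ = *s++;
}

/*
 * Volatile-qualified loop: each iteration is expected to stay one 16-bit
 * load and one 16-bit store (assumption: GCC/Clang behaviour).
 */
static void sketch_copy_volatile(volatile uint16_t *d,
				 const volatile uint16_t *s, size_t cells)
{
	assert(d != NULL && s != NULL);
	while (cells--)
		*d++ = *s++;
}
```

Both functions produce the same memory contents. The difference only shows up in the generated instructions, so checking the disassembly of the affected build is the reliable way to answer the question for a given compiler.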
diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
index 057db7d..f38c10b 100644
--- a/include/linux/vt_buffer.h
+++ b/include/linux/vt_buffer.h
@@ -21,10 +21,6 @@
 #ifndef VT_BUF_HAVE_RW
 #define scr_writew(val, addr) (*(addr) = (val))
 #define scr_readw(addr) (*(addr))
-#define scr_memcpyw(d, s, c) memcpy(d, s, c)
-#define scr_memmovew(d, s, c) memmove(d, s, c)
-#define VT_BUF_HAVE_MEMCPYW
-#define VT_BUF_HAVE_MEMMOVEW
 #endif
 
 #ifndef VT_BUF_HAVE_MEMSETW