diff mbox

[v1,06/47] mtrr: add __arch_phys_wc_add()

Message ID 1426893517-2511-7-git-send-email-mcgrof@do-not-panic.com (mailing list archive)
State New, archived
Headers show

Commit Message

Luis R. Rodriguez March 20, 2015, 11:17 p.m. UTC
From: "Luis R. Rodriguez" <mcgrof@suse.com>

Ideally on systems using PAT we can expect a swift
transition away from MTRR. There can be a few exceptions
to this, one is where device drivers are known to exist
on PATs with errata, another situation is observed on
old device drivers where devices had combined MMIO
register access with whatever area they typically
later wanted to end up using MTRR for on the same
PCI BAR. This situation can still be addressed by
splitting up ioremap'd PCI BAR into two ioremap'd
calls, one for MMIO registers, and another for whatever
is desirable for write-combining -- in order to
accomplish this though quite a bit of driver
restructuring is required.

Device drivers which are known to require large
amount of re-work in order to split ioremap'd areas
can use __arch_phys_wc_add() to avoid regressions
when PAT is enabled.

For a good example driver where things are neatly
split up on a PCI BAR refer the infiniband qib
driver. For a good example of a driver where good
amount of work is required refer to the infiniband
ipath driver.

This is *only* a transitive API -- and as such no new
drivers are ever expected to use this.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/io.h       |  4 ++++
 arch/x86/kernel/cpu/mtrr/main.c | 36 +++++++++++++++++++++++++++++-------
 include/linux/io.h              |  4 ++++
 3 files changed, 37 insertions(+), 7 deletions(-)

Comments

Andy Lutomirski March 20, 2015, 11:48 p.m. UTC | #1
On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Ideally on systems using PAT we can expect a swift
> transition away from MTRR. There can be a few exceptions
> to this, one is where device drivers are known to exist
> on PATs with errata, another situation is observed on
> old device drivers where devices had combined MMIO
> register access with whatever area they typically
> later wanted to end up using MTRR for on the same
> PCI BAR. This situation can still be addressed by
> splitting up ioremap'd PCI BAR into two ioremap'd
> calls, one for MMIO registers, and another for whatever
> is desirable for write-combining -- in order to
> accomplish this though quite a bit of driver
> restructuring is required.
>
> Device drivers which are known to require large
> amount of re-work in order to split ioremap'd areas
> can use __arch_phys_wc_add() to avoid regressions
> when PAT is enabled.
>
> For a good example driver where things are neatly
> split up on a PCI BAR refer the infiniband qib
> driver. For a good example of a driver where good
> amount of work is required refer to the infiniband
> ipath driver.
>
> This is *only* a transitive API -- and as such no new
> drivers are ever expected to use this.

What's the exact layout that this helps?  I'm sceptical that this can
ever be correct.

Is there some awful driver that has a large ioremap that's supposed to
contain multiple different memtypes?  If so, can we ioremap +
set_page_xyz instead?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain March 27, 2015, 7:53 p.m. UTC | #2
On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Ideally on systems using PAT we can expect a swift
> > transition away from MTRR. There can be a few exceptions
> > to this, one is where device drivers are known to exist
> > on PATs with errata, another situation is observed on
> > old device drivers where devices had combined MMIO
> > register access with whatever area they typically
> > later wanted to end up using MTRR for on the same
> > PCI BAR. This situation can still be addressed by
> > splitting up ioremap'd PCI BAR into two ioremap'd
> > calls, one for MMIO registers, and another for whatever
> > is desirable for write-combining -- in order to
> > accomplish this though quite a bit of driver
> > restructuring is required.
> >
> > Device drivers which are known to require large
> > amount of re-work in order to split ioremap'd areas
> > can use __arch_phys_wc_add() to avoid regressions
> > when PAT is enabled.
> >
> > For a good example driver where things are neatly
> > split up on a PCI BAR refer the infiniband qib
> > driver. For a good example of a driver where good
> > amount of work is required refer to the infiniband
> > ipath driver.
> >
> > This is *only* a transitive API -- and as such no new
> > drivers are ever expected to use this.
> 
> What's the exact layout that this helps?  I'm sceptical that this can
> ever be correct.
> 
> Is there some awful driver that has a large ioremap that's supposed to
> contain multiple different memtypes? 

Yes, I cc'd you just now on one where I made changes on a driver which uses one
PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
regress those drivers by making the MTRR WC hole trick non functional.
The changes are non trivial and so in this series I supplied changes on
one driver only to show the effort required. The other drivers which
required this were:

Driver		File
------------------------------------------------------------
fusion		drivers/message/fusion/mptbase.c
ivtv		drivers/media/pci/ivtv/ivtvfb.c
ipath		drivers/infiniband/hw/ipath/ipath_driver.c

This series makes those drivers use __arch_phys_wc_add() more as a
transitory phase in hopes we can address the proper split as with the
atyfb illustrates. For ipath the changes required have a nice template
with the qib driver as they share very similar driver structure, the
qib driver *did* do the nice split.

> If so, can we ioremap + set_page_xyz instead?

I'm not sure I see which call we'd use.  Care to provide an example patch
alternative for the atyfb as a case in point alternative to the work required
to do the split?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski March 27, 2015, 7:58 p.m. UTC | #3
On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >
>> > Ideally on systems using PAT we can expect a swift
>> > transition away from MTRR. There can be a few exceptions
>> > to this, one is where device drivers are known to exist
>> > on PATs with errata, another situation is observed on
>> > old device drivers where devices had combined MMIO
>> > register access with whatever area they typically
>> > later wanted to end up using MTRR for on the same
>> > PCI BAR. This situation can still be addressed by
>> > splitting up ioremap'd PCI BAR into two ioremap'd
>> > calls, one for MMIO registers, and another for whatever
>> > is desirable for write-combining -- in order to
>> > accomplish this though quite a bit of driver
>> > restructuring is required.
>> >
>> > Device drivers which are known to require large
>> > amount of re-work in order to split ioremap'd areas
>> > can use __arch_phys_wc_add() to avoid regressions
>> > when PAT is enabled.
>> >
>> > For a good example driver where things are neatly
>> > split up on a PCI BAR refer the infiniband qib
>> > driver. For a good example of a driver where good
>> > amount of work is required refer to the infiniband
>> > ipath driver.
>> >
>> > This is *only* a transitive API -- and as such no new
>> > drivers are ever expected to use this.
>>
>> What's the exact layout that this helps?  I'm sceptical that this can
>> ever be correct.
>>
>> Is there some awful driver that has a large ioremap that's supposed to
>> contain multiple different memtypes?
>
> Yes, I cc'd you just now on one where I made changes on a driver which uses one
> PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> regress those drivers by making the MTRR WC hole trick non functional.
> The changes are non trivial and so in this series I supplied changes on
> one driver only to show the effort required. The other drivers which
> required this were:
>
> Driver          File
> ------------------------------------------------------------
> fusion          drivers/message/fusion/mptbase.c
> ivtv            drivers/media/pci/ivtv/ivtvfb.c
> ipath           drivers/infiniband/hw/ipath/ipath_driver.c
>
> This series makes those drivers use __arch_phys_wc_add() more as a
> transitory phase in hopes we can address the proper split as with the
> atyfb illustrates. For ipath the changes required have a nice template
> with the qib driver as they share very similar driver structure, the
> qib driver *did* do the nice split.
>
>> If so, can we ioremap + set_page_xyz instead?
>
> I'm not sure I see which call we'd use.  Care to provide an example patch
> alternative for the atyfb as a case in point alternative to the work required
> to do the split?
>

I'm still confused.  Would it be insufficient to ioremap_nocache the
whole thing and then call set_memory_wc on parts of it?  (Sorry,
set_page_xyz was a typo.)

--Andy
Luis Chamberlain March 27, 2015, 8:30 p.m. UTC | #4
On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> <mcgrof@do-not-panic.com> wrote:
> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >
> >> > Ideally on systems using PAT we can expect a swift
> >> > transition away from MTRR. There can be a few exceptions
> >> > to this, one is where device drivers are known to exist
> >> > on PATs with errata, another situation is observed on
> >> > old device drivers where devices had combined MMIO
> >> > register access with whatever area they typically
> >> > later wanted to end up using MTRR for on the same
> >> > PCI BAR. This situation can still be addressed by
> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> > calls, one for MMIO registers, and another for whatever
> >> > is desirable for write-combining -- in order to
> >> > accomplish this though quite a bit of driver
> >> > restructuring is required.
> >> >
> >> > Device drivers which are known to require large
> >> > amount of re-work in order to split ioremap'd areas
> >> > can use __arch_phys_wc_add() to avoid regressions
> >> > when PAT is enabled.
> >> >
> >> > For a good example driver where things are neatly
> >> > split up on a PCI BAR refer the infiniband qib
> >> > driver. For a good example of a driver where good
> >> > amount of work is required refer to the infiniband
> >> > ipath driver.
> >> >
> >> > This is *only* a transitive API -- and as such no new
> >> > drivers are ever expected to use this.
> >>
> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> ever be correct.
> >>
> >> Is there some awful driver that has a large ioremap that's supposed to
> >> contain multiple different memtypes?
> >
> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> > regress those drivers by making the MTRR WC hole trick non functional.
> > The changes are non trivial and so in this series I supplied changes on
> > one driver only to show the effort required. The other drivers which
> > required this were:
> >
> > Driver          File
> > ------------------------------------------------------------
> > fusion          drivers/message/fusion/mptbase.c
> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >
> > This series makes those drivers use __arch_phys_wc_add() more as a
> > transitory phase in hopes we can address the proper split as with the
> > atyfb illustrates. For ipath the changes required have a nice template
> > with the qib driver as they share very similar driver structure, the
> > qib driver *did* do the nice split.
> >
> >> If so, can we ioremap + set_page_xyz instead?
> >
> > I'm not sure I see which call we'd use.  Care to provide an example patch
> > alternative for the atyfb as a case in point alternative to the work required
> > to do the split?
> >
> 
> I'm still confused.  Would it be insufficient to ioremap_nocache the
> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> set_page_xyz was a typo.)

I think that would be a sexy alternative.

In this driver's case the thing is a bit messy as it not only used
the WC MTRR for a hole but it also then used a UC MTRR on top of
it all, so since I already tried to address the split, and if we address
the power of 2 woes, I think it'd be best to try to remove the UC MTRR
and just avoid set_page_wc() in this driver's case, but for the other cases
(fusion, ivtv, ipath) I think this makes sense.

Thoughts?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski March 27, 2015, 9:23 p.m. UTC | #5
On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> >> <mcgrof@do-not-panic.com> wrote:
>> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >> >
>> >> > Ideally on systems using PAT we can expect a swift
>> >> > transition away from MTRR. There can be a few exceptions
>> >> > to this, one is where device drivers are known to exist
>> >> > on PATs with errata, another situation is observed on
>> >> > old device drivers where devices had combined MMIO
>> >> > register access with whatever area they typically
>> >> > later wanted to end up using MTRR for on the same
>> >> > PCI BAR. This situation can still be addressed by
>> >> > splitting up ioremap'd PCI BAR into two ioremap'd
>> >> > calls, one for MMIO registers, and another for whatever
>> >> > is desirable for write-combining -- in order to
>> >> > accomplish this though quite a bit of driver
>> >> > restructuring is required.
>> >> >
>> >> > Device drivers which are known to require large
>> >> > amount of re-work in order to split ioremap'd areas
>> >> > can use __arch_phys_wc_add() to avoid regressions
>> >> > when PAT is enabled.
>> >> >
>> >> > For a good example driver where things are neatly
>> >> > split up on a PCI BAR refer the infiniband qib
>> >> > driver. For a good example of a driver where good
>> >> > amount of work is required refer to the infiniband
>> >> > ipath driver.
>> >> >
>> >> > This is *only* a transitive API -- and as such no new
>> >> > drivers are ever expected to use this.
>> >>
>> >> What's the exact layout that this helps?  I'm sceptical that this can
>> >> ever be correct.
>> >>
>> >> Is there some awful driver that has a large ioremap that's supposed to
>> >> contain multiple different memtypes?
>> >
>> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
>> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
>> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
>> > regress those drivers by making the MTRR WC hole trick non functional.
>> > The changes are non trivial and so in this series I supplied changes on
>> > one driver only to show the effort required. The other drivers which
>> > required this were:
>> >
>> > Driver          File
>> > ------------------------------------------------------------
>> > fusion          drivers/message/fusion/mptbase.c
>> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
>> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
>> >
>> > This series makes those drivers use __arch_phys_wc_add() more as a
>> > transitory phase in hopes we can address the proper split as with the
>> > atyfb illustrates. For ipath the changes required have a nice template
>> > with the qib driver as they share very similar driver structure, the
>> > qib driver *did* do the nice split.
>> >
>> >> If so, can we ioremap + set_page_xyz instead?
>> >
>> > I'm not sure I see which call we'd use.  Care to provide an example patch
>> > alternative for the atyfb as a case in point alternative to the work required
>> > to do the split?
>> >
>>
>> I'm still confused.  Would it be insufficient to ioremap_nocache the
>> whole thing and then call set_memory_wc on parts of it?  (Sorry,
>> set_page_xyz was a typo.)
>
> I think that would be a sexy alternative.
>
> In this driver's case the thing is a bit messy as it not only used
> the WC MTRR for a hole but it also then used a UC MTRR on top of
> it all, so since I already tried to address the split, and if we address
> the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> and just avoid set_page_wc() in this driver's case, but for the other cases
> (fusion, ivtv, ipath) I think this makes sense.
>
> Thoughts?

Once that WC MTRR is in place, I think you really need UC and not UC-
if you want to override it.  Otherwise I agree with all of this.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain March 27, 2015, 11:04 p.m. UTC | #6
On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >
> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> > to this, one is where device drivers are known to exist
> >> >> > on PATs with errata, another situation is observed on
> >> >> > old device drivers where devices had combined MMIO
> >> >> > register access with whatever area they typically
> >> >> > later wanted to end up using MTRR for on the same
> >> >> > PCI BAR. This situation can still be addressed by
> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> > is desirable for write-combining -- in order to
> >> >> > accomplish this though quite a bit of driver
> >> >> > restructuring is required.
> >> >> >
> >> >> > Device drivers which are known to require large
> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> > when PAT is enabled.
> >> >> >
> >> >> > For a good example driver where things are neatly
> >> >> > split up on a PCI BAR refer the infiniband qib
> >> >> > driver. For a good example of a driver where good
> >> >> > amount of work is required refer to the infiniband
> >> >> > ipath driver.
> >> >> >
> >> >> > This is *only* a transitive API -- and as such no new
> >> >> > drivers are ever expected to use this.
> >> >>
> >> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> >> ever be correct.
> >> >>
> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> contain multiple different memtypes?
> >> >
> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> > The changes are non trivial and so in this series I supplied changes on
> >> > one driver only to show the effort required. The other drivers which
> >> > required this were:
> >> >
> >> > Driver          File
> >> > ------------------------------------------------------------
> >> > fusion          drivers/message/fusion/mptbase.c
> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >> >
> >> > This series makes those drivers use __arch_phys_wc_add() more as a
> >> > transitory phase in hopes we can address the proper split as with the
> >> > atyfb illustrates. For ipath the changes required have a nice template
> >> > with the qib driver as they share very similar driver structure, the
> >> > qib driver *did* do the nice split.
> >> >
> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >
> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
> >> > alternative for the atyfb as a case in point alternative to the work required
> >> > to do the split?
> >> >
> >>
> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> >> set_page_xyz was a typo.)
> >
> > I think that would be a sexy alternative.
> >
> > In this driver's case the thing is a bit messy as it not only used
> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> > it all, so since I already tried to address the split, and if we address
> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> > and just avoid set_page_wc() in this driver's case, but for the other cases
> > (fusion, ivtv, ipath) I think this makes sense.
> >
> > Thoughts?
> 
> Once that WC MTRR is in place, I think you really need UC and not UC-
> if you want to override it.  Otherwise I agree with all of this.

Do you mean that the UC MTRR work around that was in place might not
have really been effective? Not quite sure what you mean. I don't think
I follow.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski March 27, 2015, 11:10 p.m. UTC | #7
On Fri, Mar 27, 2015 at 4:04 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> >> >> <mcgrof@do-not-panic.com> wrote:
>> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >> >> >
>> >> >> > Ideally on systems using PAT we can expect a swift
>> >> >> > transition away from MTRR. There can be a few exceptions
>> >> >> > to this, one is where device drivers are known to exist
>> >> >> > on PATs with errata, another situation is observed on
>> >> >> > old device drivers where devices had combined MMIO
>> >> >> > register access with whatever area they typically
>> >> >> > later wanted to end up using MTRR for on the same
>> >> >> > PCI BAR. This situation can still be addressed by
>> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
>> >> >> > calls, one for MMIO registers, and another for whatever
>> >> >> > is desirable for write-combining -- in order to
>> >> >> > accomplish this though quite a bit of driver
>> >> >> > restructuring is required.
>> >> >> >
>> >> >> > Device drivers which are known to require large
>> >> >> > amount of re-work in order to split ioremap'd areas
>> >> >> > can use __arch_phys_wc_add() to avoid regressions
>> >> >> > when PAT is enabled.
>> >> >> >
>> >> >> > For a good example driver where things are neatly
>> >> >> > split up on a PCI BAR refer the infiniband qib
>> >> >> > driver. For a good example of a driver where good
>> >> >> > amount of work is required refer to the infiniband
>> >> >> > ipath driver.
>> >> >> >
>> >> >> > This is *only* a transitive API -- and as such no new
>> >> >> > drivers are ever expected to use this.
>> >> >>
>> >> >> What's the exact layout that this helps?  I'm sceptical that this can
>> >> >> ever be correct.
>> >> >>
>> >> >> Is there some awful driver that has a large ioremap that's supposed to
>> >> >> contain multiple different memtypes?
>> >> >
>> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
>> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
>> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
>> >> > regress those drivers by making the MTRR WC hole trick non functional.
>> >> > The changes are non trivial and so in this series I supplied changes on
>> >> > one driver only to show the effort required. The other drivers which
>> >> > required this were:
>> >> >
>> >> > Driver          File
>> >> > ------------------------------------------------------------
>> >> > fusion          drivers/message/fusion/mptbase.c
>> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
>> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
>> >> >
>> >> > This series makes those drivers use __arch_phys_wc_add() more as a
>> >> > transitory phase in hopes we can address the proper split as with the
>> >> > atyfb illustrates. For ipath the changes required have a nice template
>> >> > with the qib driver as they share very similar driver structure, the
>> >> > qib driver *did* do the nice split.
>> >> >
>> >> >> If so, can we ioremap + set_page_xyz instead?
>> >> >
>> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
>> >> > alternative for the atyfb as a case in point alternative to the work required
>> >> > to do the split?
>> >> >
>> >>
>> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
>> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
>> >> set_page_xyz was a typo.)
>> >
>> > I think that would be a sexy alternative.
>> >
>> > In this driver's case the thing is a bit messy as it not only used
>> > the WC MTRR for a hole but it also then used a UC MTRR on top of
>> > it all, so since I already tried to address the split, and if we address
>> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
>> > and just avoid set_page_wc() in this driver's case, but for the other cases
>> > (fusion, ivtv, ipath) I think this makes sense.
>> >
>> > Thoughts?
>>
>> Once that WC MTRR is in place, I think you really need UC and not UC-
>> if you want to override it.  Otherwise I agree with all of this.
>
> Do you mean that the UC MTRR work around that was in place might not
> have really been effective? Not quite sure what you mean. I don't think
> I follow.

I mean that the UC MTRR that overrides the WC MTRR was probably fine
(I hope smaller MTRRs override larger MTRRs).  But we should just
ditch UC MTRRs entirely, and setting UC in the page tables would work
on all CPUs *if we supported that*.  We'd need to add a couple trivial
helpers to do that.

--Andy

>
>   Luis
Luis Chamberlain March 27, 2015, 11:33 p.m. UTC | #8
On Fri, Mar 27, 2015 at 04:10:03PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 4:04 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >> >
> >> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> >> > to this, one is where device drivers are known to exist
> >> >> >> > on PATs with errata, another situation is observed on
> >> >> >> > old device drivers where devices had combined MMIO
> >> >> >> > register access with whatever area they typically
> >> >> >> > later wanted to end up using MTRR for on the same
> >> >> >> > PCI BAR. This situation can still be addressed by
> >> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> >> > is desirable for write-combining -- in order to
> >> >> >> > accomplish this though quite a bit of driver
> >> >> >> > restructuring is required.
> >> >> >> >
> >> >> >> > Device drivers which are known to require large
> >> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> >> > when PAT is enabled.
> >> >> >> >
> >> >> >> > For a good example driver where things are neatly
> >> >> >> > split up on a PCI BAR refer the infiniband qib
> >> >> >> > driver. For a good example of a driver where good
> >> >> >> > amount of work is required refer to the infiniband
> >> >> >> > ipath driver.
> >> >> >> >
> >> >> >> > This is *only* a transitive API -- and as such no new
> >> >> >> > drivers are ever expected to use this.
> >> >> >>
> >> >> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> >> >> ever be correct.
> >> >> >>
> >> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> >> contain multiple different memtypes?
> >> >> >
> >> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> >> > The changes are non trivial and so in this series I supplied changes on
> >> >> > one driver only to show the effort required. The other drivers which
> >> >> > required this were:
> >> >> >
> >> >> > Driver          File
> >> >> > ------------------------------------------------------------
> >> >> > fusion          drivers/message/fusion/mptbase.c
> >> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> >> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >> >> >
> >> >> > This series makes those drivers use __arch_phys_wc_add() more as a
> >> >> > transitory phase in hopes we can address the proper split as with the
> >> >> > atyfb illustrates. For ipath the changes required have a nice template
> >> >> > with the qib driver as they share very similar driver structure, the
> >> >> > qib driver *did* do the nice split.
> >> >> >
> >> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >> >
> >> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
> >> >> > alternative for the atyfb as a case in point alternative to the work required
> >> >> > to do the split?
> >> >> >
> >> >>
> >> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
> >> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> >> >> set_page_xyz was a typo.)
> >> >
> >> > I think that would be a sexy alternative.
> >> >
> >> > In this driver's case the thing is a bit messy as it not only used
> >> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> >> > it all, so since I already tried to address the split, and if we address
> >> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> >> > and just avoid set_page_wc() in this driver's case, but for the other cases
> >> > (fusion, ivtv, ipath) I think this makes sense.
> >> >
> >> > Thoughts?
> >>
> >> Once that WC MTRR is in place, I think you really need UC and not UC-
> >> if you want to override it.  Otherwise I agree with all of this.
> >
> > Do you mean that the UC MTRR work around that was in place might not
> > have really been effective? Not quite sure what you mean. I don't think
> > I follow.
> 
> I mean that the UC MTRR that overrides the WC MTRR was probably fine
> (I hope smaller MTRRs override larger MTRRs).  But we should just
> ditch UC MTRRs entirely, 

Agreed, this series does that, and this patch addresses the last
UC MTRR ;)

> and setting UC in the page tables would work on all CPUs *if we supported
> that*.  We'd need to add a couple trivial helpers to do that.

OK please check my latest reply and if you do not mind clarify what you mean
there as I am not sure if we're on the same page (no pun) here. I don't quite
follow this last statement.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 2, 2015, 8:21 p.m. UTC | #9
On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Ideally on systems using PAT we can expect a swift
> transition away from MTRR. There can be a few exceptions
> to this, one is where device drivers are known to exist
> on PATs with errata,

This probably makes sense to someone steeped in MTRR and PAT, but not
otherwise.  "One exception is where drivers are known to exist on PATs
with errata"?  The drivers exist, independent of PAT/MTRR/errata.  Do
you mean there are drivers that can't be converted from MTRR to PAT
because some PATs are broken?

I don't really know anything about MTRR or PAT; I'm just trying to
figure out how to parse this paragraph.

> another situation is observed on
> old device drivers where devices had combined MMIO
> register access with whatever area they typically
> later wanted to end up using MTRR for on the same
> PCI BAR. This situation can still be addressed by
> splitting up ioremap'd PCI BAR into two ioremap'd
> calls, one for MMIO registers, and another for whatever
> is desirable for write-combining -- in order to
> accomplish this though quite a bit of driver
> restructuring is required.
>
> Device drivers which are known to require large
> amount of re-work in order to split ioremap'd areas
> can use __arch_phys_wc_add() to avoid regressions
> when PAT is enabled.
>
> For a good example driver where things are neatly
> split up on a PCI BAR refer the infiniband qib
> driver. For a good example of a driver where good
> amount of work is required refer to the infiniband
> ipath driver.
>
> This is *only* a transitive API -- and as such no new
> drivers are ever expected to use this.

"transient"?  Do you mean you intend to remove this API in the near future?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain April 2, 2015, 8:55 p.m. UTC | #10
On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Ideally on systems using PAT we can expect a swift
> > transition away from MTRR. There can be a few exceptions
> > to this, one is where device drivers are known to exist
> > on PATs with errata,
> 
> This probably makes sense to someone steeped in MTRR and PAT, but not
> otherwise.  "One exception is where drivers are known to exist on PATs
> with errata"?  The drivers exist, independent of PAT/MTRR/errata.  Do
> you mean there are drivers that can't be converted from MTRR to PAT
> because some PATs are broken?

Well there is that but it seems we have motivation to
address the PAT broken systems so this would be one of the
lower priority reasons to consider adding this API. The
more important reason is below.

> I don't really know anything about MTRR or PAT; I'm just trying to
> figure out how to parse this paragraph.

Sure.

> > another situation is observed on
> > old device drivers where devices had combined MMIO
> > register access with whatever area they typically
> > later wanted to end up using MTRR for on the same
> > PCI BAR. This situation can still be addressed by
> > splitting up ioremap'd PCI BAR into two ioremap'd
> > calls, one for MMIO registers, and another for whatever
> > is desirable for write-combining -- in order to
> > accomplish this though quite a bit of driver
> > restructuring is required.
> >
> > Device drivers which are known to require large
> > amount of re-work in order to split ioremap'd areas
> > can use __arch_phys_wc_add() to avoid regressions
> > when PAT is enabled.
> >
> > For a good example driver where things are neatly
> > split up on a PCI BAR refer the infiniband qib
> > driver. For a good example of a driver where good
> > amount of work is required refer to the infiniband
> > ipath driver.
> >
> > This is *only* a transitive API -- and as such no new
> > drivers are ever expected to use this.
> 
> "transient"?  Do you mean you intend to remove this API in the near future?

That's correct, the problem is that in order to use PAT cleanly we'd need to
change these drivers to not overlap ioremap'd areas otherwise things can get
quite complex, and changing the way we do the ioremap() calls on a driver might
require a bit of work. The atyfb driver changes I did are an example of the
types of changes that are expected.  In the most complex worst cases there are
MTRR "hole" tricks used, and as can be observed with the atyfb driver changes
there are a series of things to consider when this is done specially in light
of eventually making strong UC the default instead of UC-.

I might be able to work around not adding this API by reviewing the users I had
in this series again and seeing if something similar to what I will do on atyfb
can be done in the meantime by using ioremap_uc(). Its not clear to me yet.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 2, 2015, 10:35 p.m. UTC | #11
[-cc Venkatesh, Suresh]

On Thu, Apr 2, 2015 at 3:55 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
>> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>

>> > This is *only* a transitive API -- and as such no new
>> > drivers are ever expected to use this.
>>
>> "transient"?  Do you mean you intend to remove this API in the near future?
>
> That's correct, the problem is that in order to use PAT cleanly we'd need to
> change these drivers ...

I was just trying to ask whether you intended to write "transient"
instead of "transitive."  But I'm not doing a very good job :)

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain April 2, 2015, 10:54 p.m. UTC | #12
On Thu, Apr 2, 2015 at 3:35 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [-cc Venkatesh, Suresh]
>
> On Thu, Apr 2, 2015 at 3:55 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
>>> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
>>> <mcgrof@do-not-panic.com> wrote:
>>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
>>> > This is *only* a transitive API -- and as such no new
>>> > drivers are ever expected to use this.
>>>
>>> "transient"?  Do you mean you intend to remove this API in the near future?
>>
>> That's correct, the problem is that in order to use PAT cleanly we'd need to
>> change these drivers ...
>
> I was just trying to ask whether you intended to write "transient"
> instead of "transitive."  But I'm not doing a very good job :)

Yes, corrected, thanks.

 Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93..a144d05 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -338,6 +338,10 @@  extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 #define IO_SPACE_LIMIT 0xffff
 
 #ifdef CONFIG_MTRR
+extern int __must_check __arch_phys_wc_add(unsigned long base,
+					   unsigned long size);
+#define __arch_phys_wc_add __arch_phys_wc_add
+
 extern int __must_check arch_phys_wc_add(unsigned long base,
 					 unsigned long size);
 extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 7db9c47..5ae830b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,23 +538,24 @@  int mtrr_del(int reg, unsigned long base, unsigned long size)
 EXPORT_SYMBOL(mtrr_del);
 
 /**
- * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
+ * __arch_phys_wc_add - add a WC MTRR even if PAT is available
  * @base: Physical base address
  * @size: Size of region
  *
- * If PAT is available, this does nothing.  If PAT is unavailable, it
- * attempts to add a WC MTRR covering size bytes starting at base and
- * logs an error if this fails.
+ * We typically do not want to use MTRR if PAT is available but there
+ * are some drivers which require significant work to get this to work
+ * properly. This call should only be used by those drivers where it is
+ * clear that hard work is required to modify them to use arch_phys_wc_add()
  *
  * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
  * but drivers should not try to interpret that return value.
  */
-int arch_phys_wc_add(unsigned long base, unsigned long size)
+int __arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled || !mtrr_enabled)
-		return 0;  /* Success!  (We don't need to do anything.) */
+	if (!mtrr_enabled)
+		return 0;
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
 	if (ret < 0) {
@@ -564,6 +565,27 @@  int arch_phys_wc_add(unsigned long base, unsigned long size)
 	}
 	return ret + MTRR_TO_PHYS_WC_OFFSET;
 }
+EXPORT_SYMBOL_GPL(__arch_phys_wc_add);
+
+/**
+ * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
+ * @base: Physical base address
+ * @size: Size of region
+ *
+ * If PAT is available, this does nothing.  If PAT is unavailable, it
+ * attempts to add a WC MTRR covering size bytes starting at base and
+ * logs an error if this fails.
+ *
+ * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
+ * but drivers should not try to interpret that return value.
+ */
+int arch_phys_wc_add(unsigned long base, unsigned long size)
+{
+	if (pat_enabled || !mtrr_enabled)
+		return 0;  /* Success!  (We don't need to do anything.) */
+
+	return __arch_phys_wc_add(base, size);
+}
 EXPORT_SYMBOL(arch_phys_wc_add);
 
 /*
diff --git a/include/linux/io.h b/include/linux/io.h
index 91101a1..ecc51c3 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,10 @@  static inline void arch_phys_wc_del(int handle)
 }
 
 #define arch_phys_wc_add arch_phys_wc_add
+#ifndef __arch_phys_wc_add
+#define __arch_phys_wc_add arch_phys_wc_add
+#endif
+
 #endif
 
 #endif /* _LINUX_IO_H */