
[2/3] iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices

Message ID 55684F1C.3050702@arm.com (mailing list archive)
State New, archived

Commit Message

Robin Murphy May 29, 2015, 11:35 a.m. UTC
Hi Joerg,

On 29/05/15 07:43, Joerg Roedel wrote:
> Hi Will,
>
> On Wed, May 20, 2015 at 06:09:26PM +0100, Will Deacon wrote:
>> On Tue, May 19, 2015 at 04:24:35PM +0100, Joerg Roedel wrote:
>>>> +	/* Page sizes */
>>>> +	if (reg & IDR5_GRAN64K)
>>>> +		pgsize_bitmap |= SZ_64K | SZ_512M;
>>>> +	if (reg & IDR5_GRAN16K)
>>>> +		pgsize_bitmap |= SZ_16K | SZ_32M;
>>>> +	if (reg & IDR5_GRAN4K)
>>>> +		pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
>>>> +
>>>> +	arm_smmu_ops.pgsize_bitmap &= pgsize_bitmap;
>>>
>>> So this could effectively lead to a zero pgsize_bitmap when there are
>>> SMMUs in the system with support for different page sizes, no?
>>
>> Indeed, if there is no common page size then we end up not being able to
>> support any. I tried to resolve this by moving the bitmap out of the
>> iommu_ops and into the iommu_domain, but you weren't fond of that idea ;)
>
> Well, what you could do (and what I think the core should do at some
> point) is to build the resulting page-size bitmap by taking the biggest
> minimum page-size from all iommus and OR the page-sizes bigger than that
> together. For a system with 3 SMMUs supporting all of the above
> pgsize_bitmaps the resulting bitmap would look like this:
>
> 	SZ_64K | SZ_32M | SZ_2M | SZ_512M | SZ_1G;
>
> With the biggest minimum page-size all of the bigger page-sizes can be
> emulated.  But that is not necessary for this patch-set, just a
> suggestion for future work.

The trouble with this is, what about the CPU page size? Say you have 
some multimedia subsystem with its own integrated SMMU and for that 
they've only implemented the 16K granule scheme because it works best 
for the video hardware (and the GPU driver is making direct IOMMU API 
calls to remap carved-out RAM rather than using DMA-mapping). Now, the 
SMMU on the compute side of the SoC serving the general peripherals will 
be rendered useless by bumping the system-wide minimum page size up to 
16K, because it then can't map that scatterlist of discontiguous 4K 
pages that the USB controller needs...

I think this really represents another push to get away from (or at 
least around) the page-at-a-time paradigm - if the IOMMU API itself 
wasn't too fussed about page sizes and could let drivers handle the full 
map/unmap requests however they see fit, I think we could bypass a lot 
of these issues. We've already got the Intel IOMMU driver doing horrible 
hacks with the pgsize_bitmap to cheat the system, I'm sure we don't want 
to add any more of that. How about something like the below diff as a 
first step?

Robin.

---8<---

From: Robin Murphy <robin.murphy@arm.com>
Date: Fri, 29 May 2015 12:05:58 +0100
Subject: [PATCH] iommu: Allow drivers to support arbitrary map/unmap requests

In certain situations, the page-at-a-time operation of the IOMMU API
becomes problematic. For instance, IOMMUs which can support different
page sizes per domain confound the use of a single pgsize_bitmap, and
more generally it can be horribly inefficient for large operations
(such as tearing down VFIO domains).

To resolve this, allow drivers to expose their own low-level map/unmap
handlers, in the manner of iommu_map_sg, which may bypass this
restriction as necessary. The existing callbacks are renamed to better
reflect their page-oriented nature.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  drivers/iommu/amd_iommu.c      |  4 ++--
  drivers/iommu/arm-smmu.c       |  4 ++--
  drivers/iommu/exynos-iommu.c   |  4 ++--
  drivers/iommu/intel-iommu.c    |  4 ++--
  drivers/iommu/iommu.c          | 14 ++++++++++----
  drivers/iommu/ipmmu-vmsa.c     |  4 ++--
  drivers/iommu/msm_iommu.c      |  4 ++--
  drivers/iommu/omap-iommu.c     |  4 ++--
  drivers/iommu/rockchip-iommu.c |  4 ++--
  drivers/iommu/shmobile-iommu.c |  4 ++--
  drivers/iommu/tegra-gart.c     |  4 ++--
  drivers/iommu/tegra-smmu.c     |  4 ++--
  include/linux/iommu.h          |  8 +++++++-
  13 files changed, 39 insertions(+), 27 deletions(-)

   * to an iommu domain
   * @iova_to_phys: translate iova to physical address
@@ -147,7 +149,11 @@ struct iommu_ops {
  	int (*map)(struct iommu_domain *domain, unsigned long iova,
  		   phys_addr_t paddr, size_t size, int prot);
  	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-		     size_t size);
+			size_t size);
+	int (*map_page)(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot);
+	size_t (*unmap_page)(struct iommu_domain *domain, unsigned long iova,
+			     size_t size);
  	size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
  			 struct scatterlist *sg, unsigned int nents, int prot);
  	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);

Comments

Joerg Roedel May 29, 2015, 2:40 p.m. UTC | #1
On Fri, May 29, 2015 at 12:35:56PM +0100, Robin Murphy wrote:
> The trouble with this is, what about the CPU page size? Say you have
> some multimedia subsystem with its own integrated SMMU and for that
> they've only implemented the 16K granule scheme because it works
> best for the video hardware (and the GPU driver is making direct
> IOMMU API calls to remap carved-out RAM rather than using
> DMA-mapping). Now, the SMMU on the compute side of the SoC serving
> the general peripherals will be rendered useless by bumping the
> system-wide minimum page size up to 16K, because it then can't map
> that scatterlist of discontiguous 4K pages that the USB controller
> needs...
> 
> I think this really represents another push to get away from (or at
> least around) the page-at-a-time paradigm - if the IOMMU API itself
> wasn't too fussed about page sizes and could let drivers handle the
> full map/unmap requests however they see fit, I think we could
> bypass a lot of these issues. We've already got the Intel IOMMU
> driver doing horrible hacks with the pgsize_bitmap to cheat the
> system, I'm sure we don't want to add any more of that. How about
> something like the below diff as a first step?

Moving functionality out of the iommu core code into the drivers is the
wrong direction imo. It is better to solve it with something like

	struct iommu_domain *iommu_domain_alloc_for_group(struct iommu_group *group);

Which gets us a domain that can only be assigned to that particular
group. Since there is a clear one-to-many relationship between a
hardware iommu and the groups of devices behind it, we could propagate
the pgsize_bitmap from the iommu to the group and then to the domain.

Domains allocated via iommu_domain_alloc() would get the merged
pgsize_bitmap like I described in my previous mail.

But to make this happen we need a representation of single hardware
iommu instances in the iommu core first.


	Joerg
Will Deacon June 1, 2015, 9:40 a.m. UTC | #2
Hi Joerg,

On Fri, May 29, 2015 at 03:40:43PM +0100, Joerg Roedel wrote:
> On Fri, May 29, 2015 at 12:35:56PM +0100, Robin Murphy wrote:
> > The trouble with this is, what about the CPU page size? Say you have
> > some multimedia subsystem with its own integrated SMMU and for that
> > they've only implemented the 16K granule scheme because it works
> > best for the video hardware (and the GPU driver is making direct
> > IOMMU API calls to remap carved-out RAM rather than using
> > DMA-mapping). Now, the SMMU on the compute side of the SoC serving
> > the general peripherals will be rendered useless by bumping the
> > system-wide minimum page size up to 16K, because it then can't map
> > that scatterlist of discontiguous 4K pages that the USB controller
> > needs...
> > 
> > I think this really represents another push to get away from (or at
> > least around) the page-at-a-time paradigm - if the IOMMU API itself
> > wasn't too fussed about page sizes and could let drivers handle the
> > full map/unmap requests however they see fit, I think we could
> > bypass a lot of these issues. We've already got the Intel IOMMU
> > driver doing horrible hacks with the pgsize_bitmap to cheat the
> > system, I'm sure we don't want to add any more of that. How about
> > something like the below diff as a first step?
> 
> Moving functionality out of the iommu core code into the drivers is the
> wrong direction imo. It is better to solve it with something like
> 
> 	struct iommu_domain *iommu_domain_alloc_for_group(struct iommu_group *group);
> 
> Which gets us a domain that can only be assigned to that particular
> group. Since there is a clear one-to-many relationship between a
> hardware iommu and the groups of devices behind it, we could propagate
> the pgsize_bitmap from the iommu to the group and then to the domain.
> 
> Domains allocated via iommu_domain_alloc() would get the merged
> pgsize_bitmap like I described in my previous mail.
> 
> But to make this happen we need a representation of single hardware
> iommu instances in the iommu core first.

I like this proposal. The only remaining part is that the pgsize bitmap
inherited by the domain can only truly be finalised once the page table
is allocated, which in turn can only happen once we've identified the
IOMMU instance.

In the case of iommu_domain_alloc_for_group, that's nice and
straightforward (i.e. as part of the call) but for the default
iommu_domain_alloc() case, we'd have to continue postponing things to
->attach time before we could provide a reliable pgsize_bitmap. This is
to handle the case where an SMMU may be able to support only a subset of
the pgsize_bitmap on a per-domain basis.

In which situation do you think the merged pgsize_bitmap would get used?
The only place I can think of is if we're trying to call iommu_map on a
domain with no devices attached. However, if that domain was created
using iommu_domain_alloc() then the pgsize_bitmap is the least of our
worries -- we don't even know the page table format!

Will
Joerg Roedel June 2, 2015, 7:39 a.m. UTC | #3
Hi Will,

On Mon, Jun 01, 2015 at 10:40:14AM +0100, Will Deacon wrote:
> I like this proposal. The only remaining part is that the pgsize bitmap
> inherited by the domain can only truly be finalised once the page table
> is allocated, which in turn can only happen once we've identified the
> IOMMU instance.

We know the IOMMU instance for domains allocated with
iommu_domain_alloc_for_group because every group is behind a single
IOMMU instance. So the page-table can be allocated at domain creation
time and mappings can be created before the group itself is attached.

> In the case of iommu_domain_alloc_for_group, that's nice and
> straightforward (i.e. as part of the call) but for the default
> iommu_domain_alloc() case, we'd have to continue postponing things to
> ->attach time before we could provide a reliable pgsize_bitmap. This is
> to handle the case where an SMMU may be able to support only a subset of
> the pgsize_bitmap on a per-domain basis.

I don't think we need to postpone anything. Domains returned by
iommu_domain_alloc() need to work for all groups. If there are multiple
IOMMUs in the system with different capabilities/page-table formats the
IOMMU core needs to build multiple page-tables for that domain.

VFIO already does that as a workaround of how things are done currently,
but I really think this should be done in the IOMMU core code.

> In which situation do you think the merged pgsize_bitmap would get used?
> The only place I can think of is if we're trying to call iommu_map on a
> domain with no devices attached. However, if that domain was created
> using iommu_domain_alloc() then the pgsize_bitmap is the least of our
> worries -- we don't even know the page table format!

The first use-case for the merged pgsize_bitmap that comes to mind is to
determine the minimum page-size that can be used for a domain. In the
SMMUv3 case when there are IOMMUs with different minimum page-sizes in one
system, this would be the biggest minimum page-size of all IOMMUs.

Probably there are other uses for that bitmap that show up at
implementation time.


	Joerg
Will Deacon June 2, 2015, 9:47 a.m. UTC | #4
On Tue, Jun 02, 2015 at 08:39:56AM +0100, Joerg Roedel wrote:
> On Mon, Jun 01, 2015 at 10:40:14AM +0100, Will Deacon wrote:
> > I like this proposal. The only remaining part is that the pgsize bitmap
> > inherited by the domain can only truly be finalised once the page table
> > is allocated, which in turn can only happen once we've identified the
> > IOMMU instance.
> 
> We know the IOMMU instance for domains allocated with
> iommu_domain_alloc_for_group because every group is behind a single
> IOMMU instance. So the page-table can be allocated at domain creation
> time and mappings can be created before the group itself is attached.

Indeed -- I think this form of allocation makes a lot of sense.

> > In the case of iommu_domain_alloc_for_group, that's nice and
> > straightforward (i.e. as part of the call) but for the default
> > iommu_domain_alloc() case, we'd have to continue postponing things to
> > ->attach time before we could provide a reliable pgsize_bitmap. This is
> > to handle the case where an SMMU may be able to support only a subset of
> > the pgsize_bitmap on a per-domain basis.
> 
> I don't think we need to postpone anything. Domains returned by
> iommu_domain_alloc() need to work for all groups. If there are multiple
> IOMMUs in the system with different capabilities/page-table formats the
> IOMMU core needs to build multiple page-tables for that domain.

I really don't think that's feasible. We support an awful lot of
combinations that affect the page table format on ARM: there are 5
different formats, each supporting different translation regimes (sets
of page size) and each of those needs configuring for input/output
address sizes to compute the number of levels. You could easily end up
with over 20 page tables and their corresponding sets of control
register values.

Given that we only install one page table in the end, I'm struggling to
see the issue with postponing its allocation until we've figured out
the IOMMU instance thanks to a device attach.

> VFIO already does that as a workaround of how things are done currently,
> but I really think this should be done in the IOMMU core code.

Postponing page table creation to ->attach works fine with VFIO today.
AFAICT, it never tries to map pages for an empty domain and therefore
the IOMMU instance is always known by the IOMMU driver at ->map time.

> > In which situation do you think the merged pgsize_bitmap would get used?
> > The only place I can think of is if we're trying to call iommu_map on a
> > domain with no devices attached. However, if that domain was created
> > using iommu_domain_alloc() then the pgsize_bitmap is the least of our
> > worries -- we don't even know the page table format!
> 
> The first use-case for the merged pgsize_bitmap that comes to mind is to
> determine the minimum page-size that can be used for a domain. In the
> SMMUv3 case when there are IOMMUs with different minimum page-sizes in one
> system, this would be the biggest minimum page-size of all IOMMUs.

Understood, but I'm trying to understand why you'd want to know that for
an empty (no devices attached) domain. I don't think we currently have
any use-cases for that and it's really not a practical thing to support.

Will
Joerg Roedel June 2, 2015, 6:43 p.m. UTC | #5
On Tue, Jun 02, 2015 at 10:47:46AM +0100, Will Deacon wrote:
> On Tue, Jun 02, 2015 at 08:39:56AM +0100, Joerg Roedel wrote:
> > I don't think we need to postpone anything. Domains returned by
> > iommu_domain_alloc() need to work for all groups. If there are multiple
> > IOMMUs in the system with different capabilities/page-table formats the
> > IOMMU core needs to build multiple page-tables for that domain.
> 
> I really don't think that's feasible. We support an awful lot of
> combinations that affect the page table format on ARM: there are 5
> different formats, each supporting different translation regimes (sets
> of page size) and each of those needs configuring for input/output
> address sizes to compute the number of levels. You could easily end up
> with over 20 page tables and their corresponding sets of control
> register values.

We only need to build page-tables for IOMMUs that are actually in the
system. And this also only for domains that are not tied to a particular
group. If some hardware vendor is crazy enough to put 20 IOMMUs in the
system with each having its own page-table format, then so be it. But in
reality I think we will have only a handful of different page-tables.

> Given that we only install one page table in the end, I'm struggling to
> see the issue with postponing its allocation until we've figured out
> the IOMMU instance thanks to a device attach.

It is a question of the use-case for the domain. A device driver (like
for USB or graphics) would allocate the domain with the _for_group
function and end up with only one page-table.

But if VFIO comes around and wants to attach devices to a KVM guest it
would use iommu_domain_alloc() and has then all page-tables it possibly
needs (for devices attached at start and even devices that might later
be hotplugged into the guest).


	Joerg

Patch

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e43d489..13e94ae 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3418,8 +3418,8 @@  static const struct iommu_ops amd_iommu_ops = {
  	.domain_free  = amd_iommu_domain_free,
  	.attach_dev = amd_iommu_attach_device,
  	.detach_dev = amd_iommu_detach_device,
-	.map = amd_iommu_map,
-	.unmap = amd_iommu_unmap,
+	.map_page = amd_iommu_map,
+	.unmap_page = amd_iommu_unmap,
  	.map_sg = default_iommu_map_sg,
  	.iova_to_phys = amd_iommu_iova_to_phys,
  	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 66a803b..427ae40 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1447,8 +1447,8 @@  static struct iommu_ops arm_smmu_ops = {
  	.domain_free		= arm_smmu_domain_free,
  	.attach_dev		= arm_smmu_attach_dev,
  	.detach_dev		= arm_smmu_detach_dev,
-	.map			= arm_smmu_map,
-	.unmap			= arm_smmu_unmap,
+	.map_page		= arm_smmu_map,
+	.unmap_page		= arm_smmu_unmap,
  	.map_sg			= default_iommu_map_sg,
  	.iova_to_phys		= arm_smmu_iova_to_phys,
  	.add_device		= arm_smmu_add_device,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 3e89850..9edf14a 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1182,8 +1182,8 @@  static const struct iommu_ops exynos_iommu_ops = {
  	.domain_free = exynos_iommu_domain_free,
  	.attach_dev = exynos_iommu_attach_device,
  	.detach_dev = exynos_iommu_detach_device,
-	.map = exynos_iommu_map,
-	.unmap = exynos_iommu_unmap,
+	.map_page = exynos_iommu_map,
+	.unmap_page = exynos_iommu_unmap,
  	.map_sg = default_iommu_map_sg,
  	.iova_to_phys = exynos_iommu_iova_to_phys,
  	.add_device = exynos_iommu_add_device,
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43be..67de73d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4597,8 +4597,8 @@  static const struct iommu_ops intel_iommu_ops = {
  	.domain_free	= intel_iommu_domain_free,
  	.attach_dev	= intel_iommu_attach_device,
  	.detach_dev	= intel_iommu_detach_device,
-	.map		= intel_iommu_map,
-	.unmap		= intel_iommu_unmap,
+	.map_page	= intel_iommu_map,
+	.unmap_page	= intel_iommu_unmap,
  	.map_sg		= default_iommu_map_sg,
  	.iova_to_phys	= intel_iommu_iova_to_phys,
  	.add_device	= intel_iommu_add_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e..7312623 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1033,7 +1033,10 @@  int iommu_map(struct iommu_domain *domain, unsigned long iova,
  	size_t orig_size = size;
  	int ret = 0;

-	if (unlikely(domain->ops->map == NULL ||
+	if (domain->ops->map)
+		return domain->ops->map(domain, iova, paddr, size, prot);
+
+	if (unlikely(domain->ops->map_page == NULL ||
  		     domain->ops->pgsize_bitmap == 0UL))
  		return -ENODEV;

@@ -1062,7 +1065,7 @@  int iommu_map(struct iommu_domain *domain, unsigned long iova,
  		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
  			 iova, &paddr, pgsize);

-		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		ret = domain->ops->map_page(domain, iova, paddr, pgsize, prot);
  		if (ret)
  			break;

@@ -1087,7 +1090,10 @@  size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
  	unsigned int min_pagesz;
  	unsigned long orig_iova = iova;

-	if (unlikely(domain->ops->unmap == NULL ||
+	if (domain->ops->unmap)
+		return domain->ops->unmap(domain, iova, size);
+
+	if (unlikely(domain->ops->unmap_page == NULL ||
  		     domain->ops->pgsize_bitmap == 0UL))
  		return -ENODEV;

@@ -1117,7 +1123,7 @@  size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
  	while (unmapped < size) {
  		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);

-		unmapped_page = domain->ops->unmap(domain, iova, pgsize);
+		unmapped_page = domain->ops->unmap_page(domain, iova, pgsize);
  		if (!unmapped_page)
  			break;

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 1a67c53..1db3486 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -746,8 +746,8 @@  static const struct iommu_ops ipmmu_ops = {
  	.domain_free = ipmmu_domain_free,
  	.attach_dev = ipmmu_attach_device,
  	.detach_dev = ipmmu_detach_device,
-	.map = ipmmu_map,
-	.unmap = ipmmu_unmap,
+	.map_page = ipmmu_map,
+	.unmap_page = ipmmu_unmap,
  	.map_sg = default_iommu_map_sg,
  	.iova_to_phys = ipmmu_iova_to_phys,
  	.add_device = ipmmu_add_device,
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 15a2063..ea40c86 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -677,8 +677,8 @@  static const struct iommu_ops msm_iommu_ops = {
  	.domain_free = msm_iommu_domain_free,
  	.attach_dev = msm_iommu_attach_dev,
  	.detach_dev = msm_iommu_detach_dev,
-	.map = msm_iommu_map,
-	.unmap = msm_iommu_unmap,
+	.map_page = msm_iommu_map,
+	.unmap_page = msm_iommu_unmap,
  	.map_sg = default_iommu_map_sg,
  	.iova_to_phys = msm_iommu_iova_to_phys,
  	.pgsize_bitmap = MSM_IOMMU_PGSIZES,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index a22c33d..784d29c 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1371,8 +1371,8 @@  static const struct iommu_ops omap_iommu_ops = {
  	.domain_free	= omap_iommu_domain_free,
  	.attach_dev	= omap_iommu_attach_dev,
  	.detach_dev	= omap_iommu_detach_dev,
-	.map		= omap_iommu_map,
-	.unmap		= omap_iommu_unmap,
+	.map_page	= omap_iommu_map,
+	.unmap_page	= omap_iommu_unmap,
  	.map_sg		= default_iommu_map_sg,
  	.iova_to_phys	= omap_iommu_iova_to_phys,
  	.add_device	= omap_iommu_add_device,
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index cab2145..b10d5b2 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -964,8 +964,8 @@  static const struct iommu_ops rk_iommu_ops = {
  	.domain_free = rk_iommu_domain_free,
  	.attach_dev = rk_iommu_attach_device,
  	.detach_dev = rk_iommu_detach_device,
-	.map = rk_iommu_map,
-	.unmap = rk_iommu_unmap,
+	.map_page = rk_iommu_map,
+	.unmap_page = rk_iommu_unmap,
  	.add_device = rk_iommu_add_device,
  	.remove_device = rk_iommu_remove_device,
  	.iova_to_phys = rk_iommu_iova_to_phys,
diff --git a/drivers/iommu/shmobile-iommu.c b/drivers/iommu/shmobile-iommu.c
index a028751..f587313 100644
--- a/drivers/iommu/shmobile-iommu.c
+++ b/drivers/iommu/shmobile-iommu.c
@@ -366,8 +366,8 @@  static const struct iommu_ops shmobile_iommu_ops = {
  	.domain_free = shmobile_iommu_domain_free,
  	.attach_dev = shmobile_iommu_attach_device,
  	.detach_dev = shmobile_iommu_detach_device,
-	.map = shmobile_iommu_map,
-	.unmap = shmobile_iommu_unmap,
+	.map_page = shmobile_iommu_map,
+	.unmap_page = shmobile_iommu_unmap,
  	.map_sg = default_iommu_map_sg,
  	.iova_to_phys = shmobile_iommu_iova_to_phys,
  	.add_device = shmobile_iommu_add_device,
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 37e708f..8e4d6f1 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -340,9 +340,9 @@  static const struct iommu_ops gart_iommu_ops = {
  	.domain_free	= gart_iommu_domain_free,
  	.attach_dev	= gart_iommu_attach_dev,
  	.detach_dev	= gart_iommu_detach_dev,
-	.map		= gart_iommu_map,
+	.map_page	= gart_iommu_map,
  	.map_sg		= default_iommu_map_sg,
-	.unmap		= gart_iommu_unmap,
+	.unmap_page	= gart_iommu_unmap,
  	.iova_to_phys	= gart_iommu_iova_to_phys,
  	.pgsize_bitmap	= GART_IOMMU_PGSIZES,
  };
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index c845d99..38915fc 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -650,8 +650,8 @@  static const struct iommu_ops tegra_smmu_ops = {
  	.detach_dev = tegra_smmu_detach_dev,
  	.add_device = tegra_smmu_add_device,
  	.remove_device = tegra_smmu_remove_device,
-	.map = tegra_smmu_map,
-	.unmap = tegra_smmu_unmap,
+	.map_page = tegra_smmu_map,
+	.unmap_page = tegra_smmu_unmap,
  	.map_sg = default_iommu_map_sg,
  	.iova_to_phys = tegra_smmu_iova_to_phys,

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0546b87..5857bd6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,8 @@  enum iommu_attr {
   * @detach_dev: detach device from an iommu domain
   * @map: map a physically contiguous memory region to an iommu domain
   * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @map_page: map a physical region corresponding to a single iommu page
+ * @unmap_page: unmap a physical region corresponding to a single iommu page
   * @map_sg: map a scatter-gather list of physically contiguous memory