Message ID | 20220321165049.35985-5-sven@svenpeter.dev (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Apple M1 (Pro/Max) NVMe driver | expand |
On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote: > > The NVMe co-processor on the Apple M1 uses a DMA address filter called > SART for some DMA transactions. This adds a simple driver used to > configure the memory regions from which DMA transactions are allowed. > > Co-developed-by: Hector Martin <marcan@marcan.st> > Signed-off-by: Hector Martin <marcan@marcan.st> > Signed-off-by: Sven Peter <sven@svenpeter.dev> Can you add some explanation about why this uses a custom interface instead of hooking into the dma_map_ops? > +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, > + phys_addr_t *paddr, size_t *size) > +{ > + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); > + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); Why do you use the _relaxed() accessors here and elsewhere in the driver? > +struct apple_sart *apple_sart_get(struct device *dev) > +{ > + struct device_node *sart_node; > + struct platform_device *sart_pdev; > + struct apple_sart *sart; > + > + sart_node = of_parse_phandle(dev->of_node, "apple,sart", 0); > + if (!sart_node) > + return ERR_PTR(ENODEV); The error pointers need to take negative values, like 'ERR_PTR(-ENODEV)', here and everywhere else in the driver. Arnd
> +/* > + * Adds the region [paddr, paddr+size] to the DMA allow list. > + * > + * @sart: SART reference > + * @paddr: Start address of the region to be used for DMA > + * @size: Size of the region to be used for DMA. > + */ > +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr, > + size_t size); > + > +/* > + * Removes the region [paddr, paddr+size] from the DMA allow list. > + * > + * Note that exact same paddr and size used for apple_sart_add_allowed_region > + * have to be passed. > + * > + * @sart: SART reference > + * @paddr: Start address of the region no longer used for DMA > + * @size: Size of the region no longer used for DMA. > + */ > +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr, > + size_t size); Might it be simpler for add_allowed_region to return a handle that remove_allowed_region takes instead of paddr+size? Then remove_allowed_region doesn't have to walk anything, and the requirement to use the same paddr/size is implicit.
Hi, Thanks for the review! On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote: > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote: >> >> The NVMe co-processor on the Apple M1 uses a DMA address filter called >> SART for some DMA transactions. This adds a simple driver used to >> configure the memory regions from which DMA transactions are allowed. >> >> Co-developed-by: Hector Martin <marcan@marcan.st> >> Signed-off-by: Hector Martin <marcan@marcan.st> >> Signed-off-by: Sven Peter <sven@svenpeter.dev> > > Can you add some explanation about why this uses a custom interface > instead of hooking into the dma_map_ops? Sure. In a perfect world this would just be an IOMMU implementation but since SART can't create any real IOVA space using pagetables it doesn't fit inside that subsytem. In a slightly less perfect world I could just implement dma_map_ops here but that won't work either because not all DMA buffers of the NVMe device have to go through SART and those allocations happen inside the same device and would use the same dma_map_ops. The NVMe controller has two separate DMA filters: - NVMMU, which must be set up for any command that uses PRPs and ensures that the DMA transactions only touch the pages listed inside the PRP structure. NVMMU itself is tightly coupled to the NVMe controller: The list of allowed pages is configured based on command's tag id and even commands that require no DMA transactions must be listed inside NVMMU before they are started. - SART, which must be set up for some shared memory buffers (e.g. log messages from the NVMe firmware) and for some NVMe debug commands that don't use PRPs. SART is only loosely coupled to the NVMe controller and could also be used together with other devices. It's also the only thing that changed between M1 and M1 Pro/Max/Ultra and that's why I decided to separate it from the NVMe driver. I'll add this explanation to the commit message. > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, >> + phys_addr_t *paddr, size_t *size) >> +{ >> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); >> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); > > Why do you use the _relaxed() accessors here and elsewhere in the driver? This device itself doesn't do any DMA transactions so it needs no memory synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write from/to these buffers (multiple times) and they have the required barriers in place whenever they are used. These buffers so far are only allocated at probe time though so even using the normal writel/readl here won't hurt performance at all. I can just use those if you prefer or alternatively add a comment why _relaxed is fine here. This is a bit similar to the discussion for the pinctrl series last year [1]. > >> +struct apple_sart *apple_sart_get(struct device *dev) >> +{ >> + struct device_node *sart_node; >> + struct platform_device *sart_pdev; >> + struct apple_sart *sart; >> + >> + sart_node = of_parse_phandle(dev->of_node, "apple,sart", 0); >> + if (!sart_node) >> + return ERR_PTR(ENODEV); > > The error pointers need to take negative values, like 'ERR_PTR(-ENODEV)', > here and everywhere else in the driver. Ouch, that's my second bug of that kind in the past days. I'll fix it here and check the other patches in this series as well. Thanks, Sven [1] https://lore.kernel.org/lkml/87sfz8zdzb.wl-maz@kernel.org/
On Mon, Mar 21, 2022, at 18:17, Alyssa Rosenzweig wrote: >> +/* >> + * Adds the region [paddr, paddr+size] to the DMA allow list. >> + * >> + * @sart: SART reference >> + * @paddr: Start address of the region to be used for DMA >> + * @size: Size of the region to be used for DMA. >> + */ >> +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr, >> + size_t size); >> + >> +/* >> + * Removes the region [paddr, paddr+size] from the DMA allow list. >> + * >> + * Note that exact same paddr and size used for apple_sart_add_allowed_region >> + * have to be passed. >> + * >> + * @sart: SART reference >> + * @paddr: Start address of the region no longer used for DMA >> + * @size: Size of the region no longer used for DMA. >> + */ >> +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr, >> + size_t size); > > Might it be simpler for add_allowed_region to return a handle that > remove_allowed_region takes instead of paddr+size? Then > remove_allowed_region doesn't have to walk anything, and the requirement > to use the same paddr/size is implicit. I liked that idea and just prototyped it and them remembered why I have the API like it is: In a perfect world this would be an IOMMU and just implement the IOMMU device ops. In a slightly less perfect world this would implement dma_map_ops and transparently allow using the normal DMA API inside the NVMe driver. In Apple's world the NVMe driver needs two separate IOMMU-like filters and you can't have two separate dma_map_ops without ugly hacks. This will usually be used just next to dma_map_coherent and dma_free_coherent and for those you need to keep track of phys_addr, dma_addr and size anyway, e.g. (with the error handling removed here) bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL); ret = apple_sart_add_allowed_region(anv->sart, iova, size); and later then apple_sart_remove_allowed_region(anv->sart, bfr->iova, bfr->size); dma_free_coherent(anv->dev, bfr->size, bfr->buffer, bfr->iova); Best, Sven
On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote: > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote: > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote: > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called > >> SART for some DMA transactions. This adds a simple driver used to > >> configure the memory regions from which DMA transactions are allowed. > >> > >> Co-developed-by: Hector Martin <marcan@marcan.st> > >> Signed-off-by: Hector Martin <marcan@marcan.st> > >> Signed-off-by: Sven Peter <sven@svenpeter.dev> > > > > Can you add some explanation about why this uses a custom interface > > instead of hooking into the dma_map_ops? > > Sure. > In a perfect world this would just be an IOMMU implementation but since > SART can't create any real IOVA space using pagetables it doesn't fit > inside that subsytem. > > In a slightly less perfect world I could just implement dma_map_ops here > but that won't work either because not all DMA buffers of the NVMe > device have to go through SART and those allocations happen > inside the same device and would use the same dma_map_ops. > > The NVMe controller has two separate DMA filters: > > - NVMMU, which must be set up for any command that uses PRPs and > ensures that the DMA transactions only touch the pages listed > inside the PRP structure. NVMMU itself is tightly coupled > to the NVMe controller: The list of allowed pages is configured > based on command's tag id and even commands that require no DMA > transactions must be listed inside NVMMU before they are started. > - SART, which must be set up for some shared memory buffers (e.g. > log messages from the NVMe firmware) and for some NVMe debug > commands that don't use PRPs. > SART is only loosely coupled to the NVMe controller and could > also be used together with other devices. It's also the only > thing that changed between M1 and M1 Pro/Max/Ultra and that's > why I decided to separate it from the NVMe driver. > > I'll add this explanation to the commit message. Ok, thanks. > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, > >> + phys_addr_t *paddr, size_t *size) > >> +{ > >> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); > >> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); > > > > Why do you use the _relaxed() accessors here and elsewhere in the driver? > > This device itself doesn't do any DMA transactions so it needs no memory > synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write > from/to these buffers (multiple times) and they have the required barriers > in place whenever they are used. > > These buffers so far are only allocated at probe time though so even using > the normal writel/readl here won't hurt performance at all. I can just use > those if you prefer or alternatively add a comment why _relaxed is fine here. > > This is a bit similar to the discussion for the pinctrl series last year [1]. I think it's better to only use the _relaxed version where it actually helps, with a comment about it, and use the normal version elsewhere, in particular in functions that you have copied from the normal nvme driver. I had tried to compare some of your code with the other version and was rather confused by that. Arnd
On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote: > On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote: > > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote: > > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote: > > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called > > >> SART for some DMA transactions. This adds a simple driver used to > > >> configure the memory regions from which DMA transactions are allowed. > > >> > > >> Co-developed-by: Hector Martin <marcan@marcan.st> > > >> Signed-off-by: Hector Martin <marcan@marcan.st> > > >> Signed-off-by: Sven Peter <sven@svenpeter.dev> > > > > > > Can you add some explanation about why this uses a custom interface > > > instead of hooking into the dma_map_ops? > > > > Sure. > > In a perfect world this would just be an IOMMU implementation but since > > SART can't create any real IOVA space using pagetables it doesn't fit > > inside that subsytem. > > > > In a slightly less perfect world I could just implement dma_map_ops here > > but that won't work either because not all DMA buffers of the NVMe > > device have to go through SART and those allocations happen > > inside the same device and would use the same dma_map_ops. > > > > The NVMe controller has two separate DMA filters: > > > > - NVMMU, which must be set up for any command that uses PRPs and > > ensures that the DMA transactions only touch the pages listed > > inside the PRP structure. NVMMU itself is tightly coupled > > to the NVMe controller: The list of allowed pages is configured > > based on command's tag id and even commands that require no DMA > > transactions must be listed inside NVMMU before they are started. > > - SART, which must be set up for some shared memory buffers (e.g. > > log messages from the NVMe firmware) and for some NVMe debug > > commands that don't use PRPs. > > SART is only loosely coupled to the NVMe controller and could > > also be used together with other devices. It's also the only > > thing that changed between M1 and M1 Pro/Max/Ultra and that's > > why I decided to separate it from the NVMe driver. > > > > I'll add this explanation to the commit message. > > Ok, thanks. > > > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, > > >> + phys_addr_t *paddr, size_t *size) > > >> +{ > > >> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); > > >> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); > > > > > > Why do you use the _relaxed() accessors here and elsewhere in the driver? > > > > This device itself doesn't do any DMA transactions so it needs no memory > > synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write > > from/to these buffers (multiple times) and they have the required barriers > > in place whenever they are used. > > > > These buffers so far are only allocated at probe time though so even using > > the normal writel/readl here won't hurt performance at all. I can just use > > those if you prefer or alternatively add a comment why _relaxed is fine here. > > > > This is a bit similar to the discussion for the pinctrl series last year [1]. > > I think it's better to only use the _relaxed version where it actually helps, > with a comment about it, and use the normal version elsewhere, in > particular in functions that you have copied from the normal nvme driver. > I had tried to compare some of your code with the other version and > was rather confused by that. Oh good, I tell folks the opposite (and others do too). We don't accept random explicit barriers without explanation, but implicit ones are okay? The resulting code on arm32 is also pretty horrible with the L2x0 and OMAP sync hooks not that that matters here. I don't really care too much which way we go, but we should document one rule and follow that. Rob
On 04/04/2022 23.58, Rob Herring wrote: > On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote: >> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote: >>> On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote: >>>> On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote: >>>>> The NVMe co-processor on the Apple M1 uses a DMA address filter called >>>>> SART for some DMA transactions. This adds a simple driver used to >>>>> configure the memory regions from which DMA transactions are allowed. >>>>> >>>>> Co-developed-by: Hector Martin <marcan@marcan.st> >>>>> Signed-off-by: Hector Martin <marcan@marcan.st> >>>>> Signed-off-by: Sven Peter <sven@svenpeter.dev> >>>> >>>> Can you add some explanation about why this uses a custom interface >>>> instead of hooking into the dma_map_ops? >>> >>> Sure. >>> In a perfect world this would just be an IOMMU implementation but since >>> SART can't create any real IOVA space using pagetables it doesn't fit >>> inside that subsytem. >>> >>> In a slightly less perfect world I could just implement dma_map_ops here >>> but that won't work either because not all DMA buffers of the NVMe >>> device have to go through SART and those allocations happen >>> inside the same device and would use the same dma_map_ops. >>> >>> The NVMe controller has two separate DMA filters: >>> >>> - NVMMU, which must be set up for any command that uses PRPs and >>> ensures that the DMA transactions only touch the pages listed >>> inside the PRP structure. NVMMU itself is tightly coupled >>> to the NVMe controller: The list of allowed pages is configured >>> based on command's tag id and even commands that require no DMA >>> transactions must be listed inside NVMMU before they are started. >>> - SART, which must be set up for some shared memory buffers (e.g. >>> log messages from the NVMe firmware) and for some NVMe debug >>> commands that don't use PRPs. >>> SART is only loosely coupled to the NVMe controller and could >>> also be used together with other devices. It's also the only >>> thing that changed between M1 and M1 Pro/Max/Ultra and that's >>> why I decided to separate it from the NVMe driver. >>> >>> I'll add this explanation to the commit message. >> >> Ok, thanks. >> >>>>> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, >>>>> + phys_addr_t *paddr, size_t *size) >>>>> +{ >>>>> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); >>>>> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); >>>> >>>> Why do you use the _relaxed() accessors here and elsewhere in the driver? >>> >>> This device itself doesn't do any DMA transactions so it needs no memory >>> synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write >>> from/to these buffers (multiple times) and they have the required barriers >>> in place whenever they are used. >>> >>> These buffers so far are only allocated at probe time though so even using >>> the normal writel/readl here won't hurt performance at all. I can just use >>> those if you prefer or alternatively add a comment why _relaxed is fine here. >>> >>> This is a bit similar to the discussion for the pinctrl series last year [1]. >> >> I think it's better to only use the _relaxed version where it actually helps, >> with a comment about it, and use the normal version elsewhere, in >> particular in functions that you have copied from the normal nvme driver. >> I had tried to compare some of your code with the other version and >> was rather confused by that. > > Oh good, I tell folks the opposite (and others do too). We don't accept > random explicit barriers without explanation, but implicit ones are > okay? The resulting code on arm32 is also pretty horrible with the L2x0 > and OMAP sync hooks not that that matters here. > > I don't really care too much which way we go, but we should document one > rule and follow that. I'm sure maz@ has an opinion here too :-) (3... 2... 1... fight!)
On Mon, Apr 4, 2022, at 16:58, Rob Herring wrote: > On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote: >> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote: >> > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote: >> > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote: >> > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called >> > >> SART for some DMA transactions. This adds a simple driver used to >> > >> configure the memory regions from which DMA transactions are allowed. >> > >> >> > >> Co-developed-by: Hector Martin <marcan@marcan.st> >> > >> Signed-off-by: Hector Martin <marcan@marcan.st> >> > >> Signed-off-by: Sven Peter <sven@svenpeter.dev> >> > > >> > > Can you add some explanation about why this uses a custom interface >> > > instead of hooking into the dma_map_ops? >> > >> > Sure. >> > In a perfect world this would just be an IOMMU implementation but since >> > SART can't create any real IOVA space using pagetables it doesn't fit >> > inside that subsytem. >> > >> > In a slightly less perfect world I could just implement dma_map_ops here >> > but that won't work either because not all DMA buffers of the NVMe >> > device have to go through SART and those allocations happen >> > inside the same device and would use the same dma_map_ops. >> > >> > The NVMe controller has two separate DMA filters: >> > >> > - NVMMU, which must be set up for any command that uses PRPs and >> > ensures that the DMA transactions only touch the pages listed >> > inside the PRP structure. NVMMU itself is tightly coupled >> > to the NVMe controller: The list of allowed pages is configured >> > based on command's tag id and even commands that require no DMA >> > transactions must be listed inside NVMMU before they are started. >> > - SART, which must be set up for some shared memory buffers (e.g. >> > log messages from the NVMe firmware) and for some NVMe debug >> > commands that don't use PRPs. >> > SART is only loosely coupled to the NVMe controller and could >> > also be used together with other devices. It's also the only >> > thing that changed between M1 and M1 Pro/Max/Ultra and that's >> > why I decided to separate it from the NVMe driver. >> > >> > I'll add this explanation to the commit message. >> >> Ok, thanks. >> >> > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, >> > >> + phys_addr_t *paddr, size_t *size) >> > >> +{ >> > >> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); >> > >> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); >> > > >> > > Why do you use the _relaxed() accessors here and elsewhere in the driver? >> > >> > This device itself doesn't do any DMA transactions so it needs no memory >> > synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write >> > from/to these buffers (multiple times) and they have the required barriers >> > in place whenever they are used. >> > >> > These buffers so far are only allocated at probe time though so even using >> > the normal writel/readl here won't hurt performance at all. I can just use >> > those if you prefer or alternatively add a comment why _relaxed is fine here. >> > >> > This is a bit similar to the discussion for the pinctrl series last year [1]. >> >> I think it's better to only use the _relaxed version where it actually helps, >> with a comment about it, and use the normal version elsewhere, in >> particular in functions that you have copied from the normal nvme driver. >> I had tried to compare some of your code with the other version and >> was rather confused by that. > > Oh good, I tell folks the opposite (and others do too). We don't accept > random explicit barriers without explanation, but implicit ones are > okay? The resulting code on arm32 is also pretty horrible with the L2x0 > and OMAP sync hooks not that that matters here. > > I don't really care too much which way we go, but we should document one > rule and follow that. I don't have a strong opinion either. Arnd's approach is currently documented in Documentation/driver-api/device-io.rst fwiw: On architectures that require an expensive barrier for serializing against DMA, these "relaxed" versions of the MMIO accessors only serialize against each other, but contain a less expensive barrier operation. A device driver might use these in a particularly performance sensitive fast path, with a comment that explains why the usage in a specific location is safe without the extra barriers. Sven
diff --git a/MAINTAINERS b/MAINTAINERS index 027c3b4ad61c..3d37fe7a0408 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1787,6 +1787,7 @@ F: drivers/watchdog/apple_wdt.c F: include/dt-bindings/interrupt-controller/apple-aic.h F: include/dt-bindings/pinctrl/apple.h F: include/linux/apple-mailbox.h +F: include/linux/soc/apple/* ARM/ARTPEC MACHINE SUPPORT M: Jesper Nilsson <jesper.nilsson@axis.com> diff --git a/drivers/soc/apple/Kconfig b/drivers/soc/apple/Kconfig index 9b8de31d6a8f..8c37ffd53fbd 100644 --- a/drivers/soc/apple/Kconfig +++ b/drivers/soc/apple/Kconfig @@ -17,6 +17,17 @@ config APPLE_PMGR_PWRSTATE controls for SoC devices. This driver manages them through the generic power domain framework, and also provides reset support. +config APPLE_SART + tristate "Apple SART DMA address filter" + depends on ARCH_APPLE || COMPILE_TEST + default ARCH_APPLE + help + Apple SART is a simple DMA address filter used on Apple SoCs such + as the M1. It is usually required for the NVMe coprocessor which does + not use a proper IOMMU. + + Say 'y' here if you have an Apple SoC. + endmenu endif diff --git a/drivers/soc/apple/Makefile b/drivers/soc/apple/Makefile index c114e84667e4..c83c66317098 100644 --- a/drivers/soc/apple/Makefile +++ b/drivers/soc/apple/Makefile @@ -1,2 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_APPLE_PMGR_PWRSTATE) += apple-pmgr-pwrstate.o + +obj-$(CONFIG_APPLE_SART) += apple-sart.o +apple-sart-y = sart.o diff --git a/drivers/soc/apple/sart.c b/drivers/soc/apple/sart.c new file mode 100644 index 000000000000..836ea68db2fc --- /dev/null +++ b/drivers/soc/apple/sart.c @@ -0,0 +1,318 @@ +// SPDX-License-Identifier: GPL-2.0-only OR MIT +/* + * Apple SART device driver + * Copyright (C) The Asahi Linux Contributors + * + * Apple SART is a simple address filter for some DMA transactions. + * Regions of physical memory must be added to the SART's allow + * list before any DMA can target these. Unlike a proper + * IOMMU no remapping can be done and special support in the + * consumer driver is required since not all DMA transactions of + * a single device are subject to SART filtering. + */ + +#include <linux/soc/apple/sart.h> +#include <linux/atomic.h> +#include <linux/bits.h> +#include <linux/bitfield.h> +#include <linux/io.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/of_platform.h> +#include <linux/platform_device.h> +#include <linux/types.h> + +#define APPLE_SART_MAX_ENTRIES 16 + +/* This is probably a bitfield but the exact meaning of each bit is unknown. */ +#define APPLE_SART_FLAGS_ALLOW 0xff + +/* SARTv2 registers */ +#define APPLE_SART2_CONFIG(idx) (0x00 + 4 * (idx)) +#define APPLE_SART2_CONFIG_FLAGS GENMASK(31, 24) +#define APPLE_SART2_CONFIG_SIZE GENMASK(23, 0) +#define APPLE_SART2_CONFIG_SIZE_SHIFT 12 +#define APPLE_SART2_CONFIG_SIZE_MAX GENMASK(23, 0) + +#define APPLE_SART2_PADDR(idx) (0x40 + 4 * (idx)) +#define APPLE_SART2_PADDR_SHIFT 12 + +/* SARTv3 registers */ +#define APPLE_SART3_CONFIG(idx) (0x00 + 4 * (idx)) + +#define APPLE_SART3_PADDR(idx) (0x40 + 4 * (idx)) +#define APPLE_SART3_PADDR_SHIFT 12 + +#define APPLE_SART3_SIZE(idx) (0x80 + 4 * (idx)) +#define APPLE_SART3_SIZE_SHIFT 12 +#define APPLE_SART3_SIZE_MAX GENMASK(29, 0) + +struct apple_sart_ops { + void (*get_entry)(struct apple_sart *sart, int index, u8 *flags, + phys_addr_t *paddr, size_t *size); + int (*set_entry)(struct apple_sart *sart, int index, u8 flags, + phys_addr_t paddr, size_t size); +}; + +struct apple_sart { + struct device *dev; + void __iomem *regs; + + const struct apple_sart_ops *ops; + + unsigned long protected_entries; + unsigned long used_entries; +}; + +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags, + phys_addr_t *paddr, size_t *size) +{ + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index)); + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index)); + u32 size_ = FIELD_GET(APPLE_SART2_CONFIG_SIZE, cfg); + + *flags = FIELD_GET(APPLE_SART2_CONFIG_FLAGS, cfg); + *size = (size_t)size_ << APPLE_SART2_CONFIG_SIZE_SHIFT; + *paddr = (phys_addr_t)paddr_ << APPLE_SART2_PADDR_SHIFT; +} + +static int sart2_set_entry(struct apple_sart *sart, int index, u8 flags, + phys_addr_t paddr, size_t size) +{ + u32 cfg; + + if (size & ((1 << APPLE_SART2_CONFIG_SIZE_SHIFT) - 1)) + return -EINVAL; + if (paddr & ((1 << APPLE_SART2_PADDR_SHIFT) - 1)) + return -EINVAL; + + size >>= APPLE_SART2_CONFIG_SIZE_SHIFT; + paddr >>= APPLE_SART2_PADDR_SHIFT; + + if (size > APPLE_SART2_CONFIG_SIZE_MAX) + return -EINVAL; + + cfg = FIELD_PREP(APPLE_SART2_CONFIG_FLAGS, flags); + cfg |= FIELD_PREP(APPLE_SART2_CONFIG_SIZE, size); + + writel_relaxed(paddr, sart->regs + APPLE_SART2_PADDR(index)); + writel_relaxed(cfg, sart->regs + APPLE_SART2_CONFIG(index)); + + return 0; +} + +static struct apple_sart_ops sart_ops_v2 = { + .get_entry = sart2_get_entry, + .set_entry = sart2_set_entry, +}; + +static void sart3_get_entry(struct apple_sart *sart, int index, u8 *flags, + phys_addr_t *paddr, size_t *size) +{ + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART3_PADDR(index)); + u32 size_ = readl_relaxed(sart->regs + APPLE_SART3_SIZE(index)); + + *flags = readl_relaxed(sart->regs + APPLE_SART3_CONFIG(index)); + *size = (size_t)size_ << APPLE_SART3_SIZE_SHIFT; + *paddr = (phys_addr_t)paddr_ << APPLE_SART3_PADDR_SHIFT; +} + +static int sart3_set_entry(struct apple_sart *sart, int index, u8 flags, + phys_addr_t paddr, size_t size) +{ + if (size & ((1 << APPLE_SART3_SIZE_SHIFT) - 1)) + return -EINVAL; + if (paddr & ((1 << APPLE_SART3_PADDR_SHIFT) - 1)) + return -EINVAL; + + paddr >>= APPLE_SART3_PADDR_SHIFT; + size >>= APPLE_SART3_SIZE_SHIFT; + + if (size > APPLE_SART3_SIZE_MAX) + return -EINVAL; + + writel_relaxed(paddr, sart->regs + APPLE_SART3_PADDR(index)); + writel_relaxed(size, sart->regs + APPLE_SART3_SIZE(index)); + writel_relaxed(flags, sart->regs + APPLE_SART3_CONFIG(index)); + + return 0; +} + +static struct apple_sart_ops sart_ops_v3 = { + .get_entry = sart3_get_entry, + .set_entry = sart3_set_entry, +}; + +static int apple_sart_probe(struct platform_device *pdev) +{ + int i; + struct apple_sart *sart; + struct device *dev = &pdev->dev; + + sart = devm_kzalloc(dev, sizeof(*sart), GFP_KERNEL); + if (!sart) + return -ENOMEM; + + sart->dev = dev; + sart->ops = of_device_get_match_data(dev); + + sart->regs = devm_platform_ioremap_resource(pdev, 0); + if (IS_ERR(sart->regs)) + return PTR_ERR(sart->regs); + + for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) { + u8 flags; + size_t size; + phys_addr_t paddr; + + sart->ops->get_entry(sart, i, &flags, &paddr, &size); + + if (!flags) + continue; + + dev_dbg(sart->dev, + "SART bootloader entry: index %02d; flags: 0x%02x; paddr: %pa; size: 0x%zx\n", + i, flags, &paddr, size); + set_bit(i, &sart->protected_entries); + } + + platform_set_drvdata(pdev, sart); + return 0; +} + +struct apple_sart *apple_sart_get(struct device *dev) +{ + struct device_node *sart_node; + struct platform_device *sart_pdev; + struct apple_sart *sart; + + sart_node = of_parse_phandle(dev->of_node, "apple,sart", 0); + if (!sart_node) + return ERR_PTR(ENODEV); + + sart_pdev = of_find_device_by_node(sart_node); + of_node_put(sart_node); + + if (!sart_pdev) + return ERR_PTR(ENODEV); + + sart = dev_get_drvdata(&sart_pdev->dev); + if (!sart) + return ERR_PTR(EPROBE_DEFER); + + device_link_add(dev, &sart_pdev->dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER); + + return sart; +} +EXPORT_SYMBOL(apple_sart_get); + +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr, + size_t size) +{ + int i, ret; + + for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) { + if (test_bit(i, &sart->protected_entries)) + continue; + if (test_and_set_bit(i, &sart->used_entries)) + continue; + + ret = sart->ops->set_entry(sart, i, APPLE_SART_FLAGS_ALLOW, + paddr, size); + if (ret) { + dev_dbg(sart->dev, + "unable to set entry %d to [%pa, 0x%zx]\n", + i, &paddr, size); + clear_bit(i, &sart->used_entries); + return ret; + } + + dev_dbg(sart->dev, "wrote [%pa, 0x%zx] to %d\n", &paddr, size, + i); + return 0; + } + + dev_warn(sart->dev, + "no free entries left to add [paddr: 0x%llx, size: 0x%zx]\n", + paddr, size); + + return -EBUSY; +} +EXPORT_SYMBOL(apple_sart_add_allowed_region); + +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr, + size_t size) +{ + int i; + + dev_dbg(sart->dev, + "will remove [paddr: %pa, size: 0x%zx] from allowed regions\n", + &paddr, size); + + for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) { + u8 eflags; + size_t esize; + phys_addr_t epaddr; + + if (test_bit(i, &sart->protected_entries)) + continue; + + sart->ops->get_entry(sart, i, &eflags, &epaddr, &esize); + + if (epaddr != paddr || esize != size) + continue; + + sart->ops->set_entry(sart, i, 0, 0, 0); + + clear_bit(i, &sart->used_entries); + dev_dbg(sart->dev, "cleared entry %d\n", i); + return 0; + } + + dev_warn(sart->dev, "entry [paddr: 0x%llx, size: 0x%zx] not found\n", + paddr, size); + + return -EINVAL; +} +EXPORT_SYMBOL(apple_sart_remove_allowed_region); + +static void apple_sart_shutdown(struct platform_device *pdev) +{ + struct apple_sart *sart = dev_get_drvdata(&pdev->dev); + int i; + + for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) { + if (test_bit(i, &sart->protected_entries)) + continue; + + sart->ops->set_entry(sart, i, 0, 0, 0); + } +} + +static const struct of_device_id apple_sart_of_match[] = { + { + .compatible = "apple,t6000-sart", + .data = &sart_ops_v3, + }, + { + .compatible = "apple,t8103-sart", + .data = &sart_ops_v2, + }, + {} +}; +MODULE_DEVICE_TABLE(of, apple_sart_of_match); + +static struct platform_driver apple_sart_driver = { + .driver = { + .name = "apple-sart", + .of_match_table = apple_sart_of_match, + }, + .probe = apple_sart_probe, + .shutdown = apple_sart_shutdown, +}; +module_platform_driver(apple_sart_driver); + +MODULE_LICENSE("Dual MIT/GPL"); +MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>"); +MODULE_DESCRIPTION("Apple SART driver"); diff --git a/include/linux/soc/apple/sart.h b/include/linux/soc/apple/sart.h new file mode 100644 index 000000000000..4e3d4960be5a --- /dev/null +++ b/include/linux/soc/apple/sart.h @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0-only OR MIT */ +/* + * Apple SART device driver + * Copyright (C) The Asahi Linux Contributors + * + * Apple SART is a simple address filter for DMA transactions. + * Regions of physical memory must be added to the SART's allow + * list before any DMA can target these. Unlike a proper + * IOMMU no remapping can be done. + */ + +#ifndef _LINUX_SOC_APPLE_SART_H_ +#define _LINUX_SOC_APPLE_SART_H_ + +#include <linux/device.h> +#include <linux/err.h> +#include <linux/types.h> + +struct apple_sart; + +#if IS_ENABLED(CONFIG_APPLE_SART) + +/* + * Get a reference to the SART attached to dev. + * + * Looks for the phandle reference in apple,sart and returns a pointer + * to the corresponding apple_sart struct to be used with + * apple_sart_add_allowed_region and apple_sart_remove_allowed_region. + */ +struct apple_sart *apple_sart_get(struct device *dev); + +/* + * Adds the region [paddr, paddr+size] to the DMA allow list. + * + * @sart: SART reference + * @paddr: Start address of the region to be used for DMA + * @size: Size of the region to be used for DMA. + */ +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr, + size_t size); + +/* + * Removes the region [paddr, paddr+size] from the DMA allow list. + * + * Note that exact same paddr and size used for apple_sart_add_allowed_region + * have to be passed. + * + * @sart: SART reference + * @paddr: Start address of the region no longer used for DMA + * @size: Size of the region no longer used for DMA. + */ +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr, + size_t size); + +#else + +static inline struct apple_sart *apple_sart_get(struct device *dev) +{ + return ERR_PTR(-ENODEV); +} + +static inline int apple_sart_add_allowed_region(struct apple_sart *sart, + phys_addr_t paddr, size_t size) +{ + return -ENODEV; +} + +static inline int apple_sart_remove_allowed_region(struct apple_sart *sart, + phys_addr_t paddr, + size_t size) +{ + return -ENODEV; +} + +#endif /* IS_ENABLED(CONFIG_APPLE_SART) */ + +#endif /* _LINUX_SOC_APPLE_SART_H_ */