Message ID: 1716564705-9929-1-git-send-email-quic_mojha@quicinc.com
State: New
Series: [v2] firmware: qcom_scm: Add a padded page to ensure DMA memory from lower 4GB
On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
> For SCM protection, memory allocation should be physically contiguous,
> 4K aligned, and non-cacheable to avoid XPU violations. This granularity
> of protection applies from the secure world. Additionally, it's possible
> that a 32-bit secure peripheral will access memory in SoCs like
> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
> pool can be used with dma_alloc APIs.
>
> However, dma_alloc APIs will fall back to the buddy pool if the requested
> size is less than or equal to PAGE_SIZE. It's also possible that the remote
> processor's metadata blob size is less than a PAGE_SIZE. Even though the
> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
> back to the buddy allocator, which may fail if CONFIG_ZONE_{DMA|DMA32}
> is disabled.

Does "fail" here mean that the buddy heap returns a failure in some
case where dma_alloc would have succeeded, or that it does give you
a PAGE_SIZE allocation which doesn't meet your requirements?

From this I do find the behavior of dma_alloc unintuitive; do we know if
there's a reason for the "equal to PAGE_SIZE" case you describe here?

> To address this issue, use an extra page as padding to ensure allocation
> from the CMA region. Since this memory is temporary, it will be released
> once the remote processor is up or in case of any failure.

Thanks for updating the commit message, this is good.

Regards,
Bjorn

> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
> Changes in v2:
> - Described the issue more clearly in commit text.
>
> drivers/firmware/qcom/qcom_scm.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
> index 520de9b5633a..0426972178a4 100644
> --- a/drivers/firmware/qcom/qcom_scm.c
> +++ b/drivers/firmware/qcom/qcom_scm.c
> @@ -538,6 +538,7 @@ static void qcom_scm_set_download_mode(bool enable)
>  int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
>  			    struct qcom_scm_pas_metadata *ctx)
>  {
> +	size_t page_aligned_size;
>  	dma_addr_t mdata_phys;
>  	void *mdata_buf;
>  	int ret;
> @@ -555,7 +556,8 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
>  	 * data blob, so make sure it's physically contiguous, 4K aligned and
>  	 * non-cachable to avoid XPU violations.
>  	 */
> -	mdata_buf = dma_alloc_coherent(__scm->dev, size, &mdata_phys,
> +	page_aligned_size = PAGE_ALIGN(size + PAGE_SIZE);
> +	mdata_buf = dma_alloc_coherent(__scm->dev, page_aligned_size, &mdata_phys,
>  				       GFP_KERNEL);
>  	if (!mdata_buf) {
>  		dev_err(__scm->dev, "Allocation of metadata buffer failed.\n");
> @@ -580,11 +582,11 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
>
>  out:
>  	if (ret < 0 || !ctx) {
> -		dma_free_coherent(__scm->dev, size, mdata_buf, mdata_phys);
> +		dma_free_coherent(__scm->dev, page_aligned_size, mdata_buf, mdata_phys);
>  	} else if (ctx) {
>  		ctx->ptr = mdata_buf;
>  		ctx->phys = mdata_phys;
> -		ctx->size = size;
> +		ctx->size = page_aligned_size;
>  	}
>
>  	return ret ? : res.result[0];
> --
> 2.7.4
>
On 5/27/2024 2:16 AM, Bjorn Andersson wrote:
> On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
>> However, dma_alloc APIs will fall back to the buddy pool if the requested
>> size is less than or equal to PAGE_SIZE. It's also possible that the remote
>> processor's metadata blob size is less than a PAGE_SIZE. Even though the
>> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
>> back to the buddy allocator, which may fail if CONFIG_ZONE_{DMA|DMA32}
>> is disabled.
>
> Does "fail" here mean that the buddy heap returns a failure in some
> case where dma_alloc would have succeeded, or that it does give you
> a PAGE_SIZE allocation which doesn't meet your requirements?

Yes, the buddy allocator will also try to allocate memory, and it may not
get a PAGE_SIZE allocation in the lower 4 GB (for a 32-bit-capable device)
if CONFIG_ZONE_{DMA|DMA32} is disabled. However, the DMA allocation would
succeed in such a case if padding is added so that the size crosses
PAGE_SIZE.

> From this I do find the behavior of dma_alloc unintuitive, do we know if
> there's a reason for the "equal to PAGE_SIZE" case you describe here?

I am not a memory expert, but the reason I can think of is that
allocations of <= PAGE_SIZE can anyway be requested outside the DMA
coherent APIs, with kmalloc and friends, and that could be why the DMA
API falls back to the buddy pool in that case.

-Mukesh
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 520de9b5633a..0426972178a4 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -538,6 +538,7 @@ static void qcom_scm_set_download_mode(bool enable)
 int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
 			    struct qcom_scm_pas_metadata *ctx)
 {
+	size_t page_aligned_size;
 	dma_addr_t mdata_phys;
 	void *mdata_buf;
 	int ret;
@@ -555,7 +556,8 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
 	 * data blob, so make sure it's physically contiguous, 4K aligned and
 	 * non-cachable to avoid XPU violations.
 	 */
-	mdata_buf = dma_alloc_coherent(__scm->dev, size, &mdata_phys,
+	page_aligned_size = PAGE_ALIGN(size + PAGE_SIZE);
+	mdata_buf = dma_alloc_coherent(__scm->dev, page_aligned_size, &mdata_phys,
 				       GFP_KERNEL);
 	if (!mdata_buf) {
 		dev_err(__scm->dev, "Allocation of metadata buffer failed.\n");
@@ -580,11 +582,11 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,

 out:
 	if (ret < 0 || !ctx) {
-		dma_free_coherent(__scm->dev, size, mdata_buf, mdata_phys);
+		dma_free_coherent(__scm->dev, page_aligned_size, mdata_buf, mdata_phys);
 	} else if (ctx) {
 		ctx->ptr = mdata_buf;
 		ctx->phys = mdata_phys;
-		ctx->size = size;
+		ctx->size = page_aligned_size;
 	}

 	return ret ? : res.result[0];
For SCM protection, memory allocation should be physically contiguous, 4K
aligned, and non-cacheable to avoid XPU violations. This granularity of
protection applies from the secure world. Additionally, it's possible that
a 32-bit secure peripheral will access memory in SoCs like sm8{4|5|6}50
for some remote processors. Therefore, memory allocation needs to be done
in the lower 4 GB range. To achieve this, Linux's CMA pool can be used
with dma_alloc APIs.

However, dma_alloc APIs will fall back to the buddy pool if the requested
size is less than or equal to PAGE_SIZE. It's also possible that the remote
processor's metadata blob size is less than a PAGE_SIZE. Even though the
DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
back to the buddy allocator, which may fail if CONFIG_ZONE_{DMA|DMA32}
is disabled.

To address this issue, use an extra page as padding to ensure allocation
from the CMA region. Since this memory is temporary, it will be released
once the remote processor is up or in case of any failure.

Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
---
Changes in v2:
- Described the issue more clearly in commit text.

 drivers/firmware/qcom/qcom_scm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)