Message ID | 1481795755-15302-3-git-send-email-absahu@codeaurora.org (mailing list archive) |
---|---|
State | Changes Requested |
On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote: > The current DMA APIs only support SGL or data in generic format. > The QCA BAM DMA engine data cannot be mapped with already > available APIs due to following reasons. > > 1. The QCA BAM DMA engine uses custom flags which cannot be > mapped with generic DMA engine flags. > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to > set specific flags (Like NWD, EOT) for some of the descriptors > in scatter gather list. The already available mapping APIs take > flags parameter in API itself and there is no support for > passing DMA specific flags for each SGL entry. > > Now this patch adds the support for making the DMA descriptor from > custom data with new DMA mapping function prep_dma_custom_mapping. > The peripheral driver will pass the custom data in this function and > DMA engine driver will form the descriptor according to its own > logic. In future, this API can be used by any other DMA engine > drivers also which are unable to do DMA mapping with already > available API’s. > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> > --- > include/linux/dmaengine.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > index cc535a4..6324c1f 100644 > --- a/include/linux/dmaengine.h > +++ b/include/linux/dmaengine.h > @@ -692,6 +692,8 @@ struct dma_filter { > * be called after period_len bytes have been transferred. > * @device_prep_interleaved_dma: Transfer expression in a generic way. > * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver > + * specific custom data > * @device_config: Pushes a new configuration to a channel, return 0 or an error > * code > * @device_pause: Pauses any transfer happening on a channel. 
Returns > @@ -783,6 +785,9 @@ struct dma_device { > struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)( > struct dma_chan *chan, dma_addr_t dst, u64 data, > unsigned long flags); > + struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)( > + struct dma_chan *chan, void *data, > + unsigned long flags);

This needs a discussion. Why do you need to add a new API to the framework? What are the NWD and EOT flags, can you detail out the flags?
On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote: > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote: > > The current DMA APIs only support SGL or data in generic format. > > The QCA BAM DMA engine data cannot be mapped with already > > available APIs due to following reasons. > > > > 1. The QCA BAM DMA engine uses custom flags which cannot be > > mapped with generic DMA engine flags. > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to > > set specific flags (Like NWD, EOT) for some of the descriptors > > in scatter gather list. The already available mapping APIs take > > flags parameter in API itself and there is no support for > > passing DMA specific flags for each SGL entry. > > > > Now this patch adds the support for making the DMA descriptor from > > custom data with new DMA mapping function prep_dma_custom_mapping. > > The peripheral driver will pass the custom data in this function and > > DMA engine driver will form the descriptor according to its own > > logic. In future, this API can be used by any other DMA engine > > drivers also which are unable to do DMA mapping with already > > available API’s. > > > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> > > --- > > include/linux/dmaengine.h | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > > index cc535a4..6324c1f 100644 > > --- a/include/linux/dmaengine.h > > +++ b/include/linux/dmaengine.h > > @@ -692,6 +692,8 @@ struct dma_filter { > > * be called after period_len bytes have been transferred. > > * @device_prep_interleaved_dma: Transfer expression in a generic way. 
> > * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address > > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver > > + * specific custom data > > * @device_config: Pushes a new configuration to a channel, return 0 or an error > > * code > > * @device_pause: Pauses any transfer happening on a channel. Returns > > @@ -783,6 +785,9 @@ struct dma_device { > > struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)( > > struct dma_chan *chan, dma_addr_t dst, u64 data, > > unsigned long flags); > > + struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)( > > + struct dma_chan *chan, void *data, > > + unsigned long flags); > > This needs a discussion. Why do you need to add a new API for framework. > > What are NWD and EOT flags, cna you details out the flags?

These are the notify-when-done and end-of-transaction flags. I believe the last time we talked about this, we (Vinod and I) agreed to just expose a QCOM-only interface to set the special transaction flags. You'd then have to have some other API to fix up the descriptor with the right qcom flags.

Abhishek, correct me where I am wrong on the following. The main differences between a normal descriptor and a command descriptor are:

1) the size of the descriptor
2) the flag setting
3) the data sent in is a modified scatter gather that includes flags, vs a normal scatter gather

So the CMD descriptors in a given sgl can all have varying flags set? I'd assume they all have the CMD flag set. Do the current users of the command descriptors coalesce all of their requests into a big list?

So a couple of thoughts on how to deal with this:

1) Define a virtual channel for the command descriptors vs a normal DMA transaction. This lets you use the same hardware channel, but lets you discern which descriptor format you need to use. The only issue I see with this is the required change in device tree binding to target the right type of channel (cmd vs normal).

2) Provide an API to set flags for the descriptor on a whole-descriptor basis.

3) If you have a set of transactions described by an sgl that has disparate use of flags, you split the list and use a separate transaction. In other words, we need to enforce that the flag-set API will be applied to all descriptors described by an sgl. This means that the whole transaction may be comprised of multiple async TX descriptors.

Regards,

Andy
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Dec 18, 2016 at 11:06:42PM -0600, Andy Gross wrote: > On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote: > > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote: > > > The current DMA APIs only support SGL or data in generic format. > > > The QCA BAM DMA engine data cannot be mapped with already > > > available APIs due to following reasons. > > > > > > 1. The QCA BAM DMA engine uses custom flags which cannot be > > > mapped with generic DMA engine flags. > > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to > > > set specific flags (Like NWD, EOT) for some of the descriptors > > > in scatter gather list. The already available mapping APIs take > > > flags parameter in API itself and there is no support for > > > passing DMA specific flags for each SGL entry. > > > > > > Now this patch adds the support for making the DMA descriptor from > > > custom data with new DMA mapping function prep_dma_custom_mapping. > > > The peripheral driver will pass the custom data in this function and > > > DMA engine driver will form the descriptor according to its own > > > logic. In future, this API can be used by any other DMA engine > > > drivers also which are unable to do DMA mapping with already > > > available API’s. > > > > > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> > > > --- > > > include/linux/dmaengine.h | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > > > index cc535a4..6324c1f 100644 > > > --- a/include/linux/dmaengine.h > > > +++ b/include/linux/dmaengine.h > > > @@ -692,6 +692,8 @@ struct dma_filter { > > > * be called after period_len bytes have been transferred. > > > * @device_prep_interleaved_dma: Transfer expression in a generic way. 
> > > * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address > > > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver > > > + * specific custom data > > > * @device_config: Pushes a new configuration to a channel, return 0 or an error > > > * code > > > * @device_pause: Pauses any transfer happening on a channel. Returns > > > @@ -783,6 +785,9 @@ struct dma_device { > > > struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)( > > > struct dma_chan *chan, dma_addr_t dst, u64 data, > > > unsigned long flags); > > > + struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)( > > > + struct dma_chan *chan, void *data, > > > + unsigned long flags); > > > > This needs a discussion. Why do you need to add a new API for framework. > > > > What are NWD and EOT flags, cna you details out the flags? > > These are the notify when done and end of transaction flags. I believe the last > time we talked about this, we (Vinod and I) agreed to just expose a QCOM only interface to set > the special transaction flags. You'd then have to have some other API to fixup > the descriptor with the right qcom flags. Okay, do you have pointer on that one, will avoid asking the same questions again. > Ahbishek, correct me where i am wrong on the following: > So two main differences between a normal descriptor and a command descriptor: > 1) size of the descriptor > 2) the flag setting > 3) data sent in is a modified scatter gather that includes flags , vs a normal > scatter gather > > So the CMD descriptors in a given sgl can all have varying flags set? I'd assume > they all have CMD flag set. Do the current users of the command descriptors > coalesce all of their requests into a big list? > > So a couple of thoughts on how to deal with this: > > 1) Define a virtual channel for the command descriptors vs a normal DMA > transaction. 
This lets you use the same hardware channel, but lets you discern > which descriptor format you need to use. The only issue I see with this is the > required change in device tree binding to target the right type of channel (cmd > vs normal).

Or mark the descriptor as cmd and write accordingly...

> > 2) Provide an API to set flags for the descriptor on a whole descriptor basis. > > 3) If you have a set of transactions described by an sgl that has disparate use > of flags, you split the list and use a separate transaction. In other words, we > need to enforce that the flag set API will be applied to all descriptors > described by an sgl. This means that the whole transaction may be comprised of > multiple async TX descriptors.
On Mon, Dec 19, 2016 at 09:19:23PM +0530, Vinod Koul wrote: > On Sun, Dec 18, 2016 at 11:06:42PM -0600, Andy Gross wrote: > > On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote: > > > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote: > > > > The current DMA APIs only support SGL or data in generic format. > > > > The QCA BAM DMA engine data cannot be mapped with already > > > > available APIs due to following reasons. > > > > > > > > 1. The QCA BAM DMA engine uses custom flags which cannot be > > > > mapped with generic DMA engine flags. > > > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to > > > > set specific flags (Like NWD, EOT) for some of the descriptors > > > > in scatter gather list. The already available mapping APIs take > > > > flags parameter in API itself and there is no support for > > > > passing DMA specific flags for each SGL entry. > > > > > > > > Now this patch adds the support for making the DMA descriptor from > > > > custom data with new DMA mapping function prep_dma_custom_mapping. > > > > The peripheral driver will pass the custom data in this function and > > > > DMA engine driver will form the descriptor according to its own > > > > logic. In future, this API can be used by any other DMA engine > > > > drivers also which are unable to do DMA mapping with already > > > > available API’s. > > > > > > > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> > > > > --- > > > > include/linux/dmaengine.h | 5 +++++ > > > > 1 file changed, 5 insertions(+) > > > > > > > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > > > > index cc535a4..6324c1f 100644 > > > > --- a/include/linux/dmaengine.h > > > > +++ b/include/linux/dmaengine.h > > > > @@ -692,6 +692,8 @@ struct dma_filter { > > > > * be called after period_len bytes have been transferred. > > > > * @device_prep_interleaved_dma: Transfer expression in a generic way. 
> > > > * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address > > > > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver > > > > + * specific custom data > > > > * @device_config: Pushes a new configuration to a channel, return 0 or an error > > > > * code > > > > * @device_pause: Pauses any transfer happening on a channel. Returns > > > > @@ -783,6 +785,9 @@ struct dma_device { > > > > struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)( > > > > struct dma_chan *chan, dma_addr_t dst, u64 data, > > > > unsigned long flags); > > > > + struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)( > > > > + struct dma_chan *chan, void *data, > > > > + unsigned long flags); > > > > > > This needs a discussion. Why do you need to add a new API for framework. > > > > > > What are NWD and EOT flags, cna you details out the flags? > > > > These are the notify when done and end of transaction flags. I believe the last > > time we talked about this, we (Vinod and I) agreed to just expose a QCOM only interface to set > > the special transaction flags. You'd then have to have some other API to fixup > > the descriptor with the right qcom flags. > > Okay, do you have pointer on that one, will avoid asking the same questions > again. I'll see if I can find the correspondance and send to you directly. > > > Ahbishek, correct me where i am wrong on the following: > > So two main differences between a normal descriptor and a command descriptor: > > 1) size of the descriptor > > 2) the flag setting > > 3) data sent in is a modified scatter gather that includes flags , vs a normal > > scatter gather > > > > So the CMD descriptors in a given sgl can all have varying flags set? I'd assume > > they all have CMD flag set. Do the current users of the command descriptors > > coalesce all of their requests into a big list? 
> > > > So a couple of thoughts on how to deal with this: > > > > 1) Define a virtual channel for the command descriptors vs a normal DMA > > transaction. This lets you use the same hardware channel, but lets you discern > > which descriptor format you need to use. The only issue I see with this is the > > required change in device tree binding to target the right type of channel (cmd > > vs normal). > > Or mark the descriptor is cmd and write accordingly...

The only issue I see with that is knowing how much to pre-allocate during the prep call. The flag set API would be called on the allocated tx descriptor, so you'd have to know up front and be able to specify it.

> > > > > 2) Provide an API to set flags for the descriptor on a whole descriptor basis. > > > > 3) If you have a set of transactions described by an sgl that has disparate use > > of flags, you split the list and use a separate transaction. In other words, we > > need to enforce that the flag set API will be applied to all descriptors > > described by an sgl. This means that the whole transaction may be comprised of > > multiple async TX descriptors.

Regards,

Andy
On 2016-12-19 23:22, Andy Gross wrote: > On Mon, Dec 19, 2016 at 09:19:23PM +0530, Vinod Koul wrote: >> On Sun, Dec 18, 2016 at 11:06:42PM -0600, Andy Gross wrote: >> > On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote: >> > > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote: >> > > > The current DMA APIs only support SGL or data in generic format. >> > > > The QCA BAM DMA engine data cannot be mapped with already >> > > > available APIs due to following reasons. >> > > > >> > > > 1. The QCA BAM DMA engine uses custom flags which cannot be >> > > > mapped with generic DMA engine flags. >> > > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to >> > > > set specific flags (Like NWD, EOT) for some of the descriptors >> > > > in scatter gather list. The already available mapping APIs take >> > > > flags parameter in API itself and there is no support for >> > > > passing DMA specific flags for each SGL entry. >> > > > >> > > > Now this patch adds the support for making the DMA descriptor from >> > > > custom data with new DMA mapping function prep_dma_custom_mapping. >> > > > The peripheral driver will pass the custom data in this function and >> > > > DMA engine driver will form the descriptor according to its own >> > > > logic. In future, this API can be used by any other DMA engine >> > > > drivers also which are unable to do DMA mapping with already >> > > > available API’s. >> > > > >> > > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> >> > > > --- >> > > > include/linux/dmaengine.h | 5 +++++ >> > > > 1 file changed, 5 insertions(+) >> > > > >> > > This needs a discussion. Why do you need to add a new API for framework. >> > > >> > > What are NWD and EOT flags, cna you details out the flags? >> > The QCA BAM descriptor has multiple flags. Following is the detailed explanation for these flags 1. EOT (End of Transfer) – this flag is used to specify end of transaction at the end of this descriptor. 2. 
EOB (End of Block) – this flag is used to specify end of block at the end of this descriptor. 3. NWD (Notify When Done) – when set, NWD provides a handshake between the peripheral and BAM indicating that the transaction is truly done and the data/command has been delivered to its destination. SW can use the NWD feature to make BAM separate the executions of consecutive descriptors. This can be useful for features like Command Descriptor. 4. CMD (Command) – allows the SW to create descriptors of type Command, which do not generate any data transmissions but configure registers in the peripheral (write operations, and register read operations).

>> > These are the notify when done and end of transaction flags. I believe the last >> > time we talked about this, we (Vinod and I) agreed to just expose a QCOM only interface to set >> > the special transaction flags. You'd then have to have some other API to fixup >> > the descriptor with the right qcom flags. >> >> Okay, do you have pointer on that one, will avoid asking the same >> questions >> again. > > I'll see if I can find the correspondance and send to you directly. > >> >> > Ahbishek, correct me where i am wrong on the following: >> > So two main differences between a normal descriptor and a command descriptor: >> > 1) size of the descriptor >> > 2) the flag setting >> > 3) data sent in is a modified scatter gather that includes flags , vs a normal >> > scatter gather

The top-level descriptor is the same for both; the only difference is the Command flag. The command descriptor will contain a list of register reads/writes instead of a data address. The peripheral driver can form the list with the helper function provided in patch 5 and submit it to BAM. The main issue is for other flags like EOT/NWD.

The top-level descriptor is again in the form of a list; BAM writes the address of the list in a register before the start of transfer. In this list, each element will have different flags.
>> > >> > So the CMD descriptors in a given sgl can all have varying flags set? I'd assume >> > they all have CMD flag set. Do the current users of the command descriptors >> > coalesce all of their requests into a big list? >> >

The main user of command descriptors is currently QPIC NAND/LCD. The NAND driver uses 3 BAM channels: tx, rx and command. The NAND controller does the data transfer in chunks of codewords (maximum 544 bytes). The NAND chip does the data transfer on a page basis, so each page read/write can have multiple codewords. The NAND driver prepares the command, tx (write) or rx (read) descriptors for a complete page, submits them to BAM and waits for completion. So the NAND driver coalesces all of its requests into a big list. In this big list,

1. Some of the requests for the command channel require the NWD flag to be set.
2. TX requests depend upon the setting of the EOT flag, so some of the TX requests in a complete page write will contain the EOT flag and others will not. So this custom mapping will be used for data descriptors also in the NAND driver.

>> > So a couple of thoughts on how to deal with this: >> > >> > 1) Define a virtual channel for the command descriptors vs a normal DMA >> > transaction. This lets you use the same hardware channel, but lets you discern >> > which descriptor format you need to use. The only issue I see with this is the >> > required change in device tree binding to target the right type of channel (cmd >> > vs normal). >> >> Or mark the descriptor is cmd and write accordingly... > > The only issue i see with that is knowing how much to pre-allocate > during the > prep call. The flag set API would be called on the allocated tx > descriptor. So > you'd have to know up front and be able to specify it. > >> >> > >> > 2) Provide an API to set flags for the descriptor on a whole descriptor basis. >> > >> > 3) If you have a set of transactions described by an sgl that has disparate use >> > of flags, you split the list and use a separate transaction.
In other words, we >> > need to enforce that the flag set API will be applied to all descriptors >> > described by an sgl. This means that the whole transaction may be comprised of >> > multiple async TX descriptors.

Each async TX descriptor will generate a BAM interrupt, so we are deviating from the main purpose of DMA, where ideally we should get the interrupt at the end of the transfer. This is the main reason for going for this patch.

With the submitted patch, only 1 interrupt per channel is required for a complete NAND page, and it solves the setting of BAM-specific flags also. The only issue with this patch is adding a new API to the DMA framework itself. But this API can be used in future by other DMA engines for which mapping cannot be done with the available APIs and where the mapping is vendor-specific.

>
> Regards,
>
> Andy
On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote: <snip> > >>Okay, do you have pointer on that one, will avoid asking the same > >>questions > >>again. > > > >I'll see if I can find the correspondance and send to you directly. > > > >> > >>> Ahbishek, correct me where i am wrong on the following: > >>> So two main differences between a normal descriptor and a command descriptor: > >>> 1) size of the descriptor > >>> 2) the flag setting > >>> 3) data sent in is a modified scatter gather that includes flags , vs a normal > >>> scatter gather > > Top level descriptor is same for both. Only difference is Command flag. The > command descriptor will contain list of register read/write instead of data > address > The peripheral driver can form the list with helper function provided in > patch 5 > and submit it to BAM. The main issue is for other flag like EOT/NWD. > > The top level descriptor is again in the form of list where BAM writes the > address of the list in register before starting of transfer. In this list, > each element will have different flags. Ah that's right. The command descriptor information is the format of the data pointed to by the sgl. So you'd have some set of register read/writes described in those entries. > > >>> > >>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume > >>> they all have CMD flag set. Do the current users of the command descriptors > >>> coalesce all of their requests into a big list? > >>> > > The main user for command descriptor is currently QPIC NAND/LCD. The NAND > uses > 3 BAM channels- tx, rx and command. NAND controller do the data transfer in > chunk of codeword (maximum 544 bytes). NAND chip does the data transfer on > page basis so each page read/write can have multiple codewords. The NAND > driver prepares command, tx(write) or rx(read) descriptor for complete page > , submit it to BAM and wait for its completion. 
So NAND driver coalesces > all of their requests into a big list. In this big list, > > 1. Some of the request for command channel requires NWD flag to be set. I'd expect this to occur at the end of a chain. So if you had 5 CMD descriptors described in the SGL, only the last descriptor would have the NWD set. Correct? > 2. TX request depends upon the setting of EOT flag so some of the TX request > in complete page write will contain EOT flag and others will not. So this > custom mapping will be used for data descriptor also in NAND driver. Can you give a sequence description of the descriptors and flags? I haven't seen the NAND documentation that describes the sequence/flow. > >>> So a couple of thoughts on how to deal with this: > >>> > >>> 1) Define a virtual channel for the command descriptors vs a normal DMA > >>> transaction. This lets you use the same hardware channel, but lets you discern > >>> which descriptor format you need to use. The only issue I see with this is the > >>> required change in device tree binding to target the right type of channel (cmd > >>> vs normal). > >> > >>Or mark the descriptor is cmd and write accordingly... > > > >The only issue i see with that is knowing how much to pre-allocate during > >the > >prep call. The flag set API would be called on the allocated tx > >descriptor. So > >you'd have to know up front and be able to specify it. > > > >> > >>> > >>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis. > >>> > >>> 3) If you have a set of transactions described by an sgl that has disparate use > >>> of flags, you split the list and use a separate transaction. In other words, we > >>> need to enforce that the flag set API will be applied to all descriptors > >>> described by an sgl. This means that the whole transaction may be comprised of > >>> multiple async TX descriptors. 
> > Each async TX descriptor will generate the BAM interrupt so we are deviating > from main purpose of DMA where ideally we should get the interrupt at the > end > of transfer. This is the main reason for going for this patch.

If the client queues 1 descriptor or 5 descriptors, it doesn't matter that much. The client knows when it is done by waiting for the descriptors to complete. It is less efficient than grouping them all, but it should still work.

> > With the submitted patch, only 1 interrupt per channel is required for > complete NAND page and it solves the setting of BAM specific flags also. > Only issue with this patch is adding new API in DMA framework itself. But > this API can be used by other DMA engines in future for which mapping cannot > be done with available APIs and if this mapping is vendor specific.

I guess the key point in all of this is that the DMA operation being done is not a normal data flow to/from the device. It's direct remote register access to the device using address information contained in the sgl. And you are collating the standard data access along with the special command access in the same API call.

Regards,

Andy
On 2016-12-21 01:55, Andy Gross wrote: > On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote: > > <snip> > >> >>Okay, do you have pointer on that one, will avoid asking the same >> >>questions >> >>again. >> > >> >I'll see if I can find the correspondance and send to you directly. >> > >> >> >> >>> Ahbishek, correct me where i am wrong on the following: >> >>> So two main differences between a normal descriptor and a command descriptor: >> >>> 1) size of the descriptor >> >>> 2) the flag setting >> >>> 3) data sent in is a modified scatter gather that includes flags , vs a normal >> >>> scatter gather >> >> Top level descriptor is same for both. Only difference is Command >> flag. The >> command descriptor will contain list of register read/write instead of >> data >> address >> The peripheral driver can form the list with helper function provided >> in >> patch 5 >> and submit it to BAM. The main issue is for other flag like EOT/NWD. >> >> The top level descriptor is again in the form of list where BAM writes >> the >> address of the list in register before starting of transfer. In this >> list, >> each element will have different flags. > > Ah that's right. The command descriptor information is the format of > the data > pointed to by the sgl. So you'd have some set of register read/writes > described > in those entries. > >> >> >>> >> >>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume >> >>> they all have CMD flag set. Do the current users of the command descriptors >> >>> coalesce all of their requests into a big list? >> >>> >> >> The main user for command descriptor is currently QPIC NAND/LCD. The >> NAND >> uses >> 3 BAM channels- tx, rx and command. NAND controller do the data >> transfer in >> chunk of codeword (maximum 544 bytes). NAND chip does the data >> transfer on >> page basis so each page read/write can have multiple codewords. 
The >> NAND >> driver prepares command, tx(write) or rx(read) descriptor for complete >> page >> , submit it to BAM and wait for its completion. So NAND driver >> coalesces >> all of their requests into a big list. In this big list, >> >> 1. Some of the request for command channel requires NWD flag to be >> set. > > I'd expect this to occur at the end of a chain. So if you had 5 CMD > descriptors > described in the SGL, only the last descriptor would have the NWD set. > Correct? > >> 2. TX request depends upon the setting of EOT flag so some of the TX >> request >> in complete page write will contain EOT flag and others will not. >> So this >> custom mapping will be used for data descriptor also in NAND >> driver. > > Can you give a sequence description of the descriptors and flags? I > haven't > seen the NAND documentation that describes the sequence/flow.

Following is a sample list of command descriptors for a page write (2K page). The actual list will contain more descriptors, since it involves the spare area transfer also.

Index  INT  NWD  CMD  24-bit Register Address
0      -    -    1    0x0000F0 (EBI2_ECC_BUF_CFG)
1      -    -    1    0x000020 (NAND_DEVn_CFG0)
                      0x000024 (NAND_DEVn_CFG1)
                      0x0000F0 (EBI2_ECC_BUF_CFG)
                      0x00000C (NAND_FLASH_CHIP_SELECT)
2      -    -    1    0x000004 (NAND_ADDR0)
                      0x000008 (NAND_ADDR1)
3      -    1    1    0x000000 (NAND_FLASH_CMD)
4      -    1    1    0x000010 (NANDC_EXEC_CMD)
5      -    -    1    0x000014 (NAND_FLASH_STATUS)
6      -    1    1    0x000000 (NAND_FLASH_CMD)
7      -    1    1    0x000010 (NANDC_EXEC_CMD)
8      -    -    1    0x000014 (NAND_FLASH_STATUS)
9      -    1    1    0x000000 (NAND_FLASH_CMD)
10     -    1    1    0x000010 (NANDC_EXEC_CMD)
11     -    -    1    0x000014 (NAND_FLASH_STATUS)
12     -    1    1    0x000000 (NAND_FLASH_CMD)
13     -    1    1    0x000010 (NANDC_EXEC_CMD)
14     1    -    1    0x000014 (NAND_FLASH_STATUS)
15     -    -    1    0x000044 (NAND_FLASH_READ_STATUS)
                      0x000014 (NAND_FLASH_STATUS)

> >> >>> So a couple of thoughts on how to deal with this: >> >>> >> >>> 1) Define a virtual channel for the command descriptors vs a normal DMA >> >>> transaction.
This lets you use the same hardware channel, but lets you discern >> >>> which descriptor format you need to use. The only issue I see with this is the >> >>> required change in device tree binding to target the right type of channel (cmd >> >>> vs normal). >> >> >> >>Or mark the descriptor is cmd and write accordingly... >> > >> >The only issue i see with that is knowing how much to pre-allocate during >> >the >> >prep call. The flag set API would be called on the allocated tx >> >descriptor. So >> >you'd have to know up front and be able to specify it. >> > >> >> >> >>> >> >>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis. >> >>> >> >>> 3) If you have a set of transactions described by an sgl that has disparate use >> >>> of flags, you split the list and use a separate transaction. In other words, we >> >>> need to enforce that the flag set API will be applied to all descriptors >> >>> described by an sgl. This means that the whole transaction may be comprised of >> >>> multiple async TX descriptors. >> >> Each async TX descriptor will generate the BAM interrupt so we are >> deviating >> from main purpose of DMA where ideally we should get the interrupt at >> the >> end >> of transfer. This is the main reason for going for this patch. > > If the client queues 1 descriptor or 5 descriptors, it doesn't matter > that much. > The client knows when it is done by waiting for the descriptors to > complete. It > is less efficient than grouping them all, but it should still work. > Yes. client will wait for final descriptor completion. But these interrupts will be overhead for CPU. For 1-2 page it won't matter much I guess it will be significant for complete chip read/write(during boot and FS i.e JFFS operations). >> >> With the submitted patch, only 1 interrupt per channel is required for >> complete NAND page and it solves the setting of BAM specific flags >> also. >> Only issue with this patch is adding new API in DMA framework itself. 
>> But this API can be used by other DMA engines in future for which mapping
>> cannot be done with available APIs and if this mapping is vendor specific.
>
> I guess the key point in all of this is that the DMA operation being
> done is not a normal data flow to/from the device. It's direct remote
> register access to the device using address information contained in the
> sgl. And you are collating the standard data access along with the
> special command access in the same API call.

Yes. Normally a DMA engine (the QCA ADM DMA engine also) allows reading and
writing memory-mapped registers just like data. But BAM is different (since
it is not a global DMA engine and is coupled with a peripheral). Also, this
different flag requirement is not just for command descriptors but for data
descriptors also.

BAM data access and command access differ only in the flag and the register
read/write list. The register read/write list will simply be an array of
struct bam_cmd_element, added in this patch:

struct bam_cmd_element {
	__le32 addr:24;
	__le32 command:8;
	__le32 data;
	__le32 mask;
	__le32 reserved;
};

The address and size of the array will be passed in the data and size fields
of the SGL. If we form the SGL for the list mentioned above, we will have an
SGL of size 15 with just one descriptor each.

Now we require a different flag for each SG entry. Currently the SG does not
have a flag parameter, and we can't add a flag parameter just for our
requirement to the generic SG. So we have added the custom mapping function
and passed a modified SG as parameter, which is the generic SG plus a flag.
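[Editor's note: for reference, the command-element layout described above can
be sketched in plain userspace C. The field packing is simplified (a single
uint32_t instead of __le32 bitfields), and the helper name `bam_prep_ce` and
the `BAM_CE_WRITE` command encoding are illustrative assumptions, not the
actual helper from patch 5.]

```c
#include <assert.h>
#include <stdint.h>

/* Userspace sketch of the 16-byte BAM command element from the patch.
 * The 24-bit register address and 8-bit command share one 32-bit word. */
struct bam_cmd_element {
	uint32_t addr_cmd;	/* addr:24 | command:8 */
	uint32_t data;		/* value to write (or destination for reads) */
	uint32_t mask;		/* bitmask applied to the access */
	uint32_t reserved;
};

enum { BAM_CE_WRITE = 0x1 };	/* hypothetical command encoding */

/* Fill one element of the register read/write list. */
static void bam_prep_ce(struct bam_cmd_element *ce, uint32_t addr,
			uint32_t cmd, uint32_t data)
{
	ce->addr_cmd = (addr & 0xFFFFFFu) | (cmd << 24);
	ce->data = data;
	ce->mask = 0xFFFFFFFFu;
	ce->reserved = 0;
}
```

A peripheral driver would build an array of these elements (e.g. a write of 1
to NANDC_EXEC_CMD at 0x000010) and point a single SG entry at the whole
array, which is why one SGL entry can carry many register accesses.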
On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote: > On 2016-12-21 01:55, Andy Gross wrote: > >On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote: > > > ><snip> > > > >>>>Okay, do you have pointer on that one, will avoid asking the same > >>>>questions > >>>>again. > >>> > >>>I'll see if I can find the correspondance and send to you directly. > >>> > >>>> > >>>>> Ahbishek, correct me where i am wrong on the following: > >>>>> So two main differences between a normal descriptor and a command descriptor: > >>>>> 1) size of the descriptor > >>>>> 2) the flag setting > >>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal > >>>>> scatter gather > >> > >>Top level descriptor is same for both. Only difference is Command flag. > >>The > >>command descriptor will contain list of register read/write instead of > >>data > >>address > >>The peripheral driver can form the list with helper function provided in > >>patch 5 > >>and submit it to BAM. The main issue is for other flag like EOT/NWD. > >> > >>The top level descriptor is again in the form of list where BAM writes > >>the > >>address of the list in register before starting of transfer. In this > >>list, > >>each element will have different flags. > > > >Ah that's right. The command descriptor information is the format of the > >data > >pointed to by the sgl. So you'd have some set of register read/writes > >described > >in those entries. > > > >> > >>>>> > >>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume > >>>>> they all have CMD flag set. Do the current users of the command descriptors > >>>>> coalesce all of their requests into a big list? > >>>>> > >> > >>The main user for command descriptor is currently QPIC NAND/LCD. The > >>NAND > >>uses > >>3 BAM channels- tx, rx and command. NAND controller do the data transfer > >>in > >>chunk of codeword (maximum 544 bytes). 
NAND chip does the data transfer > >>on > >>page basis so each page read/write can have multiple codewords. The NAND > >>driver prepares command, tx(write) or rx(read) descriptor for complete > >>page > >>, submit it to BAM and wait for its completion. So NAND driver coalesces > >>all of their requests into a big list. In this big list, > >> > >>1. Some of the request for command channel requires NWD flag to be set. > > > >I'd expect this to occur at the end of a chain. So if you had 5 CMD > >descriptors > >described in the SGL, only the last descriptor would have the NWD set. > >Correct? > > > >>2. TX request depends upon the setting of EOT flag so some of the TX > >>request > >> in complete page write will contain EOT flag and others will not. So > >>this > >> custom mapping will be used for data descriptor also in NAND driver. > > > >Can you give a sequence description of the descriptors and flags? I > >haven't > >seen the NAND documentation that describes the sequence/flow. > > Following is the sample list of command descriptor for page write(2K page). > The actual list will contain more no of descriptor which involves > spare area transfer also. 
> > Index INT NWD CMD 24bit Register Address > 0 - - 1 0x0000F0 (EBI2_ECC_BUF_CFG) > > 1 - - 1 0x000020 (NAND_DEVn_CFG0) > 0x000024 (NAND_DEVn_CFG1) > 0x0000F0 (EBI2_ECC_BUF_CFG) > 0x00000C (NAND_FLASH_CHIP_SELECT) > > 2 - - 1 0x000004 (NAND_ADDR0) > 0x000008 (NAND_ADDR1) > > 3 - 1 1 0x000000 (NAND_FLASH_CMD) > > 4 - 1 1 0x000010 (NANDC_EXEC_CMD) > > 5 - - 1 0x000014 (NAND_FLASH_STATUS) > > 6 - 1 1 0x000000 (NAND_FLASH_CMD) > > 7 - 1 1 0x000010 (NANDC_EXEC_CMD) > > 8 - - 1 0x000014 (NAND_FLASH_STATUS) > > 9 - 1 1 0x000000 (NAND_FLASH_CMD) > > 10 - 1 1 0x000010 (NANDC_EXEC_CMD) > > 11 - - 1 0x000014 (NAND_FLASH_STATUS) > > 12 - 1 1 0x000000 (NAND_FLASH_CMD) > > 13 - 1 1 0x000010 (NANDC_EXEC_CMD) > > 14 1 - 1 0x000014 (NAND_FLASH_STATUS) > > 15 - - 1 0x000044 (NAND_FLASH_READ_STATUS) > 0x000014 (NAND_FLASH_STATUS) Yeah I was expecting something like: - Setup NAND controller using some command writes (indices 0-4) Loop doing the following until all the data is done: - Send/Receive the Data - Check status. The only one that sticks out to me is index 14. Is the INT flag there to mark the actual end of the data transfer from the device? Then you do one more Status read. > > > >>>>> So a couple of thoughts on how to deal with this: > >>>>> > >>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA > >>>>> transaction. This lets you use the same hardware channel, but lets you discern > >>>>> which descriptor format you need to use. The only issue I see with this is the > >>>>> required change in device tree binding to target the right type of channel (cmd > >>>>> vs normal). > >>>> > >>>>Or mark the descriptor is cmd and write accordingly... > >>> > >>>The only issue i see with that is knowing how much to pre-allocate during > >>>the > >>>prep call. The flag set API would be called on the allocated tx > >>>descriptor. So > >>>you'd have to know up front and be able to specify it. 
> >>> > >>>> > >>>>> > >>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis. > >>>>> > >>>>> 3) If you have a set of transactions described by an sgl that has disparate use > >>>>> of flags, you split the list and use a separate transaction. In other words, we > >>>>> need to enforce that the flag set API will be applied to all descriptors > >>>>> described by an sgl. This means that the whole transaction may be comprised of > >>>>> multiple async TX descriptors. > >> > >>Each async TX descriptor will generate the BAM interrupt so we are > >>deviating > >>from main purpose of DMA where ideally we should get the interrupt at > >>the > >>end > >>of transfer. This is the main reason for going for this patch. > > > >If the client queues 1 descriptor or 5 descriptors, it doesn't matter that > >much. > >The client knows when it is done by waiting for the descriptors to > >complete. It > >is less efficient than grouping them all, but it should still work. > > > Yes. client will wait for final descriptor completion. But these interrupts > will be overhead for CPU. For 1-2 page it won't matter much I guess it will > be > significant for complete chip read/write(during boot and FS i.e JFFS > operations). > >> > >>With the submitted patch, only 1 interrupt per channel is required for > >>complete NAND page and it solves the setting of BAM specific flags also. > >>Only issue with this patch is adding new API in DMA framework itself. > >>But > >>this API can be used by other DMA engines in future for which mapping > >>cannot > >>be done with available APIs and if this mapping is vendor specific. > > > >I guess the key point in all of this is that the DMA operation being done > >is not > >a normal data flow to/from the device. It's direct remote register access > >to > >the device using address information contained in the sgl. And you are > >collating the standard data access along with the special command access > >in the > >same API call. > Yes. 
Normally DMA engine (QCA ADM DMA engine also) allows to read/write > memory mapped > registers just like data. But BAM is different (Since it is not a global DMA > Engine > and coupled with peripheral). Also, this different flag requirement is not > just > for command descriptors but for data descriptors also. > > BAM data access and command access differs only with flag and register > read/write > list. The register read and write list will be simply array of > struct bam_cmd_element added in patch > struct bam_cmd_element { > __le32 addr:24; > __le32 command:8; > __le32 data; > __le32 mask; > __le32 reserved; > }; > > The address and size of the array will be passed in data and size field of > SGL. > If we want to form the SGL for mentioned list then we will have SGL of size > 15 > with just one descriptor. > > Now we require different flag for each SG entry. currently SG does not have > flag parameter and we can't add flag parameter just for our requirement in > generic SG. So we have added the custom mapping function and passed modified > SG > as parameter which is generic SG and flag. I really think that we need some additional API that allows for the flag munging for the descriptors instead of overriding the prep_slave_sg. We already needed to change the way the flags are passed anyway. And instead of building up a special sg list, the API should take a structure that has a 1:1 mapping of the flags to the descriptors. And you would call this API on your descriptor before issuing it. So build up the sglist. Call the prep_slave_sg. You get back a tx descriptor that underneath is a bam descriptor. Then call the API giving the descriptor and the structure that defines the flags for the descriptors. Then submit the descriptor. 
Something like:

void qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
				     u16 *flags)
{
	struct bam_async_desc *async_desc = container_of(tx,
							 struct bam_async_desc,
							 vd.tx);
	int i;

	for (i = 0; i < async_desc->num_desc; i++)
		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
}
EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags);

This applies the flags directly to the underlying hardware descriptors. The
prep_slave_sg call would need to remove all the flag munging. The
bam_start_dma would need to account for this as well by only setting the INT
flag if the transfer cannot get all of the descriptors in the FIFO.

Regards,

Andy
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
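[Editor's note: the container_of walk from the generic tx descriptor back to
the driver-private descriptor, and the 1:1 flags application, can be
exercised outside the kernel with minimal mock structures. The struct and
field names mirror the mail; the sizes, the fixed desc[8] array, and the
little-endian cpu_to_le16 stub are simplifying assumptions.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assume a little-endian host so the conversion is a no-op. */
static inline uint16_t cpu_to_le16(uint16_t v) { return v; }

struct dma_async_tx_descriptor { int cookie; };
struct virt_dma_desc { struct dma_async_tx_descriptor tx; };
struct bam_desc_hw { uint16_t flags; };

struct bam_async_desc {
	struct virt_dma_desc vd;
	int num_desc;
	struct bam_desc_hw desc[8];	/* simplified: fixed-size for the mock */
};

/* Proposed API: copy one caller-supplied flag word onto each hardware
 * descriptor behind the tx descriptor, 1:1 and in order. */
static void qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
					    const uint16_t *flags)
{
	struct bam_async_desc *async_desc =
		container_of(tx, struct bam_async_desc, vd.tx);
	int i;

	for (i = 0; i < async_desc->num_desc; i++)
		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
}
```

The client-side flow would then be: build the sglist, call prep_slave_sg to
get the tx descriptor, call this API with the per-descriptor flags array,
and finally submit the descriptor.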
On 2016-12-29 23:24, Andy Gross wrote: > On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote: >> On 2016-12-21 01:55, Andy Gross wrote: >> >On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote: >> > >> ><snip> >> > >> >>>>Okay, do you have pointer on that one, will avoid asking the same >> >>>>questions >> >>>>again. >> >>> >> >>>I'll see if I can find the correspondance and send to you directly. >> >>> >> >>>> >> >>>>> Ahbishek, correct me where i am wrong on the following: >> >>>>> So two main differences between a normal descriptor and a command descriptor: >> >>>>> 1) size of the descriptor >> >>>>> 2) the flag setting >> >>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal >> >>>>> scatter gather >> >> >> >>Top level descriptor is same for both. Only difference is Command flag. >> >>The >> >>command descriptor will contain list of register read/write instead of >> >>data >> >>address >> >>The peripheral driver can form the list with helper function provided in >> >>patch 5 >> >>and submit it to BAM. The main issue is for other flag like EOT/NWD. >> >> >> >>The top level descriptor is again in the form of list where BAM writes >> >>the >> >>address of the list in register before starting of transfer. In this >> >>list, >> >>each element will have different flags. >> > >> >Ah that's right. The command descriptor information is the format of the >> >data >> >pointed to by the sgl. So you'd have some set of register read/writes >> >described >> >in those entries. >> > >> >> >> >>>>> >> >>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume >> >>>>> they all have CMD flag set. Do the current users of the command descriptors >> >>>>> coalesce all of their requests into a big list? >> >>>>> >> >> >> >>The main user for command descriptor is currently QPIC NAND/LCD. The >> >>NAND >> >>uses >> >>3 BAM channels- tx, rx and command. 
NAND controller do the data transfer >> >>in >> >>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer >> >>on >> >>page basis so each page read/write can have multiple codewords. The NAND >> >>driver prepares command, tx(write) or rx(read) descriptor for complete >> >>page >> >>, submit it to BAM and wait for its completion. So NAND driver coalesces >> >>all of their requests into a big list. In this big list, >> >> >> >>1. Some of the request for command channel requires NWD flag to be set. >> > >> >I'd expect this to occur at the end of a chain. So if you had 5 CMD >> >descriptors >> >described in the SGL, only the last descriptor would have the NWD set. >> >Correct? >> > >> >>2. TX request depends upon the setting of EOT flag so some of the TX >> >>request >> >> in complete page write will contain EOT flag and others will not. So >> >>this >> >> custom mapping will be used for data descriptor also in NAND driver. >> > >> >Can you give a sequence description of the descriptors and flags? I >> >haven't >> >seen the NAND documentation that describes the sequence/flow. >> >> Following is the sample list of command descriptor for page write(2K >> page). >> The actual list will contain more no of descriptor which involves >> spare area transfer also. 
>> >> Index INT NWD CMD 24bit Register Address >> 0 - - 1 0x0000F0 (EBI2_ECC_BUF_CFG) >> >> 1 - - 1 0x000020 (NAND_DEVn_CFG0) >> 0x000024 (NAND_DEVn_CFG1) >> 0x0000F0 (EBI2_ECC_BUF_CFG) >> 0x00000C (NAND_FLASH_CHIP_SELECT) >> >> 2 - - 1 0x000004 (NAND_ADDR0) >> 0x000008 (NAND_ADDR1) >> >> 3 - 1 1 0x000000 (NAND_FLASH_CMD) >> >> 4 - 1 1 0x000010 (NANDC_EXEC_CMD) >> >> 5 - - 1 0x000014 (NAND_FLASH_STATUS) >> >> 6 - 1 1 0x000000 (NAND_FLASH_CMD) >> >> 7 - 1 1 0x000010 (NANDC_EXEC_CMD) >> >> 8 - - 1 0x000014 (NAND_FLASH_STATUS) >> >> 9 - 1 1 0x000000 (NAND_FLASH_CMD) >> >> 10 - 1 1 0x000010 (NANDC_EXEC_CMD) >> >> 11 - - 1 0x000014 (NAND_FLASH_STATUS) >> >> 12 - 1 1 0x000000 (NAND_FLASH_CMD) >> >> 13 - 1 1 0x000010 (NANDC_EXEC_CMD) >> >> 14 1 - 1 0x000014 (NAND_FLASH_STATUS) >> >> 15 - - 1 0x000044 (NAND_FLASH_READ_STATUS) >> 0x000014 (NAND_FLASH_STATUS) > > Yeah I was expecting something like: > - Setup NAND controller using some command writes (indices 0-4) > Loop doing the following until all the data is done: > - Send/Receive the Data > - Check status. > > The only one that sticks out to me is index 14. Is the INT flag there > to mark > the actual end of the data transfer from the device? Then you do one > more > Status read. > This is sample list given in NAND document. INT will be set only for the last command. I checked the NAND driver in which status will be read only once for each codeword. >> > >> >>>>> So a couple of thoughts on how to deal with this: >> >>>>> >> >>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA >> >>>>> transaction. This lets you use the same hardware channel, but lets you discern >> >>>>> which descriptor format you need to use. The only issue I see with this is the >> >>>>> required change in device tree binding to target the right type of channel (cmd >> >>>>> vs normal). >> >>>> >> >>>>Or mark the descriptor is cmd and write accordingly... 
>> >>> >> >>>The only issue i see with that is knowing how much to pre-allocate during >> >>>the >> >>>prep call. The flag set API would be called on the allocated tx >> >>>descriptor. So >> >>>you'd have to know up front and be able to specify it. >> >>> >> >>>> >> >>>>> >> >>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis. >> >>>>> >> >>>>> 3) If you have a set of transactions described by an sgl that has disparate use >> >>>>> of flags, you split the list and use a separate transaction. In other words, we >> >>>>> need to enforce that the flag set API will be applied to all descriptors >> >>>>> described by an sgl. This means that the whole transaction may be comprised of >> >>>>> multiple async TX descriptors. >> >> >> >>Each async TX descriptor will generate the BAM interrupt so we are >> >>deviating >> >>from main purpose of DMA where ideally we should get the interrupt at >> >>the >> >>end >> >>of transfer. This is the main reason for going for this patch. >> > >> >If the client queues 1 descriptor or 5 descriptors, it doesn't matter that >> >much. >> >The client knows when it is done by waiting for the descriptors to >> >complete. It >> >is less efficient than grouping them all, but it should still work. >> > >> Yes. client will wait for final descriptor completion. But these >> interrupts >> will be overhead for CPU. For 1-2 page it won't matter much I guess it >> will >> be >> significant for complete chip read/write(during boot and FS i.e JFFS >> operations). >> >> >> >>With the submitted patch, only 1 interrupt per channel is required for >> >>complete NAND page and it solves the setting of BAM specific flags also. >> >>Only issue with this patch is adding new API in DMA framework itself. >> >>But >> >>this API can be used by other DMA engines in future for which mapping >> >>cannot >> >>be done with available APIs and if this mapping is vendor specific. 
>> > >> >I guess the key point in all of this is that the DMA operation being done >> >is not >> >a normal data flow to/from the device. It's direct remote register access >> >to >> >the device using address information contained in the sgl. And you are >> >collating the standard data access along with the special command access >> >in the >> >same API call. >> Yes. Normally DMA engine (QCA ADM DMA engine also) allows to >> read/write >> memory mapped >> registers just like data. But BAM is different (Since it is not a >> global DMA >> Engine >> and coupled with peripheral). Also, this different flag requirement is >> not >> just >> for command descriptors but for data descriptors also. >> >> BAM data access and command access differs only with flag and register >> read/write >> list. The register read and write list will be simply array of >> struct bam_cmd_element added in patch >> struct bam_cmd_element { >> __le32 addr:24; >> __le32 command:8; >> __le32 data; >> __le32 mask; >> __le32 reserved; >> }; >> >> The address and size of the array will be passed in data and size >> field of >> SGL. >> If we want to form the SGL for mentioned list then we will have SGL of >> size >> 15 >> with just one descriptor. >> >> Now we require different flag for each SG entry. currently SG does not >> have >> flag parameter and we can't add flag parameter just for our >> requirement in >> generic SG. So we have added the custom mapping function and passed >> modified >> SG >> as parameter which is generic SG and flag. > > I really think that we need some additional API that allows for the > flag munging > for the descriptors instead of overriding the prep_slave_sg. We > already needed > to change the way the flags are passed anyway. And instead of building > up a > special sg list, the API should take a structure that has a 1:1 mapping > of the > flags to the descriptors. And you would call this API on your > descriptor before > issuing it. > > So build up the sglist. 
Call the prep_slave_sg. You get back a tx
> descriptor that underneath is a bam descriptor. Then call the API giving
> the descriptor and the structure that defines the flags for the
> descriptors. Then submit the descriptor.
>
> Something like:
> void qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
>				      u16 *flags)
> {
>	struct bam_async_desc *async_desc = container_of(tx,
>							 struct bam_async_desc,
>							 vd.tx);
>	int i;
>
>	for (i = 0; i < async_desc->num_desc; i++)
>		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> }
>
> EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags);

We want to tightly couple the SG and its flag. But if this 1:1 mapping is
acceptable in the Linux kernel then we can go ahead with this. It will solve
our requirement and does not require any change in the Linux DMA API. I will
do the same and will submit the new patches.

> This applies the flags directly to the underlying hardware descriptors.
> The prep_slave_sg call would need to remove all the flag munging. The
> bam_start_dma would need to account for this as well by only setting the
> INT flag if the transfer cannot get all of the descriptors in the FIFO.

It seems no major change is required in prep_slave_sg or bam_start_dma, since
it is just setting the INT flag for the last entry, which is required for the
QPIC drivers also. We need to change the assignment of the flag to a bitwise
OR assignment for the last BAM desc in function bam_start_dma:

	/* set any special flags on the last descriptor */
	if (async_desc->num_desc == async_desc->xfer_len)
		desc[async_desc->xfer_len - 1].flags |=
			cpu_to_le16(async_desc->flags);

> Regards,
>
> Andy
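[Editor's note: the page-write command list quoted earlier in the thread maps
onto Andy's proposed 1:1 flags array like this. The flag bit values F_CMD,
F_NWD and F_INT are made-up positions for illustration, not the real BAM
encodings.]

```c
#include <assert.h>
#include <stdint.h>

#define F_CMD 0x0008	/* hypothetical bit positions */
#define F_NWD 0x0010
#define F_INT 0x8000

/* One entry per index 0..15 of the sample page-write list: every
 * descriptor is a command descriptor, NWD is set on each
 * FLASH_CMD/EXEC_CMD pair, and INT only on the final status read at 14. */
static const uint16_t page_write_flags[16] = {
	F_CMD,         F_CMD,         F_CMD,		/* 0-2: config/addr */
	F_CMD | F_NWD, F_CMD | F_NWD, F_CMD,		/* 3-5: cmd/exec/sts */
	F_CMD | F_NWD, F_CMD | F_NWD, F_CMD,		/* 6-8 */
	F_CMD | F_NWD, F_CMD | F_NWD, F_CMD,		/* 9-11 */
	F_CMD | F_NWD, F_CMD | F_NWD, F_CMD | F_INT,	/* 12-14 */
	F_CMD,						/* 15: read status */
};
```

This is exactly the flat structure the proposed apply-flags API would
consume after prep_slave_sg: one u16 per hardware descriptor, in order.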
On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote: > On 2016-12-29 23:24, Andy Gross wrote: > >On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote: > >>On 2016-12-21 01:55, Andy Gross wrote: > >>>On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote: > >>> > >>><snip> > >>> > >>>>>>Okay, do you have pointer on that one, will avoid asking the same > >>>>>>questions > >>>>>>again. > >>>>> > >>>>>I'll see if I can find the correspondance and send to you directly. > >>>>> > >>>>>> > >>>>>>> Ahbishek, correct me where i am wrong on the following: > >>>>>>> So two main differences between a normal descriptor and a command descriptor: > >>>>>>> 1) size of the descriptor > >>>>>>> 2) the flag setting > >>>>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal > >>>>>>> scatter gather > >>>> > >>>>Top level descriptor is same for both. Only difference is Command flag. > >>>>The > >>>>command descriptor will contain list of register read/write instead of > >>>>data > >>>>address > >>>>The peripheral driver can form the list with helper function provided in > >>>>patch 5 > >>>>and submit it to BAM. The main issue is for other flag like EOT/NWD. > >>>> > >>>>The top level descriptor is again in the form of list where BAM writes > >>>>the > >>>>address of the list in register before starting of transfer. In this > >>>>list, > >>>>each element will have different flags. > >>> > >>>Ah that's right. The command descriptor information is the format of the > >>>data > >>>pointed to by the sgl. So you'd have some set of register read/writes > >>>described > >>>in those entries. > >>> > >>>> > >>>>>>> > >>>>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume > >>>>>>> they all have CMD flag set. Do the current users of the command descriptors > >>>>>>> coalesce all of their requests into a big list? > >>>>>>> > >>>> > >>>>The main user for command descriptor is currently QPIC NAND/LCD. 
The > >>>>NAND > >>>>uses > >>>>3 BAM channels- tx, rx and command. NAND controller do the data transfer > >>>>in > >>>>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer > >>>>on > >>>>page basis so each page read/write can have multiple codewords. The NAND > >>>>driver prepares command, tx(write) or rx(read) descriptor for complete > >>>>page > >>>>, submit it to BAM and wait for its completion. So NAND driver coalesces > >>>>all of their requests into a big list. In this big list, > >>>> > >>>>1. Some of the request for command channel requires NWD flag to be set. > >>> > >>>I'd expect this to occur at the end of a chain. So if you had 5 CMD > >>>descriptors > >>>described in the SGL, only the last descriptor would have the NWD set. > >>>Correct? > >>> > >>>>2. TX request depends upon the setting of EOT flag so some of the TX > >>>>request > >>>> in complete page write will contain EOT flag and others will not. So > >>>>this > >>>> custom mapping will be used for data descriptor also in NAND driver. > >>> > >>>Can you give a sequence description of the descriptors and flags? I > >>>haven't > >>>seen the NAND documentation that describes the sequence/flow. > >> > >>Following is the sample list of command descriptor for page write(2K > >>page). > >>The actual list will contain more no of descriptor which involves > >>spare area transfer also. 
> >> > >>Index INT NWD CMD 24bit Register Address > >>0 - - 1 0x0000F0 (EBI2_ECC_BUF_CFG) > >> > >>1 - - 1 0x000020 (NAND_DEVn_CFG0) > >> 0x000024 (NAND_DEVn_CFG1) > >> 0x0000F0 (EBI2_ECC_BUF_CFG) > >> 0x00000C (NAND_FLASH_CHIP_SELECT) > >> > >>2 - - 1 0x000004 (NAND_ADDR0) > >> 0x000008 (NAND_ADDR1) > >> > >>3 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>4 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>5 - - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>6 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>7 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>8 - - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>9 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>10 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>11 - - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>12 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>13 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>14 1 - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>15 - - 1 0x000044 (NAND_FLASH_READ_STATUS) > >> 0x000014 (NAND_FLASH_STATUS) > > > >Yeah I was expecting something like: > >- Setup NAND controller using some command writes (indices 0-4) > > Loop doing the following until all the data is done: > > - Send/Receive the Data > > - Check status. > > > >The only one that sticks out to me is index 14. Is the INT flag there to > >mark > >the actual end of the data transfer from the device? Then you do one more > >Status read. > > > This is sample list given in NAND document. INT will be set only for the > last > command. I checked the NAND driver in which status will be read only once > for > each codeword. > >>> > >>>>>>> So a couple of thoughts on how to deal with this: > >>>>>>> > >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA > >>>>>>> transaction. This lets you use the same hardware channel, but lets you discern > >>>>>>> which descriptor format you need to use. The only issue I see with this is the > >>>>>>> required change in device tree binding to target the right type of channel (cmd > >>>>>>> vs normal). 
> >>>>>> > >>>>>>Or mark the descriptor is cmd and write accordingly... > >>>>> > >>>>>The only issue i see with that is knowing how much to pre-allocate during > >>>>>the > >>>>>prep call. The flag set API would be called on the allocated tx > >>>>>descriptor. So > >>>>>you'd have to know up front and be able to specify it. > >>>>> > >>>>>> > >>>>>>> > >>>>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis. > >>>>>>> > >>>>>>> 3) If you have a set of transactions described by an sgl that has disparate use > >>>>>>> of flags, you split the list and use a separate transaction. In other words, we > >>>>>>> need to enforce that the flag set API will be applied to all descriptors > >>>>>>> described by an sgl. This means that the whole transaction may be comprised of > >>>>>>> multiple async TX descriptors. > >>>> > >>>>Each async TX descriptor will generate the BAM interrupt so we are > >>>>deviating > >>>>from main purpose of DMA where ideally we should get the interrupt at > >>>>the > >>>>end > >>>>of transfer. This is the main reason for going for this patch. > >>> > >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter that > >>>much. > >>>The client knows when it is done by waiting for the descriptors to > >>>complete. It > >>>is less efficient than grouping them all, but it should still work. > >>> > >>Yes. client will wait for final descriptor completion. But these > >>interrupts > >>will be overhead for CPU. For 1-2 page it won't matter much I guess it > >>will > >>be > >>significant for complete chip read/write(during boot and FS i.e JFFS > >>operations). > >>>> > >>>>With the submitted patch, only 1 interrupt per channel is required for > >>>>complete NAND page and it solves the setting of BAM specific flags also. > >>>>Only issue with this patch is adding new API in DMA framework itself. 
> >>>>But > >>>>this API can be used by other DMA engines in future for which mapping > >>>>cannot > >>>>be done with available APIs and if this mapping is vendor specific. > >>> > >>>I guess the key point in all of this is that the DMA operation being done > >>>is not > >>>a normal data flow to/from the device. It's direct remote register access > >>>to > >>>the device using address information contained in the sgl. And you are > >>>collating the standard data access along with the special command access > >>>in the > >>>same API call. > >>Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write > >>memory mapped > >>registers just like data. But BAM is different (Since it is not a global > >>DMA > >>Engine > >>and coupled with peripheral). Also, this different flag requirement is > >>not > >>just > >>for command descriptors but for data descriptors also. > >> > >>BAM data access and command access differs only with flag and register > >>read/write > >>list. The register read and write list will be simply array of > >>struct bam_cmd_element added in patch > >>struct bam_cmd_element { > >> __le32 addr:24; > >> __le32 command:8; > >> __le32 data; > >> __le32 mask; > >> __le32 reserved; > >>}; > >> > >>The address and size of the array will be passed in data and size field > >>of > >>SGL. > >>If we want to form the SGL for mentioned list then we will have SGL of > >>size > >>15 > >>with just one descriptor. > >> > >>Now we require different flag for each SG entry. currently SG does not > >>have > >>flag parameter and we can't add flag parameter just for our requirement > >>in > >>generic SG. So we have added the custom mapping function and passed > >>modified > >>SG > >>as parameter which is generic SG and flag. > > > >I really think that we need some additional API that allows for the flag > >munging > >for the descriptors instead of overriding the prep_slave_sg. We already > >needed > >to change the way the flags are passed anyway. 
And instead of building up > >a > >special sg list, the API should take a structure that has a 1:1 mapping of > >the > >flags to the descriptors. And you would call this API on your descriptor > >before > >issuing it. > > > >So build up the sglist. Call the prep_slave_sg. You get back a tx > >descriptor > >that underneath is a bam descriptor. Then call the API giving the > >descriptor > >and the structure that defines the flags for the descriptors. Then submit > >the > >descriptor. > > > >Something like: > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx, > > u16 *flags) > >{ > > struct bam_async_desc async_desc = container_of(tx, > > struct bam_async_desc, > > vd.tx); > > int i; > > > > for (i = 0; i < async_desc->num_desc; i++) > > async_desc->desc[i].flags = cpu_to_le16(flags[i]); > >} > > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags) > > > > We want to tightly couple the SG and its flag. But if this 1:1 mapping is > acceptable > in linux kernel then we can go ahead with this. It will solve our > requirement and > does not require any change in Linux DMA API. I will do the same and will > submit the > new patches. Thanks. Hopefully Vinod will be ok with the SoC specific EXPORT. > > >This applies the flags directly to the underlying hardware descriptors. > >The > >prep_slave_sg call would need to remove all the flag munging. The > >bam_start_dma > >would need to account for this as well by only setting the INT flag if the > >transfer cannot get all of the descriptors in the FIFO. > > It seems no major change is required in prep_slave_sg or bam_start_dma since > it is just setting INT flag for last entry which is required for QPIC > drivers > also. 
> We need to change the assignment of flag with bitwise OR assignment for
> last BAM desc in function bam_start_dma
>
> 	/* set any special flags on the last descriptor */
> 	if (async_desc->num_desc == async_desc->xfer_len)
> 		desc[async_desc->xfer_len - 1].flags |=
> 			cpu_to_le16(async_desc->flags);

Right.  That can be done in the start_dma directly.  That's the only function
that can really determine when the INT flag will be set (other than the last
descriptor).

Regards,

Andy
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Mon, Jan 02, 2017 at 10:12:33AM -0600, Andy Gross wrote: > On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote: > > >>>>>>> So a couple of thoughts on how to deal with this: > > >>>>>>> > > >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA > > >>>>>>> transaction. This lets you use the same hardware channel, but lets you discern > > >>>>>>> which descriptor format you need to use. The only issue I see with this is the > > >>>>>>> required change in device tree binding to target the right type of channel (cmd > > >>>>>>> vs normal). > > >>>>>> > > >>>>>>Or mark the descriptor is cmd and write accordingly... > > >>>>> > > >>>>>The only issue i see with that is knowing how much to pre-allocate during > > >>>>>the > > >>>>>prep call. The flag set API would be called on the allocated tx > > >>>>>descriptor. So > > >>>>>you'd have to know up front and be able to specify it. > > >>>>> > > >>>>>> > > >>>>>>> > > >>>>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis. > > >>>>>>> > > >>>>>>> 3) If you have a set of transactions described by an sgl that has disparate use > > >>>>>>> of flags, you split the list and use a separate transaction. In other words, we > > >>>>>>> need to enforce that the flag set API will be applied to all descriptors > > >>>>>>> described by an sgl. This means that the whole transaction may be comprised of > > >>>>>>> multiple async TX descriptors. > > >>>> > > >>>>Each async TX descriptor will generate the BAM interrupt so we are > > >>>>deviating > > >>>>from main purpose of DMA where ideally we should get the interrupt at > > >>>>the > > >>>>end > > >>>>of transfer. This is the main reason for going for this patch. > > >>> > > >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter that > > >>>much. > > >>>The client knows when it is done by waiting for the descriptors to > > >>>complete. 
It > > >>>is less efficient than grouping them all, but it should still work. > > >>> > > >>Yes. client will wait for final descriptor completion. But these > > >>interrupts > > >>will be overhead for CPU. For 1-2 page it won't matter much I guess it > > >>will > > >>be > > >>significant for complete chip read/write(during boot and FS i.e JFFS > > >>operations). > > >>>> > > >>>>With the submitted patch, only 1 interrupt per channel is required for > > >>>>complete NAND page and it solves the setting of BAM specific flags also. > > >>>>Only issue with this patch is adding new API in DMA framework itself. > > >>>>But > > >>>>this API can be used by other DMA engines in future for which mapping > > >>>>cannot > > >>>>be done with available APIs and if this mapping is vendor specific. > > >>> > > >>>I guess the key point in all of this is that the DMA operation being done > > >>>is not > > >>>a normal data flow to/from the device. It's direct remote register access > > >>>to > > >>>the device using address information contained in the sgl. And you are > > >>>collating the standard data access along with the special command access > > >>>in the > > >>>same API call. > > >>Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write > > >>memory mapped > > >>registers just like data. But BAM is different (Since it is not a global > > >>DMA > > >>Engine > > >>and coupled with peripheral). Also, this different flag requirement is > > >>not > > >>just > > >>for command descriptors but for data descriptors also. > > >> > > >>BAM data access and command access differs only with flag and register > > >>read/write > > >>list. 
The register read and write list will be simply array of > > >>struct bam_cmd_element added in patch > > >>struct bam_cmd_element { > > >> __le32 addr:24; > > >> __le32 command:8; > > >> __le32 data; > > >> __le32 mask; > > >> __le32 reserved; > > >>}; > > >> > > >>The address and size of the array will be passed in data and size field > > >>of > > >>SGL. > > >>If we want to form the SGL for mentioned list then we will have SGL of > > >>size > > >>15 > > >>with just one descriptor. > > >> > > >>Now we require different flag for each SG entry. currently SG does not > > >>have > > >>flag parameter and we can't add flag parameter just for our requirement > > >>in > > >>generic SG. So we have added the custom mapping function and passed > > >>modified > > >>SG > > >>as parameter which is generic SG and flag. > > > > > >I really think that we need some additional API that allows for the flag > > >munging > > >for the descriptors instead of overriding the prep_slave_sg. We already > > >needed > > >to change the way the flags are passed anyway. And instead of building up > > >a > > >special sg list, the API should take a structure that has a 1:1 mapping of > > >the > > >flags to the descriptors. And you would call this API on your descriptor > > >before > > >issuing it. Munging wont be a good idea, but for some of the cases current flags can be used, and if need be, we can add additional flags > > > > > >So build up the sglist. Call the prep_slave_sg. You get back a tx > > >descriptor > > >that underneath is a bam descriptor. Then call the API giving the > > >descriptor > > >and the structure that defines the flags for the descriptors. Then submit > > >the > > >descriptor. 
> > > > > >Something like: > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx, > > > u16 *flags) > > >{ > > > struct bam_async_desc async_desc = container_of(tx, > > > struct bam_async_desc, > > > vd.tx); > > > int i; > > > > > > for (i = 0; i < async_desc->num_desc; i++) > > > async_desc->desc[i].flags = cpu_to_le16(flags[i]); > > >} > > > > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags) This makes it bam specific and causes issues if we want to use this code in generic libs, but yes this is an option but should be last resort.
On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote: <snip> > > > > > > > >I really think that we need some additional API that allows for the flag > > > >munging > > > >for the descriptors instead of overriding the prep_slave_sg. We already > > > >needed > > > >to change the way the flags are passed anyway. And instead of building up > > > >a > > > >special sg list, the API should take a structure that has a 1:1 mapping of > > > >the > > > >flags to the descriptors. And you would call this API on your descriptor > > > >before > > > >issuing it. > > Munging wont be a good idea, but for some of the cases current flags can be > used, and if need be, we can add additional flags Is adding flags a possibility? I tried to match up BAM flags to ones that made sense that were currently defined, but adding a CMD flag would be kind of odd. It was kind of a stretch to use the PREP_FENCE for the notify when done flag. > > > > > > > >So build up the sglist. Call the prep_slave_sg. You get back a tx > > > >descriptor > > > >that underneath is a bam descriptor. Then call the API giving the > > > >descriptor > > > >and the structure that defines the flags for the descriptors. Then submit > > > >the > > > >descriptor. > > > > > > > >Something like: > > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx, > > > > u16 *flags) > > > >{ > > > > struct bam_async_desc async_desc = container_of(tx, > > > > struct bam_async_desc, > > > > vd.tx); > > > > int i; > > > > > > > > for (i = 0; i < async_desc->num_desc; i++) > > > > async_desc->desc[i].flags = cpu_to_le16(flags[i]); > > > >} > > > > > > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags) > > This makes it bam specific and causes issues if we want to use this code in > generic libs, but yes this is an option but should be last resort. If adding flags is a possibility (which it didn't seem to be in the past), that would make things much easier. 
Regards,

Andy
On 2017-01-19 19:43, Andy Gross wrote: > On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote: > > <snip> > >> > > > >> > > >I really think that we need some additional API that allows for the flag >> > > >munging >> > > >for the descriptors instead of overriding the prep_slave_sg. We already >> > > >needed >> > > >to change the way the flags are passed anyway. And instead of building up >> > > >a >> > > >special sg list, the API should take a structure that has a 1:1 mapping of >> > > >the >> > > >flags to the descriptors. And you would call this API on your descriptor >> > > >before >> > > >issuing it. >> >> Munging wont be a good idea, but for some of the cases current flags >> can be >> used, and if need be, we can add additional flags > > Is adding flags a possibility? I tried to match up BAM flags to ones > that made > sense that were currently defined, but adding a CMD flag would be kind > of odd. > > It was kind of a stretch to use the PREP_FENCE for the notify when done > flag. > >> > > > >> > > >So build up the sglist. Call the prep_slave_sg. You get back a tx >> > > >descriptor >> > > >that underneath is a bam descriptor. Then call the API giving the >> > > >descriptor >> > > >and the structure that defines the flags for the descriptors. Then submit >> > > >the >> > > >descriptor. >> > > > >> > > >Something like: >> > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx, >> > > > u16 *flags) >> > > >{ >> > > > struct bam_async_desc async_desc = container_of(tx, >> > > > struct bam_async_desc, >> > > > vd.tx); >> > > > int i; >> > > > >> > > > for (i = 0; i < async_desc->num_desc; i++) >> > > > async_desc->desc[i].flags = cpu_to_le16(flags[i]); >> > > >} >> > > > >> > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags) >> >> This makes it bam specific and causes issues if we want to use this >> code in >> generic libs, but yes this is an option but should be last resort. 
> If adding flags is a possibility (which it didn't seem to be in the
> past), that would make things much easier.

Also, the main reason for this approach was to set different flags for each
BAM descriptor present in a TX descriptor.

> Regards,
>
> Andy
On Thu, Jan 19, 2017 at 08:13:17AM -0600, Andy Gross wrote:
> On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote:
>
> <snip>
>
> > > > >I really think that we need some additional API that allows for the
> > > > >flag munging for the descriptors instead of overriding the
> > > > >prep_slave_sg. We already needed to change the way the flags are
> > > > >passed anyway. And instead of building up a special sg list, the API
> > > > >should take a structure that has a 1:1 mapping of the flags to the
> > > > >descriptors. And you would call this API on your descriptor before
> > > > >issuing it.
> >
> > Munging wont be a good idea, but for some of the cases current flags can
> > be used, and if need be, we can add additional flags
>
> Is adding flags a possibility? I tried to match up BAM flags to ones that
> made sense that were currently defined, but adding a CMD flag would be
> kind of odd.

Matching flags is a good idea wherever they match, overriding is not :)

> It was kind of a stretch to use the PREP_FENCE for the notify when done
> flag.

For that, we should use PREP_INTERRUPT. DMAengine should assert the interrupt
only when this flag is set and continue to the next transaction.

> > > > >So build up the sglist. Call the prep_slave_sg. You get back a tx
> > > > >descriptor that underneath is a bam descriptor. Then call the API
> > > > >giving the descriptor and the structure that defines the flags for
> > > > >the descriptors. Then submit the descriptor.
> > > > >
> > > > >Something like:
> > > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> > > > >				    u16 *flags)
> > > > >{
> > > > >	struct bam_async_desc async_desc = container_of(tx,
> > > > >							struct bam_async_desc,
> > > > >							vd.tx);
> > > > >	int i;
> > > > >
> > > > >	for (i = 0; i < async_desc->num_desc; i++)
> > > > >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> > > > >}
> > > > >
> > > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
> >
> > This makes it bam specific and causes issues if we want to use this code
> > in generic libs, but yes this is an option but should be last resort.
>
> If adding flags is a possibility (which it didn't seem to be in the past),
> that would make things much easier.

Yes if we can describe them generically then yes. So with this and reusing
existing flags without overriding, how many flags do we need..
On 2017-01-20 22:26, Vinod Koul wrote:
> On Thu, Jan 19, 2017 at 08:13:17AM -0600, Andy Gross wrote:
>> On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote:
>> <snip>
>> > This makes it bam specific and causes issues if we want to use this
>> > code in generic libs, but yes this is an option but should be last
>> > resort.
>>
>> If adding flags is a possibility (which it didn't seem to be in the
>> past), that would make things much easier.
>
> Yes if we can describe them generically then yes. So with this and reusing
> existing flags without overriding, how many flags do we need..

Sorry for the delayed response; I could not find time to work on this earlier.

Summarizing the original issue and the suggestions made in this mail thread:

ISSUE 1: Using the BAM-specific command flag

CMD (Command) allows the SW to create descriptors of type Command, which do
not generate any data transmissions but configure registers in the peripheral
(register write operations and register read operations). If we add this as a
new generic flag, we need one new flag (DMA_PREP_CMD).

ISSUE 2: Setting a per-SG flag

Currently the peripheral driver calls dmaengine_prep_slave_sg with the flags
as a parameter. Some peripherals (like NAND) require a different flag to be
set for each SG entry.

SOLUTION 1: The original patch adds prep_dma_custom_mapping to the generic
DMA engine. The peripheral driver passes custom data to this function and the
DMA engine driver forms the descriptor according to its own logic. In the
future, this API can be used by any other DMA engine drivers which are unable
to do DMA mapping with the already available APIs.

Drawback: requires adding an additional API in the DMA framework which takes
a void pointer.

SOLUTION 2: Define and export a BAM-specific custom API that applies the
flags to the descriptors instead of overriding prep_slave_sg. The API takes a
structure that has a 1:1 mapping of the flags to the descriptors, and this
API is called before submitting the descriptors.

Drawback: this makes it BAM specific and causes issues if we want to use this
code in generic libs.

SOLUTION 3: Add a CMD flag to the generic DMA engine, split the list, and
call prep_slave_sg multiple times.

Drawback: this flag is BAM specific and cannot be used in other drivers
without overriding. Also, each prep_slave_sg call generates a BAM hardware
interrupt, which impacts performance.

I did some experiments, and here are the results with the Linux mtd_speedtest:
- With solutions 1 and 2:
  - 2 interrupts are generated per NAND page read/write for a 2K page
  - The mtd read speed obtained is 8685 KiB/s
- With solution 3:
  - 8 interrupts are generated per NAND page read/write for a 2K page
  - The mtd read speed is 7419 KiB/s, which increases boot time and
    decreases the file system KPIs
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index cc535a4..6324c1f 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -692,6 +692,8 @@ struct dma_filter {
  * be called after period_len bytes have been transferred.
  * @device_prep_interleaved_dma: Transfer expression in a generic way.
  * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
+ * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver
+ *	specific custom data
  * @device_config: Pushes a new configuration to a channel, return 0 or an error
  *	code
  * @device_pause: Pauses any transfer happening on a channel. Returns
@@ -783,6 +785,9 @@ struct dma_device {
 	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
 		struct dma_chan *chan, dma_addr_t dst, u64 data,
 		unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)(
+		struct dma_chan *chan, void *data,
+		unsigned long flags);
 	int (*device_config)(struct dma_chan *chan,
 			     struct dma_slave_config *config);
The current DMA APIs only support an SGL or data in a generic format. The QCA
BAM DMA engine data cannot be mapped with the already available APIs for the
following reasons.

1. The QCA BAM DMA engine uses custom flags which cannot be mapped to the
   generic DMA engine flags.

2. Some peripheral drivers, like QCA QPIC NAND/LCD, need to set specific
   flags (like NWD, EOT) for some of the descriptors in the scatter gather
   list. The already available mapping APIs take the flags parameter in the
   API itself, and there is no support for passing DMA-specific flags for
   each SGL entry.

This patch adds support for making the DMA descriptor from custom data with
the new DMA mapping function prep_dma_custom_mapping. The peripheral driver
passes the custom data to this function, and the DMA engine driver forms the
descriptor according to its own logic. In the future, this API can be used by
any other DMA engine drivers which are unable to do DMA mapping with the
already available APIs.

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 include/linux/dmaengine.h | 5 +++++
 1 file changed, 5 insertions(+)