diff mbox

cleanup: Add 'struct dev' in the TTM layer to be passed in for DMA API calls.

Message ID 20110322143137.GA25113@dumpdata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Konrad Rzeszutek Wilk March 22, 2011, 2:31 p.m. UTC
None

Comments

Thomas Hellström (VMware) April 8, 2011, 2:57 p.m. UTC | #1
Konrad,

Sorry for waiting so long to answer. Workload is quite heavy ATM.
Please see inline.


On 03/31/2011 05:49 PM, Konrad Rzeszutek Wilk wrote:
>>> I can start this next week if you guys are comfortable with this idea.
>>>
>>>
>>>        
>> Konrad,
>>
>> 1) A couple of questions first. Where are the memory pools going to
>> end up in this design. Could you draft an API? How is page
>> accounting going to be taken care of? How do we differentiate
>> between running on bare metal and running on a hypervisor?
>>      
> My thought was that the memory pool's wouldn't be affected. Instead
> of all of the calls to alloc_page/__free_page (and dma_alloc_coherent/
> dma_free_coherent) would go through this API calls.
>
> What I thought off are three phases:
>
>   1). Get in the patch that passed in 'struct dev' to the dma_alloc_coherent
>    for 2.6.39 so that PowerPC folks can use the it with radeon cards. My
>    understanding is that the work you plan on to isn't going in 2.6.39
>    but rather in 2.6.40 - and if get my stuff ready (the other phases)
>    we can work out the kinks together. This way also the 'struct dev'
>    is passed in the TTM layer.
>    

I'm not happy with this solution. If something goes in, it should be 
complete, otherwise future work need to worry about not breaking 
something that's already broken. Also it adds things to TTM api's that 
are not really necessary.


I'd like to see a solution that  encapsulates all device-dependent stuff 
(including the dma adresses) in the ttm backend, so the TTM backend code 
is the only code that needs to worry about device dependent stuff. Core 
ttm should only need to worry about whether pages can be transferrable 
to other devices, and whether pages can be inserted into the page cache.

This change should be pretty straightforward. We move the ttm::pages 
array into the backend, and add ttm backend functions to allocate pages 
and to free pages. The backend is then completely free to keep track of 
page types and dma addresses completely hidden from core ttm, and we 
don't need to shuffle those around. This opens up both for completely 
device-private coherent memory and for "dummy device" coherent memory.

In the future, when TTM needs to move a ttm to another device, or when 
it needs to insert pages into the page cache, pages that are device 
specific will be copied and then freed. "Dummy device" pages can be 
transferred to other devices, but not inserted into the page cache.

/Thomas
Thomas Hellström (VMware) April 8, 2011, 2:58 p.m. UTC | #2
On 04/08/2011 04:57 PM, Thomas Hellstrom wrote:
> Konrad,
>
> Sorry for waiting so long to answer. Workload is quite heavy ATM.
> Please see inline.
>
>
> On 03/31/2011 05:49 PM, Konrad Rzeszutek Wilk wrote:
>>>> I can start this next week if you guys are comfortable with this idea.
>>>>
>>>>
>>> Konrad,
>>>
>>> 1) A couple of questions first. Where are the memory pools going to
>>> end up in this design. Could you draft an API? How is page
>>> accounting going to be taken care of? How do we differentiate
>>> between running on bare metal and running on a hypervisor?
>> My thought was that the memory pool's wouldn't be affected. Instead
>> of all of the calls to alloc_page/__free_page (and dma_alloc_coherent/
>> dma_free_coherent) would go through this API calls.
>>
>> What I thought off are three phases:
>>
>>   1). Get in the patch that passed in 'struct dev' to the 
>> dma_alloc_coherent
>>    for 2.6.39 so that PowerPC folks can use the it with radeon cards. My
>>    understanding is that the work you plan on to isn't going in 2.6.39
>>    but rather in 2.6.40 - and if get my stuff ready (the other phases)
>>    we can work out the kinks together. This way also the 'struct dev'
>>    is passed in the TTM layer.
>
> I'm not happy with this solution. If something goes in, it should be 
> complete, otherwise future work need to worry about not breaking 
> something that's already broken. Also it adds things to TTM api's that 
> are not really necessary.
>
>
> I'd like to see a solution that  encapsulates all device-dependent 
> stuff (including the dma adresses) in the ttm backend, so the TTM 
> backend code is the only code that needs to worry about device 
> dependent stuff. Core ttm should only need to worry about whether 
> pages can be transferrable to other devices, and whether pages can be 
> inserted into the page cache.
>
> This change should be pretty straightforward. We move the ttm::pages 
> array into the backend, and add ttm backend functions to allocate 
> pages and to free pages. The backend is then completely free to keep 
> track of page types and dma addresses completely hidden from core ttm, 
> and we don't need to shuffle those around. This opens up both for 
> completely device-private coherent memory and for "dummy device" 
> coherent memory.
>
> In the future, when TTM needs to move a ttm to another device, or when 
> it needs to insert pages into the page cache, pages that are device 
> specific will be copied and then freed. "Dummy device" pages can be 
> transferred to other devices, but not inserted into the page cache.
>
> /Thomas
>
>
Oh, I forgot, I'll be on vacation for a week with limited possibilities 
to read mail, but after that I can prototype the ttm backend api changes 
if necessary.

/Thomas
Konrad Rzeszutek Wilk April 8, 2011, 3:12 p.m. UTC | #3
On Fri, Apr 08, 2011 at 04:57:14PM +0200, Thomas Hellstrom wrote:
> Konrad,
> 
> Sorry for waiting so long to answer. Workload is quite heavy ATM.
> Please see inline.

OK. Thank you for taking a look... some questions before you
depart on vacation.

> >  1). Get in the patch that passed in 'struct dev' to the dma_alloc_coherent
> >   for 2.6.39 so that PowerPC folks can use the it with radeon cards. My
> >   understanding is that the work you plan on to isn't going in 2.6.39
> >   but rather in 2.6.40 - and if get my stuff ready (the other phases)
> >   we can work out the kinks together. This way also the 'struct dev'
> >   is passed in the TTM layer.
> 
> I'm not happy with this solution. If something goes in, it should be
> complete, otherwise future work need to worry about not breaking
> something that's already broken. Also it adds things to TTM api's

<nods>
> that are not really necessary.
> 
> 
> I'd like to see a solution that  encapsulates all device-dependent
> stuff (including the dma adresses) in the ttm backend, so the TTM
> backend code is the only code that needs to worry about device

I am a bit confused here. The usual "ttm backend" refers to the
device specific hooks (so the radeon/nouveau/via driver), which
use this structure: ttm_backend_func

That is not what you are referring to right?
> dependent stuff. Core ttm should only need to worry about whether
> pages can be transferrable to other devices, and whether pages can
> be inserted into the page cache.

Ok. So the core ttm would need to know the 'struct dev' to figure
out what the criteria are for transferring the page (ie, it is
Ok for a 64-bit card to use a 32-bit card's pages, but not the other
way around)..

> 
> This change should be pretty straightforward. We move the ttm::pages
> array into the backend, and add ttm backend functions to allocate
> pages and to free pages. The backend is then completely free to keep
> track of page types and dma addresses completely hidden from core
> ttm, and we don't need to shuffle those around. This opens up both
> for completely device-private coherent memory and for "dummy device"
> coherent memory.

The 'dummy device' is a bit of hack thought? Why not get rid
of that idea and just squirrel away the the 'struct dev' and let the
ttm::backend figure out how to allocate the pages?

> 
> In the future, when TTM needs to move a ttm to another device, or
> when it needs to insert pages into the page cache, pages that are
> device specific will be copied and then freed. "Dummy device" pages
> can be transferred to other devices, but not inserted into the page
> cache.

OK. That would require some extra function in the ttm::backend to
say "dont_stick_this_in_page_cache".
Thomas Hellström (VMware) April 8, 2011, 3:29 p.m. UTC | #4
On 04/08/2011 05:12 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 08, 2011 at 04:57:14PM +0200, Thomas Hellstrom wrote:
>    
>> Konrad,
>>
>> Sorry for waiting so long to answer. Workload is quite heavy ATM.
>> Please see inline.
>>      
> OK. Thank you for taking a look... some questions before you
> depart on vacation.
>
>    
>>>   1). Get in the patch that passed in 'struct dev' to the dma_alloc_coherent
>>>    for 2.6.39 so that PowerPC folks can use the it with radeon cards. My
>>>    understanding is that the work you plan on to isn't going in 2.6.39
>>>    but rather in 2.6.40 - and if get my stuff ready (the other phases)
>>>    we can work out the kinks together. This way also the 'struct dev'
>>>    is passed in the TTM layer.
>>>        
>> I'm not happy with this solution. If something goes in, it should be
>> complete, otherwise future work need to worry about not breaking
>> something that's already broken. Also it adds things to TTM api's
>>      
> <nods>
>    
>> that are not really necessary.
>>
>>
>> I'd like to see a solution that  encapsulates all device-dependent
>> stuff (including the dma adresses) in the ttm backend, so the TTM
>> backend code is the only code that needs to worry about device
>>      
> I am a bit confused here. The usual "ttm backend" refers to the
> device specific hooks (so the radeon/nouveau/via driver), which
> use this structure: ttm_backend_func
>
> That is not what you are referring to right?
>    

Yes, exactly.

>> dependent stuff. Core ttm should only need to worry about whether
>> pages can be transferrable to other devices, and whether pages can
>> be inserted into the page cache.
>>      
> Ok. So the core ttm would need to know the 'struct dev' to figure
> out what the criteria are for transferring the page (ie, it is
> Ok for a 64-bit card to use a 32-bit card's pages, but not the other
> way around)..
>    

So the idea would be to have "ttm_backend::populate" decide whether the 
current pages are compatible with the device or not, and copy if that's 
the case.
Usually the pages are allocated by the backend itself and should be 
compatible, but the populate check would trigger if pages were 
transferred from another device. This case happens when the destination 
device has special requirements, and needs to be implemented in all 
backends when we start transfer TTMs between devices. Here we can use 
struct dev or something similar as a page compatibility identifier.

The other case is where the source device has special requirements, for 
example when the source device pages can't be inserted into the swap 
cache (This is the case you are referring to above). Core TTM does only 
need to know whether the pages are "normal pages" or not, and does not 
need to know about struct dev. Hence, the backend needs a query 
function, but not until we actually implement direct swap cache insertions.

So none of this stuff needs to be implemented now, and we can always 
hide struct dev in the backend.

>    
>> This change should be pretty straightforward. We move the ttm::pages
>> array into the backend, and add ttm backend functions to allocate
>> pages and to free pages. The backend is then completely free to keep
>> track of page types and dma addresses completely hidden from core
>> ttm, and we don't need to shuffle those around. This opens up both
>> for completely device-private coherent memory and for "dummy device"
>> coherent memory.
>>      
> The 'dummy device' is a bit of hack thought? Why not get rid
> of that idea and just squirrel away the the 'struct dev' and let the
> ttm::backend figure out how to allocate the pages?
>    

Yes it's a hack. The advantage of a dummy device is that pages will be 
movable across backends that share the same dummy device. For example 
between a radeon and a nouveau driver on a Xen platform.


>    
>> In the future, when TTM needs to move a ttm to another device, or
>> when it needs to insert pages into the page cache, pages that are
>> device specific will be copied and then freed. "Dummy device" pages
>> can be transferred to other devices, but not inserted into the page
>> cache.
>>      
> OK. That would require some extra function in the ttm::backend to
> say "dont_stick_this_in_page_cache".
>    

Correct. We should use the ttm backend query function discussed above, 
and enumerate queries.

Thanks
Thomas
diff mbox

Patch

diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index efed082..1986761 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -158,9 +158,14 @@  enum ttm_caching_state {
  * memory.
  */
 
+struct ttm_tt_page {
+	struct page *page;
+	dma_addr_t *dma_addr;
+	struct dev *dev;
+}
 struct ttm_tt {
 	struct page *dummy_read_page;
-	struct page **pages;
+	struct ttm_tt_page **pages;
 	long first_himem_page;
 	long last_lomem_page;
 	uint32_t page_flags;
@@ -176,7 +181,6 @@  struct ttm_tt {
 		tt_unbound,
 		tt_unpopulated,
 	} state;
-	dma_addr_t *dma_address;
 };
 
 #define TTM_MEMTYPE_FLAG_FIXED         (1 << 0)	/* Fixed (on-card) PCI memory */