Message ID: 20180214015008.9513-3-dongwon.kim@intel.com (mailing list archive)
State: New, archived
On Tue, Feb 13, 2018 at 05:50:01PM -0800, Dongwon Kim wrote: > Reference document for hyper_DMABUF driver > > Documentation/hyper-dmabuf-sharing.txt This should likely be patch 1 in order for reviewers to have the appropriate context. > > Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> > --- > Documentation/hyper-dmabuf-sharing.txt | 734 +++++++++++++++++++++++++++++++++ > 1 file changed, 734 insertions(+) > create mode 100644 Documentation/hyper-dmabuf-sharing.txt > > diff --git a/Documentation/hyper-dmabuf-sharing.txt b/Documentation/hyper-dmabuf-sharing.txt > new file mode 100644 > index 000000000000..928e411931e3 > --- /dev/null > +++ b/Documentation/hyper-dmabuf-sharing.txt > @@ -0,0 +1,734 @@ > +Linux Hyper DMABUF Driver > + > +------------------------------------------------------------------------------ > +Section 1. Overview > +------------------------------------------------------------------------------ > + > +Hyper_DMABUF driver is a Linux device driver running on multiple Virtual > +achines (VMs), which expands DMA-BUF sharing capability to the VM environment > +where multiple different OS instances need to share same physical data without > +data-copy across VMs. > + > +To share a DMA_BUF across VMs, an instance of the Hyper_DMABUF drv on the > +exporting VM (so called, “exporter”) imports a local DMA_BUF from the original > +producer of the buffer, The usage of export and import in the above sentence makes it almost impossible to understand. > then re-exports it with an unique ID, hyper_dmabuf_id > +for the buffer to the importing VM (so called, “importer”). And this is even worse. Maybe it would help to have some kind of flow diagram of all this import/export operations, but please read below. > + > +Another instance of the Hyper_DMABUF driver on importer registers > +a hyper_dmabuf_id together with reference information for the shared physical > +pages associated with the DMA_BUF to its database when the export happens. > + > +The actual mapping of the DMA_BUF on the importer’s side is done by > +the Hyper_DMABUF driver when user space issues the IOCTL command to access > +the shared DMA_BUF. The Hyper_DMABUF driver works as both an importing and > +exporting driver as is, that is, no special configuration is required. > +Consequently, only a single module per VM is needed to enable cross-VM DMA_BUF > +exchange. IMHO I need a more generic view of the problem you are trying to solve in the overview section. I've read the full overview, and I still have no idea why you need all this. I think the overview should contain at least: 1. A description of the problem you are trying to solve. 2. A high level description of the proposed solution. 3. How the proposed solution deals with the problem described in 1. This overview is not useful for people that don't know which problem you are trying to solve, like myself. Thanks, Roger.
Thanks for your comment, Roger I will try to polish this doc and resubmit. (I put some comments below as well.) On Fri, Feb 23, 2018 at 04:15:00PM +0000, Roger Pau Monné wrote: > On Tue, Feb 13, 2018 at 05:50:01PM -0800, Dongwon Kim wrote: > > Reference document for hyper_DMABUF driver > > > > Documentation/hyper-dmabuf-sharing.txt > > This should likely be patch 1 in order for reviewers to have the > appropriate context. > > > > > Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> > > --- > > Documentation/hyper-dmabuf-sharing.txt | 734 +++++++++++++++++++++++++++++++++ > > 1 file changed, 734 insertions(+) > > create mode 100644 Documentation/hyper-dmabuf-sharing.txt > > > > diff --git a/Documentation/hyper-dmabuf-sharing.txt b/Documentation/hyper-dmabuf-sharing.txt > > new file mode 100644 > > index 000000000000..928e411931e3 > > --- /dev/null > > +++ b/Documentation/hyper-dmabuf-sharing.txt > > @@ -0,0 +1,734 @@ > > +Linux Hyper DMABUF Driver > > + > > +------------------------------------------------------------------------------ > > +Section 1. Overview > > +------------------------------------------------------------------------------ > > + > > +Hyper_DMABUF driver is a Linux device driver running on multiple Virtual > > +achines (VMs), which expands DMA-BUF sharing capability to the VM environment > > +where multiple different OS instances need to share same physical data without > > +data-copy across VMs. > > + > > +To share a DMA_BUF across VMs, an instance of the Hyper_DMABUF drv on the > > +exporting VM (so called, “exporter”) imports a local DMA_BUF from the original > > +producer of the buffer, > > The usage of export and import in the above sentence makes it almost > impossible to understand. Ok, it looks confusing. I think the problem is that those words are used for both local and cross-VMs cases. I will try to clarify those. > > > then re-exports it with an unique ID, hyper_dmabuf_id > > +for the buffer to the importing VM (so called, “importer”). > > And this is even worse. > > Maybe it would help to have some kind of flow diagram of all this > import/export operations, but please read below. I will add a diagram here. > > > + > > +Another instance of the Hyper_DMABUF driver on importer registers > > +a hyper_dmabuf_id together with reference information for the shared physical > > +pages associated with the DMA_BUF to its database when the export happens. > > + > > +The actual mapping of the DMA_BUF on the importer’s side is done by > > +the Hyper_DMABUF driver when user space issues the IOCTL command to access > > +the shared DMA_BUF. The Hyper_DMABUF driver works as both an importing and > > +exporting driver as is, that is, no special configuration is required. > > +Consequently, only a single module per VM is needed to enable cross-VM DMA_BUF > > +exchange. > > IMHO I need a more generic view of the problem you are trying to solve > in the overview section. I've read the full overview, and I still have > no idea why you need all this. I will add some more paragrahs here to give some more generic view (and possibly diagrams) of this driver. > > I think the overview should contain at least: > > 1. A description of the problem you are trying to solve. > 2. A high level description of the proposed solution. > 3. How the proposed solution deals with the problem described in 1. > > This overview is not useful for people that don't know which problem > you are trying to solve, like myself. Thanks again. > > Thanks, Roger.
Sorry for top-posting Can we have all this go into some header file which will not only describe the structures/commands/responses/etc, but will also allow drivers to use those directly without defining the same one more time in the code? For example, this is how it is done in Xen [1]. This way, you can keep documentation and the protocol implementation in sync easily On 02/14/2018 03:50 AM, Dongwon Kim wrote: > Reference document for hyper_DMABUF driver > > Documentation/hyper-dmabuf-sharing.txt > > Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> > --- > Documentation/hyper-dmabuf-sharing.txt | 734 +++++++++++++++++++++++++++++++++ > 1 file changed, 734 insertions(+) > create mode 100644 Documentation/hyper-dmabuf-sharing.txt > > diff --git a/Documentation/hyper-dmabuf-sharing.txt b/Documentation/hyper-dmabuf-sharing.txt > new file mode 100644 > index 000000000000..928e411931e3 > --- /dev/null > +++ b/Documentation/hyper-dmabuf-sharing.txt > @@ -0,0 +1,734 @@ > +Linux Hyper DMABUF Driver > + > +------------------------------------------------------------------------------ > +Section 1. Overview > +------------------------------------------------------------------------------ > + > +Hyper_DMABUF driver is a Linux device driver running on multiple Virtual > +achines (VMs), which expands DMA-BUF sharing capability to the VM environment > +where multiple different OS instances need to share same physical data without > +data-copy across VMs. > + > +To share a DMA_BUF across VMs, an instance of the Hyper_DMABUF drv on the > +exporting VM (so called, “exporter”) imports a local DMA_BUF from the original > +producer of the buffer, then re-exports it with an unique ID, hyper_dmabuf_id > +for the buffer to the importing VM (so called, “importer”). > + > +Another instance of the Hyper_DMABUF driver on importer registers > +a hyper_dmabuf_id together with reference information for the shared physical > +pages associated with the DMA_BUF to its database when the export happens. > + > +The actual mapping of the DMA_BUF on the importer’s side is done by > +the Hyper_DMABUF driver when user space issues the IOCTL command to access > +the shared DMA_BUF. The Hyper_DMABUF driver works as both an importing and > +exporting driver as is, that is, no special configuration is required. > +Consequently, only a single module per VM is needed to enable cross-VM DMA_BUF > +exchange. > + > +------------------------------------------------------------------------------ > +Section 2. Architecture > +------------------------------------------------------------------------------ > + > +1. Hyper_DMABUF ID > + > +hyper_dmabuf_id is a global handle for shared DMA BUFs, which is compatible > +across VMs. It is a key used by the importer to retrieve information about > +shared Kernel pages behind the DMA_BUF structure from the IMPORT list. When > +a DMA_BUF is exported to another domain, its hyper_dmabuf_id and META data > +are also kept in the EXPORT list by the exporter for further synchronization > +of control over the DMA_BUF. > + > +hyper_dmabuf_id is “targeted”, meaning it is valid only in exporting (owner of > +the buffer) and importing VMs, where the corresponding hyper_dmabuf_id is > +stored in their database (EXPORT and IMPORT lists). > + > +A user-space application specifies the targeted VM id in the user parameter > +when it calls the IOCTL command to export shared DMA_BUF to another VM. > + > +hyper_dmabuf_id_t is a data type for hyper_dmabuf_id. 
It is defined as 16-byte > +data structure, and it contains id and rng_key[3] as elements for > +the structure. > + > +typedef struct { > + int id; > + int rng_key[3]; /* 12bytes long random number */ > +} hyper_dmabuf_id_t; > + > +The first element in the hyper_dmabuf_id structure, int id is combined data of > +a count number generated by the driver running on the exporter and > +the exporter’s ID. The VM’s ID is a one byte value and located at the field’s > +SB in int id. The remaining three bytes in int id are reserved for a count > +number. > + > +However, there is a limit related to this count number, which is 1000. > +Therefore, only little more than a byte starting from the LSB is actually used > +for storing this count number. > + > +#define HYPER_DMABUF_ID_CREATE(domid, id) \ > + ((((domid) & 0xFF) << 24) | ((id) & 0xFFFFFF)) > + > +This limit on the count number directly means the maximum number of DMA BUFs > +that can be shared simultaneously by one VM. The second element of > +hyper_dmabuf_id, that is int rng_key[3], is an array of three integers. These > +numbers are generated by Linux’s native random number generation mechanism. > +This field is added to enhance the security of the Hyper DMABUF driver by > +maximizing the entropy of hyper_dmabuf_id (that is, preventing it from being > +guessed by a security attacker). > + > +Once DMA_BUF is no longer shared, the hyper_dmabuf_id associated with > +the DMA_BUF is released, but the count number in hyper_dmabuf_id is saved in > +the ID list for reuse. However, random keys stored in int rng_key[3] are not > +reused. Instead, those keys are always filled with freshly generated random > +keys for security. > + > +2. IOCTLs > + > +a. IOCTL_HYPER_DMABUF_TX_CH_SETUP > + > +This type of IOCTL is used for initialization of a one-directional transmit > +communication channel with a remote domain. > + > +The user space argument for this type of IOCTL is defined as: > + > +struct ioctl_hyper_dmabuf_tx_ch_setup { > + /* IN parameters */ > + /* Remote domain id */ > + int remote_domain; > +}; > + > +b. IOCTL_HYPER_DMABUF_RX_CH_SETUP > + > +This type of IOCTL is used for initialization of a one-directional receive > +communication channel with a remote domain. > + > +The user space argument for this type of IOCTL is defined as: > + > +struct ioctl_hyper_dmabuf_rx_ch_setup { > + /* IN parameters */ > + /* Source domain id */ > + int source_domain; > +}; > + > +c. IOCTL_HYPER_DMABUF_EXPORT_REMOTE > + > +This type of IOCTL is used to export a DMA BUF to another VM. When a user > +space application makes this call to the driver, it extracts Kernel pages > +associated with the DMA_BUF, then makes those shared with the importing VM. > + > +All reference information for this shared pages and hyper_dmabuf_id is > +created, then passed to the importing domain through a communications > +channel for synchronous registration. In the meantime, the hyper_dmabuf_id > +for the shared DMA_BUF is also returned to user-space application. > + > +This IOCTL can accept a reference to “user-defined” data as well as a FD > +for the DMA BUF. This private data is then attached to the DMA BUF and > +exported together with it. > + > +More details regarding this private data can be found in chapter for > +“Hyper_DMABUF Private Data”. 
> + > +The user space argument for this type of IOCTL is defined as: > + > +struct ioctl_hyper_dmabuf_export_remote { > + /* IN parameters */ > + /* DMA buf fd to be exported */ > + int dmabuf_fd; > + /* Domain id to which buffer should be exported */ > + int remote_domain; > + /* exported dma buf id */ > + hyper_dmabuf_id_t hid; > + /* size of private data */ > + int sz_priv; > + /* ptr to the private data for Hyper_DMABUF */ > + char *priv; > +}; > + > +d. IOCTL_HYPER_DMABUF_EXPORT_FD > + > +The importing VM uses this IOCTL to import and re-export a shared DMA_BUF > +locally to the end-consumer using the standard Linux DMA_BUF framework. > +Upon IOCTL call, the Hyper_DMABUF driver finds the reference information > +of the shared DMA_BUF with the given hyper_dmabuf_id, then maps all shared > +pages in its own Kernel space. The driver then constructs a scatter-gather > +list with those mapped pages and creates a brand-new DMA_BUF with the list, > +which is eventually exported with a file descriptor to the local consumer. > + > +The user space argument for this type of IOCTL is defined as: > + > +struct ioctl_hyper_dmabuf_export_fd { > + /* IN parameters */ > + /* hyper dmabuf id to be imported */ > + int hyper_dmabuf_id; > + /* flags */ > + int flags; > + /* OUT parameters */ > + /* exported dma buf fd */ > + int fd; > +}; > + > +e. IOCTL_HYPER_DMABUF_UNEXPORT > + > +This type of IOCTL is used when it is necessary to terminate the current > +sharing of a DMA_BUF. When called, the driver first checks if there are any > +consumers actively using the DMA_BUF. Then, it unexports it if it is not > +mapped or used by any consumers. Otherwise, it postpones unexporting, but > +makes the buffer invalid to prevent any further import of the same DMA_BUF. > +DMA_BUF is completely unexported after the last consumer releases it. > + > +”Unexport” means removing all reference information about the DMA_BUF from the > +LISTs and make all pages private again. > + > +The user space argument for this type of IOCTL is defined as: > + > +struct ioctl_hyper_dmabuf_unexport { > + /* IN parameters */ > + /* hyper dmabuf id to be unexported */ > + int hyper_dmabuf_id; > + /* delay in ms by which unexport processing will be postponed */ > + int delay_ms; > + /* OUT parameters */ > + /* Status of request */ > + int status; > +}; > + > +f. IOCTL_HYPER_DMABUF_QUERY > + > +This IOCTL is used to retrieve specific information about a DMA_BUF that > +is being shared. > + > +The user space argument for this type of IOCTL is defined as: > + > +struct ioctl_hyper_dmabuf_query { > + /* in parameters */ > + /* hyper dmabuf id to be queried */ > + int hyper_dmabuf_id; > + /* item to be queried */ > + int item; > + /* OUT parameters */ > + /* output of query */ > + /* info can be either value or reference */ > + unsigned long info; > +}; > + > +<Available Queries> > + > +HYPER_DMABUF_QUERY_TYPE > + - Return the type of DMA_BUF from the current domain, Exported or Imported. > + > +HYPER_DMABUF_QUERY_EXPORTER > + - Return the exporting domain’s ID of a shared DMA_BUF. > + > +HYPER_DMABUF_QUERY_IMPORTER > + - Return the importing domain’s ID of a shared DMA_BUF. > + > +HYPER_DMABUF_QUERY_SIZE > + - Return the size of a shared DMA_BUF in bytes. > + > +HYPER_DMABUF_QUERY_BUSY > + - Return ‘true’ if a shared DMA_BUF is currently used > + (mapped by the end-consumer). > + > +HYPER_DMABUF_QUERY_UNEXPORTED > + - Return ‘true’ if a shared DMA_BUF is not valid anymore > + (so it does not allow a new consumer to map it). 
> + > +HYPER_DMABUF_QUERY_DELAYED_UNEXPORTED > + - Return ‘true’ if a shared DMA_BUF is scheduled to be unexported > + (but is still valid) within a fixed time. > + > +HYPER_DMABUF_QUERY_PRIV_INFO > + - Return ‘private’ data attached to shared DMA_BUF to the user space. > + ‘unsigned long info’ is the user space pointer for the buffer, where > + private data will be copied to. > + > +HYPER_DMABUF_QUERY_PRIV_INFO_SIZE > + - Return the size of the private data attached to the shared DMA_BUF. > + > +3. Event Polling > + > +Event-polling can be enabled optionally by selecting the Kernel config option, > +Enable event-generation and polling operation under xen/hypervisor in Kernel’s > +menuconfig. The event-polling mechanism includes the generation of > +an import-event, adding it to the event-queue and providing a notification to > +the application so that it can retrieve the event data from the queue. > + > +For this mechanism, “Poll” and “Read” operations are added to the Hyper_DMABUF > +driver. A user application that polls the driver goes into a sleep state until > +there is a new event added to the queue. An application uses “Read” to retrieve > +event data from the event queue. Event data contains the hyper_dmabuf_id and > +the private data of the buffer that has been received by the importer. > + > +For more information on private data, refer to Section 3.5). > +Using this method, it is possible to lower the risk of the hyper_dmabuf_id and > +other sensitive information about the shared buffer (for example, meta-data > +for shared images) being leaked while being transferred to the importer because > +all of this data is shared as “private info” at the driver level. However, > +please note there should be a way for the importer to find the correct DMA_BUF > +in this case when there are multiple Hyper_DMABUFs being shared simultaneously. > +For example, the surface name or the surface ID of a specific rendering surface > +needs to be sent to the importer in advance before it is exported in a surface- > +sharing use-case. > + > +Each event data given to the user-space consists of a header and the private > +information of the buffer. The data type is defined as follows: > + > +struct hyper_dmabuf_event_hdr { > + int event_type; /* one type only for now - new import */ > + hyper_dmabuf_id_t hid; /* hyper_dmabuf_id of specific hyper_dmabuf */ > + int size; /* size of data */ > +}; > + > +struct hyper_dmabuf_event_data { > + struct hyper_dmabuf_event_hdr hdr; > + void *data; /* private data */ > +}; > + > +4. Hyper_DMABUF Private Data > + > +Each Hyper_DMABUF can come with private data, the size of which can be up to > +AX_SIZE_PRIV_DATA (currently 192 byte). This private data is just a chunk of > +plain data attached to every Hyper_DMABUF. It is guaranteed to be synchronized > +across VMs, exporter and importer. This private data does not have any specific > +structure defined at the driver level, so any “user-defined” format or > +structure can be used. In addition, there is no dedicated use-case for this > +data. It can be used virtually for any purpose. For example, it can be used to > +share meta-data such as dimension and color formats for shared images in > +a surface sharing model. Another example is when we share protected media > +contents. > + > +This private data can be used to transfer flags related to content protection > +information on streamed media to the importer. > + > +Private data is initially generated when a buffer is exported for the first > +time. 
Then, it is updated whenever the same buffer is re-exported. During the > +re-exporting process, the Hyper_DMABUF driver only updates private data on > +both sides with new data from user-space since the same buffer already exists > +on both the IMPORT LIST and EXPORT LIST. > + > +There are two different ways to retrieve this private data from user-space. > +The first way is to use “Read” on the Hyper_DMABUF driver. “Read” returns the > +data of events containing private data of the buffer. The second way is to > +make a query to Hyper_DMABUF. There are two query items, > +HYPER_DMABUF_QUERY_PRIV_INFO and HYPER_DMABUF_QUERY_PRIV_INFO_SIZE available > +for retrieving private data and its size. > + > +5. Scatter-Gather List Table (SGT) Management > + > +SGT management is the core part of the Hyper_DMABUF driver that manages an > +SGT, a representation of the group of kernel pages associated with a DMA_BUF. > +This block includes four different sub-blocks: > + > +a. Hyper_DMABUF_id Manager > + > +This ID manager is responsible for generating a hyper_dmabuf_id for an > +exported DMA_BUF. When an ID is requested, the ID Manager first checks if > +there are any reusable IDs left in the list and returns one of those, > +if available. Otherwise, it creates the next count number and returns it > +to the caller. > + > +b. SGT Creator > + > +The SGT (struct sg_table) contains information about the DMA_BUF such as > +references to all kernel pages for the buffer and their connections. The > +SGT Creator creates a new SGT on the importer side with pages shared by > +the hypervisor. > + > +c. Kernel Page Extractor > + > +The Page Extractor extracts pages from a given SGT before those pages > +are shared. > + > +d. List Manager Interface > + > +The SGT manger also interacts with export and import list managers. It > +sends out information (for example, hyper_dmabuf_id, reference, and > +DMA_BUF information) about the exported or imported DMA_BUFs to the > +list manager. Also, on IOCTL request, it asks the list manager to find > +and return the information for a corresponding DMA_BUF in the list. > + > +6. DMA-BUF Interface > + > +The DMA-BUF interface provides standard methods to manage DMA_BUFs > +reconstructed by the Hyper_DMABUF driver from shared pages. All of the > +relevant operations are listed in struct dma_buf_ops. These operations > +are standard DMA_BUF operations, therefore they follow standard DMA BUF > +protocols. > + > +Each DMA_BUF operation communicates with the exporter at the end of the > +routine for “indirect DMA_BUF synchronization”. > + > +7. Export/Import List Management > + > +Whenever a DMA_BUF is shared and exported, its information is added to the > +database (EXPORT-list) on the exporting VM. Similarly, information about an > +imported DMA_BUF is added to the importing database (IMPORT list) on the > +importing VM, when the export happens. > + > +All of the entries in the lists are needed to manage the exported/imported > +DMA_BUF more efficiently. Both lists are implemented as Linux hash tables. > +The key to the list is hyper_dmabuf_id and the output is the information of > +the DMA_BUF. The List Manager manages all requests from other blocks and > +transactions within lists to ensure that all entries are up-to-date and > +that the list structure is consistent. > + > +The List Manager provides basic functionality, such as: > + > +- Adding to the List > +- Removal from the List > +- Finding information about a DMA_BUF, given the hyper_dmabuf_id > + > +8. 
Page Sharing by Hypercalls > + > +The Hyper_DMABUF driver assumes that there is a native page-by-page memory > +sharing mechanism available on the hypervisor. Referencing a group of pages > +that are being shared is what the driver expects from “backend” APIs or the > +hypervisor itself. > + > +For the example, xen backend integrated in current code base utilizes Xen’s > +grant-table interface for sharing the underlying kernel pages (struct *page). > + > +More details about grant-table interface can be found at the following locations: > + > +https://wiki.xen.org/wiki/Grant_Table > +https://xenbits.xen.org/docs/4.6-testing/misc/grant-tables.txt > + > +9. Message Handling > + > +The exporter and importer can each create a message that consists of an opcode > +(command) and operands (parameters) and send it to each other. > + > +The message format is defined as: > + > +struct hyper_dmabuf_req { > + unsigned int req_id; /* Sequence number. Used for RING BUF > + synchronization */ > + unsigned int stat; /* Status.Response from receiver. */ > + unsigned int cmd; /* Opcode */ > + unsigned int op[MAX_NUMBER_OF_OPERANDS]; /* Operands */ > +}; > + > +The following table gives the list of opcodes: > + > +<Opcodes in Message to Exporter/Importer> > + > +HYPER_DMABUF_EXPORT (exporter --> importer) > + - Export a DMA_BUF to the importer. The importer registers the corresponding > + DMA_BUF in its IMPORT LIST when the message is received. > + > +HYPER_DMABUF_EXPORT_FD (importer --> exporter) > + - Locally exported as FD. The importer sends out this command to the exporter > + to notify that the buffer is now locally exported (mapped and used). > + > +HYPER_DMABUF_EXPORT_FD_FAILED (importer --> exporter) > + - Failed while exporting locally. The importer sends out this command to the > + exporter to notify the exporter that the EXPORT_FD failed. > + > +HYPER_DMABUF_NOTIFY_UNEXPORT (exporter --> importer) > + - Termination of sharing. The exporter notifies the importer that the DMA_BUF > + has been unexported. > + > +HYPER_DMABUF_OPS_TO_REMOTE (importer --> exporter) > + - Not implemented yet. > + > +HYPER_DMABUF_OPS_TO_SOURCE (exporter --> importer) > + - DMA_BUF ops to the exporter, for DMA_BUF upstream synchronization. > + Note: Implemented but it is done asynchronously due to performance issues. > + > +The following table shows the list of operands for each opcode. > + > +<Operands in Message to Exporter/Importer> > + > +- HYPER_DMABUF_EXPORT > + > +op0 to op3 – hyper_dmabuf_id > +op4 – number of pages to be shared > +op5 – offset of data in the first page > +op6 – length of data in the last page > +op7 – reference number for the group of shared pages > +op8 – size of private data > +op9 to (op9+op8) – private data > + > +- HYPER_DMABUF_EXPORT_FD > + > +op0 to op3 – hyper_dmabuf_id > + > +- HYPER_DMABUF_EXPORT_FD_FAILED > + > +op0 to op3 – hyper_dmabuf_id > + > +- HYPER_DMABUF_NOTIFY_UNEXPORT > + > +op0 to op3 – hyper_dmabuf_id > + > +- HYPER_DMABUF_OPS_TO_REMOTE(Not implemented) > + > +- HYPER_DMABUF_OPS_TO_SOURCE > + > +op0 to op3 – hyper_dmabuf_id > +op4 – type of DMA_BUF operation > + > +9. Inter VM (Domain) Communication > + > +Two different types of inter-domain communication channels are required, > +one in kernel space and the other in user space. The communication channel > +in user space is for transmitting or receiving the hyper_dmabuf_id. 
Since > +there is no specific security (for example, encryption) involved in the > +generation of a global id at the driver level, it is highly recommended that > +the customer’s user application set up a very secure channel for exchanging > +hyper_dmabuf_id between VMs. > + > +The communication channel in kernel space is required for exchanging messages > +from “message management” block between two VMs. In the current reference > +backend for Xen hypervisor, Xen ring-buffer and event-channel mechanisms are > +used for message exchange between impoter and exporter. > + > +10. What are required in hypervisor > + > +emory sharing and message communication between VMs > + > +------------------------------------------------------------------------------ > +Section 3. Hyper DMABUF Sharing Flow > +------------------------------------------------------------------------------ > + > +1. Exporting > + > +To export a DMA_BUF to another VM, user space has to call an IOCTL > +(IOCTL_HYPER_DMABUF_EXPORT_REMOTE) with a file descriptor for the buffer given > +by the original exporter. The Hyper_DMABUF driver maps a DMA_BUF locally, then > +issues a hyper_dmabuf_id and SGT for the DMA_BUF, which is registered to the > +EXPORT list. Then, all pages for the SGT are extracted and each individual > +page is shared via a hypervisor-specific memory sharing mechanism > +(for example, in Xen this is grant-table). > + > +One important requirement on this memory sharing method is that it needs to > +create a single integer value that represents the list of pages, which can > +then be used by the importer for retrieving the group of shared pages. For > +this, the “Backend” in the reference driver utilizes the multiple level > +addressing mechanism. > + > +Once the integer reference to the list of pages is created, the exporter > +builds the “export” command and sends it to the importer, then notifies the > +importer. > + > +2. Importing > + > +The Import process is divided into two sections. One is the registration > +of DMA_BUF from the exporter. The other is the actual mapping of the buffer > +before accessing the data in the buffer. The former (termed “Registration”) > +happens on an export event (that is, the export command with an interrupt) > +in the exporter. > + > +The latter (termed “Mapping”) is done asynchronously when the driver gets the > +IOCTL call from user space. When the importer gets an interrupt from the > +exporter, it checks the command in the receiving queue and if it is an > +“export” command, the registration process is started. It first finds > +hyper_dmabuf_id and the integer reference for the shared pages, then stores > +all of that information together with the “domain id” of the exporting domain > +in the IMPORT LIST. > + > +In the case where “event-polling” is enabled (Kernel Config - Enable event- > +generation and polling operation), a “new sharing available” event is > +generated right after the reference info for the new shared DMA_BUF is > +registered to the IMPORT LIST. This event is added to the event-queue. > + > +The user process that polls Hyper_DMABUF driver wakes up when this event-queue > +is not empty and is able to read back event data from the queue using the > +driver’s “Read” function. Once the user-application calls EXPORT_FD IOCTL with > +the proper parameters including hyper_dmabuf_id, the Hyper_DMABUF driver > +retrieves information about the matched DMA_BUF from the IMPORT LIST. 
Then, it > +maps all pages shared (referenced by the integer reference) in its kernel > +space and creates its own DMA_BUF referencing the same shared pages. After > +this, it exports this new DMA_BUF to the other drivers with a file descriptor. > +DMA_BUF can then be used just in the same way a local DMA_BUF is. > + > +3. Indirect Synchronization of DMA_BUF > + > +Synchronization of a DMA_BUF within a single OS is automatically achieved > +because all of importer’s DMA_BUF operations are done using functions defined > +on the exporter’s side, which means there is one central place that has full > +control over the DMA_BUF. In other words, any primary activities such as > +attaching/detaching and mapping/un-mapping are all captured by the exporter, > +meaning that the exporter knows basic information such as who is using the > +DMA_BUF and how it is being used. This, however, is not applicable if this > +sharing is done beyond a single OS because kernel space (where the exporter’s > +DMA_BUF operations reside) is simply not visible to the importing VM. > + > +Therefore, “indirect synchronization” was introduced as an alternative solution, > +which is now implemented in the Hyper_DMABUF driver. This technique makes > +the exporter create a shadow DMA_BUF when the end-consumer of the buffer maps > +the DMA_BUF, then duplicates any DMA_BUF operations performed on > +the importer’s side. Through this “indirect synchronization”, the exporter is > +able to virtually track all activities done by the consumer (mostly reference > +counter) as if those are done in exporter’s local system. > + > +------------------------------------------------------------------------------ > +Section 4. Hypervisor Backend Interface > +------------------------------------------------------------------------------ > + > +The Hyper_DMABUF driver has a standard “Backend” structure that contains > +mappings to various functions designed for a specific Hypervisor. Most of > +these API functions should provide a low-level implementation of communication > +and memory sharing capability that utilize a Hypervisor’s native mechanisms. > + > +struct hyper_dmabuf_backend_ops { > + /* retreiving id of current virtual machine */ > + int (*get_vm_id)(void); > + /* get pages shared via hypervisor-specific method */ > + int (*share_pages)(struct page **, int, int, void **); > + /* make shared pages unshared via hypervisor specific method */ > + int (*unshare_pages)(void **, int); > + /* map remotely shared pages on importer's side via > + * hypervisor-specific method > + */ > + struct page ** (*map_shared_pages)(int, int, int, void **); > + /* unmap and free shared pages on importer's side via > + * hypervisor-specific method > + */ > + int (*unmap_shared_pages)(void **, int); > + /* initialize communication environment */ > + int (*init_comm_env)(void); > + /* destroy communication channel */ > + void (*destroy_comm)(void); > + /* upstream ch setup (receiving and responding) */ > + int (*init_rx_ch)(int); > + /* downstream ch setup (transmitting and parsing responses) */ > + int (*init_tx_ch)(int); > + /* send msg via communication ch */ > + int (*send_req)(int, struct hyper_dmabuf_req *, int); > +}; > + > +<Hypervisor-specific Backend Structure> > + > +1. get_vm_id > + > + Returns the VM (domain) ID > + > + Input: > + > + -ID of the current domain > + > + Output: > + > + None > + > +2. 
share_pages > + > + Get pages shared via hypervisor-specific method and return one reference > + ID that represents the complete list of shared pages > + > + Input: > + > + -Array of pages > + -ID of importing VM > + -Number of pages > + -Hypervisor specific Representation of reference info of shared > + pages > + > + Output: > + > + -Hypervisor specific integer value that represents all of > + the shared pages > + > +3. unshare_pages > + > + Stop sharing pages > + > + Input: > + > + -Hypervisor specific Representation of reference info of shared > + pages > + -Number of shared pages > + > + Output: > + > + 0 > + > +4. map_shared_pages > + > + Map shared pages locally using a hypervisor-specific method > + > + Input: > + > + -Reference number that represents all of shared pages > + -ID of exporting VM, Number of pages > + -Reference information for any purpose > + > + Output: > + > + -An array of shared pages (struct page**) > + > +5. unmap_shared_pages > + > + Unmap shared pages > + > + Input: > + > + -Hypervisor specific Representation of reference info of shared pages > + > + Output: > + > + -0 (successful) or one of Standard Kernel errors > + > +6. init_comm_env > + > + Setup infrastructure needed for communication channel > + > + Input: > + > + None > + > + Output: > + > + None > + > +7. destroy_comm > + > + Cleanup everything done via init_comm_env > + > + Input: > + > + None > + > + Output: > + > + None > + > +8. init_rx_ch > + > + Configure receive channel > + > + Input: > + > + -ID of VM on the other side of the channel > + > + Output: > + > + -0 (successful) or one of Standard Kernel errors > + > +9. init_tx_ch > + > + Configure transmit channel > + > + Input: > + > + -ID of VM on the other side of the channel > + > + Output: > + > + -0 (success) or one of Standard Kernel errors > + > +10. send_req > + > + Send message to other VM > + > + Input: > + > + -ID of VM that receives the message > + -Message > + > + Output: > + > + -0 (success) or one of Standard Kernel errors > + > +------------------------------------------------------------------------------- > +------------------------------------------------------------------------------- > [1] https://elixir.bootlin.com/linux/v4.16.1/source/include/xen/interface/io/kbdif.h
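To make the suggestion above concrete, a minimal sketch of such a shared protocol header follows, assembled only from the structures and opcodes described in the document below. The file location, include guard, opcode values, and the MAX_NUMBER_OF_OPERANDS value are assumptions, not part of the posted series:

/* include/uapi/linux/hyper_dmabuf.h -- hypothetical location and name */
#ifndef _LINUX_HYPER_DMABUF_H
#define _LINUX_HYPER_DMABUF_H

typedef struct {
        int id;
        int rng_key[3];                   /* 12-byte random key */
} hyper_dmabuf_id_t;

/* Opcodes exchanged between exporter and importer (values are placeholders) */
#define HYPER_DMABUF_EXPORT             1
#define HYPER_DMABUF_EXPORT_FD          2
#define HYPER_DMABUF_EXPORT_FD_FAILED   3
#define HYPER_DMABUF_NOTIFY_UNEXPORT    4
#define HYPER_DMABUF_OPS_TO_REMOTE      5
#define HYPER_DMABUF_OPS_TO_SOURCE      6

#define MAX_NUMBER_OF_OPERANDS          64  /* assumed; not given in the text */

struct hyper_dmabuf_req {
        unsigned int req_id;              /* sequence number for ring sync */
        unsigned int stat;                /* status / response from receiver */
        unsigned int cmd;                 /* one of the opcodes above */
        unsigned int op[MAX_NUMBER_OF_OPERANDS];
};

#endif /* _LINUX_HYPER_DMABUF_H */

Keeping documentation and code in sync would then reduce to keeping one such header authoritative, as is done for the Xen protocol headers referenced in [1].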
diff --git a/Documentation/hyper-dmabuf-sharing.txt b/Documentation/hyper-dmabuf-sharing.txt
new file mode 100644
index 000000000000..928e411931e3
--- /dev/null
+++ b/Documentation/hyper-dmabuf-sharing.txt
@@ -0,0 +1,734 @@
+Linux Hyper DMABUF Driver
+
+------------------------------------------------------------------------------
+Section 1. Overview
+------------------------------------------------------------------------------
+
+The Hyper_DMABUF driver is a Linux device driver running on multiple Virtual
+Machines (VMs), which expands DMA-BUF sharing capability to the VM environment
+where multiple different OS instances need to share the same physical data
+without data-copy across VMs.
+
+To share a DMA_BUF across VMs, an instance of the Hyper_DMABUF driver on the
+exporting VM (the so-called “exporter”) imports a local DMA_BUF from the
+original producer of the buffer, then re-exports it with a unique ID,
+hyper_dmabuf_id, for the buffer to the importing VM (the so-called “importer”).
+
+Another instance of the Hyper_DMABUF driver on the importer registers
+a hyper_dmabuf_id together with reference information for the shared physical
+pages associated with the DMA_BUF to its database when the export happens.
+
+The actual mapping of the DMA_BUF on the importer’s side is done by
+the Hyper_DMABUF driver when user space issues the IOCTL command to access
+the shared DMA_BUF. The Hyper_DMABUF driver works as both an importing and
+exporting driver as is, that is, no special configuration is required.
+Consequently, only a single module per VM is needed to enable cross-VM DMA_BUF
+exchange.
+
+------------------------------------------------------------------------------
+Section 2. Architecture
+------------------------------------------------------------------------------
+
+1. Hyper_DMABUF ID
+
+hyper_dmabuf_id is a global handle for shared DMA BUFs, which is compatible
+across VMs. It is a key used by the importer to retrieve information about
+shared Kernel pages behind the DMA_BUF structure from the IMPORT list. When
+a DMA_BUF is exported to another domain, its hyper_dmabuf_id and META data
+are also kept in the EXPORT list by the exporter for further synchronization
+of control over the DMA_BUF.
+
+hyper_dmabuf_id is “targeted”, meaning it is valid only in the exporting (owner
+of the buffer) and importing VMs, where the corresponding hyper_dmabuf_id is
+stored in their databases (EXPORT and IMPORT lists).
+
+A user-space application specifies the targeted VM id in the user parameter
+when it calls the IOCTL command to export a shared DMA_BUF to another VM.
+
+hyper_dmabuf_id_t is the data type for hyper_dmabuf_id. It is defined as a
+16-byte data structure, and it contains id and rng_key[3] as its elements.
+
+typedef struct {
+        int id;
+        int rng_key[3]; /* 12-byte long random number */
+} hyper_dmabuf_id_t;
+
+The first element in the hyper_dmabuf_id structure, int id, is a combination of
+a count number generated by the driver running on the exporter and
+the exporter’s ID. The VM’s ID is a one-byte value located at the field’s
+MSB in int id. The remaining three bytes in int id are reserved for the count
+number.
+
+However, there is a limit related to this count number, which is 1000.
+Therefore, only a little more than a byte starting from the LSB is actually
+used for storing this count number.
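For illustration only, the inverse of this encoding could be expressed with helpers like the ones below; these names are not part of the patch, and the HYPER_DMABUF_ID_CREATE macro quoted next is the matching encoder:

/* Hypothetical decode helpers (not in the patch): the top byte of 'id'
 * holds the exporting domain, the low 24 bits hold the count number. */
#define HYPER_DMABUF_DOM_ID(id)     (((id) >> 24) & 0xFF)
#define HYPER_DMABUF_ID_COUNT(id)   ((id) & 0xFFFFFF)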
+ +#define HYPER_DMABUF_ID_CREATE(domid, id) \ + ((((domid) & 0xFF) << 24) | ((id) & 0xFFFFFF)) + +This limit on the count number directly means the maximum number of DMA BUFs +that can be shared simultaneously by one VM. The second element of +hyper_dmabuf_id, that is int rng_key[3], is an array of three integers. These +numbers are generated by Linux’s native random number generation mechanism. +This field is added to enhance the security of the Hyper DMABUF driver by +maximizing the entropy of hyper_dmabuf_id (that is, preventing it from being +guessed by a security attacker). + +Once DMA_BUF is no longer shared, the hyper_dmabuf_id associated with +the DMA_BUF is released, but the count number in hyper_dmabuf_id is saved in +the ID list for reuse. However, random keys stored in int rng_key[3] are not +reused. Instead, those keys are always filled with freshly generated random +keys for security. + +2. IOCTLs + +a. IOCTL_HYPER_DMABUF_TX_CH_SETUP + +This type of IOCTL is used for initialization of a one-directional transmit +communication channel with a remote domain. + +The user space argument for this type of IOCTL is defined as: + +struct ioctl_hyper_dmabuf_tx_ch_setup { + /* IN parameters */ + /* Remote domain id */ + int remote_domain; +}; + +b. IOCTL_HYPER_DMABUF_RX_CH_SETUP + +This type of IOCTL is used for initialization of a one-directional receive +communication channel with a remote domain. + +The user space argument for this type of IOCTL is defined as: + +struct ioctl_hyper_dmabuf_rx_ch_setup { + /* IN parameters */ + /* Source domain id */ + int source_domain; +}; + +c. IOCTL_HYPER_DMABUF_EXPORT_REMOTE + +This type of IOCTL is used to export a DMA BUF to another VM. When a user +space application makes this call to the driver, it extracts Kernel pages +associated with the DMA_BUF, then makes those shared with the importing VM. + +All reference information for this shared pages and hyper_dmabuf_id is +created, then passed to the importing domain through a communications +channel for synchronous registration. In the meantime, the hyper_dmabuf_id +for the shared DMA_BUF is also returned to user-space application. + +This IOCTL can accept a reference to “user-defined” data as well as a FD +for the DMA BUF. This private data is then attached to the DMA BUF and +exported together with it. + +More details regarding this private data can be found in chapter for +“Hyper_DMABUF Private Data”. + +The user space argument for this type of IOCTL is defined as: + +struct ioctl_hyper_dmabuf_export_remote { + /* IN parameters */ + /* DMA buf fd to be exported */ + int dmabuf_fd; + /* Domain id to which buffer should be exported */ + int remote_domain; + /* exported dma buf id */ + hyper_dmabuf_id_t hid; + /* size of private data */ + int sz_priv; + /* ptr to the private data for Hyper_DMABUF */ + char *priv; +}; + +d. IOCTL_HYPER_DMABUF_EXPORT_FD + +The importing VM uses this IOCTL to import and re-export a shared DMA_BUF +locally to the end-consumer using the standard Linux DMA_BUF framework. +Upon IOCTL call, the Hyper_DMABUF driver finds the reference information +of the shared DMA_BUF with the given hyper_dmabuf_id, then maps all shared +pages in its own Kernel space. The driver then constructs a scatter-gather +list with those mapped pages and creates a brand-new DMA_BUF with the list, +which is eventually exported with a file descriptor to the local consumer. 
+ +The user space argument for this type of IOCTL is defined as: + +struct ioctl_hyper_dmabuf_export_fd { + /* IN parameters */ + /* hyper dmabuf id to be imported */ + int hyper_dmabuf_id; + /* flags */ + int flags; + /* OUT parameters */ + /* exported dma buf fd */ + int fd; +}; + +e. IOCTL_HYPER_DMABUF_UNEXPORT + +This type of IOCTL is used when it is necessary to terminate the current +sharing of a DMA_BUF. When called, the driver first checks if there are any +consumers actively using the DMA_BUF. Then, it unexports it if it is not +mapped or used by any consumers. Otherwise, it postpones unexporting, but +makes the buffer invalid to prevent any further import of the same DMA_BUF. +DMA_BUF is completely unexported after the last consumer releases it. + +”Unexport” means removing all reference information about the DMA_BUF from the +LISTs and make all pages private again. + +The user space argument for this type of IOCTL is defined as: + +struct ioctl_hyper_dmabuf_unexport { + /* IN parameters */ + /* hyper dmabuf id to be unexported */ + int hyper_dmabuf_id; + /* delay in ms by which unexport processing will be postponed */ + int delay_ms; + /* OUT parameters */ + /* Status of request */ + int status; +}; + +f. IOCTL_HYPER_DMABUF_QUERY + +This IOCTL is used to retrieve specific information about a DMA_BUF that +is being shared. + +The user space argument for this type of IOCTL is defined as: + +struct ioctl_hyper_dmabuf_query { + /* in parameters */ + /* hyper dmabuf id to be queried */ + int hyper_dmabuf_id; + /* item to be queried */ + int item; + /* OUT parameters */ + /* output of query */ + /* info can be either value or reference */ + unsigned long info; +}; + +<Available Queries> + +HYPER_DMABUF_QUERY_TYPE + - Return the type of DMA_BUF from the current domain, Exported or Imported. + +HYPER_DMABUF_QUERY_EXPORTER + - Return the exporting domain’s ID of a shared DMA_BUF. + +HYPER_DMABUF_QUERY_IMPORTER + - Return the importing domain’s ID of a shared DMA_BUF. + +HYPER_DMABUF_QUERY_SIZE + - Return the size of a shared DMA_BUF in bytes. + +HYPER_DMABUF_QUERY_BUSY + - Return ‘true’ if a shared DMA_BUF is currently used + (mapped by the end-consumer). + +HYPER_DMABUF_QUERY_UNEXPORTED + - Return ‘true’ if a shared DMA_BUF is not valid anymore + (so it does not allow a new consumer to map it). + +HYPER_DMABUF_QUERY_DELAYED_UNEXPORTED + - Return ‘true’ if a shared DMA_BUF is scheduled to be unexported + (but is still valid) within a fixed time. + +HYPER_DMABUF_QUERY_PRIV_INFO + - Return ‘private’ data attached to shared DMA_BUF to the user space. + ‘unsigned long info’ is the user space pointer for the buffer, where + private data will be copied to. + +HYPER_DMABUF_QUERY_PRIV_INFO_SIZE + - Return the size of the private data attached to the shared DMA_BUF. + +3. Event Polling + +Event-polling can be enabled optionally by selecting the Kernel config option, +Enable event-generation and polling operation under xen/hypervisor in Kernel’s +menuconfig. The event-polling mechanism includes the generation of +an import-event, adding it to the event-queue and providing a notification to +the application so that it can retrieve the event data from the queue. + +For this mechanism, “Poll” and “Read” operations are added to the Hyper_DMABUF +driver. A user application that polls the driver goes into a sleep state until +there is a new event added to the queue. An application uses “Read” to retrieve +event data from the event queue. 
Event data contains the hyper_dmabuf_id and
+the private data of the buffer that has been received by the importer.
+
+For more information on private data, refer to Section 2.4, “Hyper_DMABUF
+Private Data”. Using this method, it is possible to lower the risk of the
+hyper_dmabuf_id and other sensitive information about the shared buffer
+(for example, meta-data for shared images) being leaked while being
+transferred to the importer, because all of this data is shared as “private
+info” at the driver level. However, please note there should be a way for the
+importer to find the correct DMA_BUF in this case when there are multiple
+Hyper_DMABUFs being shared simultaneously. For example, the surface name or
+the surface ID of a specific rendering surface needs to be sent to the
+importer in advance before it is exported in a surface-sharing use-case.
+
+Each event given to user space consists of a header and the private
+information of the buffer. The data type is defined as follows:
+
+struct hyper_dmabuf_event_hdr {
+        int event_type; /* one type only for now - new import */
+        hyper_dmabuf_id_t hid; /* hyper_dmabuf_id of specific hyper_dmabuf */
+        int size; /* size of data */
+};
+
+struct hyper_dmabuf_event_data {
+        struct hyper_dmabuf_event_hdr hdr;
+        void *data; /* private data */
+};
+
+4. Hyper_DMABUF Private Data
+
+Each Hyper_DMABUF can come with private data, the size of which can be up to
+MAX_SIZE_PRIV_DATA (currently 192 bytes). This private data is just a chunk of
+plain data attached to every Hyper_DMABUF. It is guaranteed to be synchronized
+across VMs, exporter and importer. This private data does not have any specific
+structure defined at the driver level, so any “user-defined” format or
+structure can be used. In addition, there is no dedicated use-case for this
+data. It can be used virtually for any purpose. For example, it can be used to
+share meta-data such as dimensions and color formats for shared images in
+a surface sharing model. Another example is when we share protected media
+contents.
+
+This private data can be used to transfer flags related to content protection
+information on streamed media to the importer.
+
+Private data is initially generated when a buffer is exported for the first
+time. Then, it is updated whenever the same buffer is re-exported. During the
+re-exporting process, the Hyper_DMABUF driver only updates private data on
+both sides with new data from user-space since the same buffer already exists
+on both the IMPORT LIST and EXPORT LIST.
+
+There are two different ways to retrieve this private data from user-space.
+The first way is to use “Read” on the Hyper_DMABUF driver. “Read” returns the
+data of events containing private data of the buffer. The second way is to
+make a query to Hyper_DMABUF. There are two query items,
+HYPER_DMABUF_QUERY_PRIV_INFO and HYPER_DMABUF_QUERY_PRIV_INFO_SIZE, available
+for retrieving private data and its size.
+
+5. Scatter-Gather List Table (SGT) Management
+
+SGT management is the core part of the Hyper_DMABUF driver that manages an
+SGT, a representation of the group of kernel pages associated with a DMA_BUF.
+This block includes four different sub-blocks:
+
+a. Hyper_DMABUF_id Manager
+
+This ID manager is responsible for generating a hyper_dmabuf_id for an
+exported DMA_BUF. When an ID is requested, the ID Manager first checks if
+there are any reusable IDs left in the list and returns one of those,
+if available. Otherwise, it creates the next count number and returns it
+to the caller.
+
+b. SGT Creator
+
+The SGT (struct sg_table) contains information about the DMA_BUF such as
+references to all kernel pages for the buffer and their connections. The
+SGT Creator creates a new SGT on the importer side with pages shared by
+the hypervisor.
+
+c. Kernel Page Extractor
+
+The Page Extractor extracts pages from a given SGT before those pages
+are shared.
+
+d. List Manager Interface
+
+The SGT manager also interacts with the export and import list managers. It
+sends out information (for example, hyper_dmabuf_id, reference, and
+DMA_BUF information) about the exported or imported DMA_BUFs to the
+list manager. Also, on IOCTL request, it asks the list manager to find
+and return the information for a corresponding DMA_BUF in the list.
+
+6. DMA-BUF Interface
+
+The DMA-BUF interface provides standard methods to manage DMA_BUFs
+reconstructed by the Hyper_DMABUF driver from shared pages. All of the
+relevant operations are listed in struct dma_buf_ops. These operations
+are standard DMA_BUF operations, therefore they follow standard DMA BUF
+protocols.
+
+Each DMA_BUF operation communicates with the exporter at the end of the
+routine for “indirect DMA_BUF synchronization”.
+
+7. Export/Import List Management
+
+Whenever a DMA_BUF is shared and exported, its information is added to the
+database (EXPORT-list) on the exporting VM. Similarly, information about an
+imported DMA_BUF is added to the importing database (IMPORT list) on the
+importing VM, when the export happens.
+
+All of the entries in the lists are needed to manage the exported/imported
+DMA_BUF more efficiently. Both lists are implemented as Linux hash tables.
+The key to the list is hyper_dmabuf_id and the output is the information of
+the DMA_BUF. The List Manager manages all requests from other blocks and
+transactions within lists to ensure that all entries are up-to-date and
+that the list structure is consistent.
+
+The List Manager provides basic functionality, such as:
+
+- Adding to the List
+- Removal from the List
+- Finding information about a DMA_BUF, given the hyper_dmabuf_id
+
+8. Page Sharing by Hypercalls
+
+The Hyper_DMABUF driver assumes that there is a native page-by-page memory
+sharing mechanism available on the hypervisor. Referencing a group of pages
+that are being shared is what the driver expects from “backend” APIs or the
+hypervisor itself.
+
+For example, the Xen backend integrated in the current code base utilizes
+Xen’s grant-table interface for sharing the underlying kernel pages
+(struct page *).
+
+More details about the grant-table interface can be found at the following
+locations:
+
+https://wiki.xen.org/wiki/Grant_Table
+https://xenbits.xen.org/docs/4.6-testing/misc/grant-tables.txt
+
+9. Message Handling
+
+The exporter and importer can each create a message that consists of an opcode
+(command) and operands (parameters) and send it to each other.
+
+The message format is defined as:
+
+struct hyper_dmabuf_req {
+        unsigned int req_id; /* Sequence number. Used for RING BUF
+                                synchronization */
+        unsigned int stat; /* Status. Response from receiver. */
+        unsigned int cmd; /* Opcode */
+        unsigned int op[MAX_NUMBER_OF_OPERANDS]; /* Operands */
+};
+
+The following table gives the list of opcodes:
+
+<Opcodes in Message to Exporter/Importer>
+
+HYPER_DMABUF_EXPORT (exporter --> importer)
+ - Export a DMA_BUF to the importer. The importer registers the corresponding
+   DMA_BUF in its IMPORT LIST when the message is received.
+
+HYPER_DMABUF_EXPORT_FD (importer --> exporter)
+ - Locally exported as FD. The importer sends out this command to the exporter
+   to notify that the buffer is now locally exported (mapped and used).
+
+HYPER_DMABUF_EXPORT_FD_FAILED (importer --> exporter)
+ - Failed while exporting locally. The importer sends out this command to the
+   exporter to notify the exporter that the EXPORT_FD failed.
+
+HYPER_DMABUF_NOTIFY_UNEXPORT (exporter --> importer)
+ - Termination of sharing. The exporter notifies the importer that the DMA_BUF
+   has been unexported.
+
+HYPER_DMABUF_OPS_TO_REMOTE (importer --> exporter)
+ - Not implemented yet.
+
+HYPER_DMABUF_OPS_TO_SOURCE (exporter --> importer)
+ - DMA_BUF ops to the exporter, for DMA_BUF upstream synchronization.
+   Note: Implemented, but it is done asynchronously due to performance issues.
+
+The following table shows the list of operands for each opcode.
+
+<Operands in Message to Exporter/Importer>
+
+- HYPER_DMABUF_EXPORT
+
+op0 to op3 – hyper_dmabuf_id
+op4 – number of pages to be shared
+op5 – offset of data in the first page
+op6 – length of data in the last page
+op7 – reference number for the group of shared pages
+op8 – size of private data
+op9 to (op9+op8) – private data
+
+- HYPER_DMABUF_EXPORT_FD
+
+op0 to op3 – hyper_dmabuf_id
+
+- HYPER_DMABUF_EXPORT_FD_FAILED
+
+op0 to op3 – hyper_dmabuf_id
+
+- HYPER_DMABUF_NOTIFY_UNEXPORT
+
+op0 to op3 – hyper_dmabuf_id
+
+- HYPER_DMABUF_OPS_TO_REMOTE (not implemented)
+
+- HYPER_DMABUF_OPS_TO_SOURCE
+
+op0 to op3 – hyper_dmabuf_id
+op4 – type of DMA_BUF operation
+
+10. Inter VM (Domain) Communication
+
+Two different types of inter-domain communication channels are required,
+one in kernel space and the other in user space. The communication channel
+in user space is for transmitting or receiving the hyper_dmabuf_id. Since
+there is no specific security (for example, encryption) involved in the
+generation of a global id at the driver level, it is highly recommended that
+the customer’s user application set up a very secure channel for exchanging
+hyper_dmabuf_id between VMs.
+
+The communication channel in kernel space is required for exchanging messages
+from the “message management” block between two VMs. In the current reference
+backend for the Xen hypervisor, Xen ring-buffer and event-channel mechanisms
+are used for message exchange between importer and exporter.
+
+11. What is required from the hypervisor
+
+Memory sharing and message communication between VMs.
+
+------------------------------------------------------------------------------
+Section 3. Hyper DMABUF Sharing Flow
+------------------------------------------------------------------------------
+
+1. Exporting
+
+To export a DMA_BUF to another VM, user space has to call an IOCTL
+(IOCTL_HYPER_DMABUF_EXPORT_REMOTE) with a file descriptor for the buffer given
+by the original exporter. The Hyper_DMABUF driver maps the DMA_BUF locally,
+then issues a hyper_dmabuf_id and an SGT for the DMA_BUF, which is registered
+to the EXPORT list. Then, all pages for the SGT are extracted and each
+individual page is shared via a hypervisor-specific memory sharing mechanism
+(for example, in Xen this is the grant-table).
+
+One important requirement of this memory sharing method is that it needs to
+create a single integer value that represents the list of pages, which can
+then be used by the importer for retrieving the group of shared pages. For
+this, the “Backend” in the reference driver utilizes a multiple-level
+addressing mechanism.
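As a rough illustration of the capacity of such a scheme (the numbers assume Xen-style 4 KiB pages and 32-bit grant references; the actual layout used by the backend may differ):

/* Illustration only: a two-level table reachable from one top-level
 * reference, assuming 4 KiB pages and 4-byte grant references. */
#define REFS_PER_PAGE   (4096 / 4)                        /* 1024 */
#define MAX_DATA_PAGES  (REFS_PER_PAGE * REFS_PER_PAGE)   /* ~1M pages, 4 GiB */

so a single shared top-level page is enough to describe buffers of several gigabytes.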

Once the integer reference to the list of pages is created, the exporter
builds the “export” command, sends it to the importer, and then notifies the
importer.

2. Importing

The import process is divided into two phases. One is the registration of the
DMA_BUF from the exporter. The other is the actual mapping of the buffer
before the data in the buffer is accessed. The former (termed “Registration”)
happens on an export event (that is, the export command arriving with an
interrupt) from the exporter.

The latter (termed “Mapping”) is done asynchronously when the driver gets the
IOCTL call from user space. When the importer gets an interrupt from the
exporter, it checks the command in the receiving queue and, if it is an
“export” command, the registration process is started. It first finds the
hyper_dmabuf_id and the integer reference for the shared pages, then stores
all of that information, together with the “domain id” of the exporting
domain, in the IMPORT LIST.

In the case where “event-polling” is enabled (Kernel Config - Enable event-
generation and polling operation), a “new sharing available” event is
generated right after the reference info for the new shared DMA_BUF is
registered in the IMPORT LIST. This event is added to the event-queue.

The user process that polls the Hyper_DMABUF driver wakes up when this
event-queue is not empty and is able to read back event data from the queue
using the driver’s “Read” function. Once the user application calls the
EXPORT_FD IOCTL with the proper parameters, including the hyper_dmabuf_id,
the Hyper_DMABUF driver retrieves the information about the matching DMA_BUF
from the IMPORT LIST. Then, it maps all of the shared pages (referenced by
the integer reference) into its kernel space and creates its own DMA_BUF
referencing the same shared pages. After this, it exports this new DMA_BUF to
other drivers with a file descriptor. The DMA_BUF can then be used in the
same way as a local DMA_BUF.

3. Indirect Synchronization of DMA_BUF

Synchronization of a DMA_BUF within a single OS is automatically achieved
because all of the importer’s DMA_BUF operations are done using functions
defined on the exporter’s side, which means there is one central place that
has full control over the DMA_BUF. In other words, any primary activities
such as attaching/detaching and mapping/un-mapping are all captured by the
exporter, meaning that the exporter knows basic information such as who is
using the DMA_BUF and how it is being used. This, however, does not apply
when the sharing crosses a single OS, because kernel space (where the
exporter’s DMA_BUF operations reside) is simply not visible to the importing
VM.

Therefore, “indirect synchronization” was introduced as an alternative
solution, which is now implemented in the Hyper_DMABUF driver. This technique
makes the exporter create a shadow DMA_BUF when the end-consumer of the
buffer maps the DMA_BUF, then duplicates any DMA_BUF operations performed on
the importer’s side. Through this “indirect synchronization”, the exporter is
able to virtually track all activities done by the consumer (mostly reference
counting) as if they were performed on the exporter’s local system.

------------------------------------------------------------------------------
Section 4. Hypervisor Backend Interface
------------------------------------------------------------------------------

The Hyper_DMABUF driver has a standard “Backend” structure that contains
mappings to various functions designed for a specific hypervisor. Most of
these API functions should provide a low-level implementation of the
communication and memory sharing capabilities that utilize the hypervisor’s
native mechanisms.

struct hyper_dmabuf_backend_ops {
	/* retrieving id of current virtual machine */
	int (*get_vm_id)(void);
	/* get pages shared via hypervisor-specific method */
	int (*share_pages)(struct page **, int, int, void **);
	/* make shared pages unshared via hypervisor-specific method */
	int (*unshare_pages)(void **, int);
	/* map remotely shared pages on importer's side via
	 * hypervisor-specific method
	 */
	struct page ** (*map_shared_pages)(int, int, int, void **);
	/* unmap and free shared pages on importer's side via
	 * hypervisor-specific method
	 */
	int (*unmap_shared_pages)(void **, int);
	/* initialize communication environment */
	int (*init_comm_env)(void);
	/* destroy communication channel */
	void (*destroy_comm)(void);
	/* upstream ch setup (receiving and responding) */
	int (*init_rx_ch)(int);
	/* downstream ch setup (transmitting and parsing responses) */
	int (*init_tx_ch)(int);
	/* send msg via communication ch */
	int (*send_req)(int, struct hyper_dmabuf_req *, int);
};

<Hypervisor-specific Backend Structure>

1. get_vm_id

   Returns the VM (domain) ID

   Input:

   - None

   Output:

   - ID of the current domain

2. share_pages

   Share the given pages via a hypervisor-specific method and return one
   reference ID that represents the complete list of shared pages

   Input:

   - Array of pages
   - ID of the importing VM
   - Number of pages
   - Hypervisor-specific representation of the reference info for the
     shared pages

   Output:

   - Hypervisor-specific integer value that represents all of the shared
     pages

3. unshare_pages

   Stop sharing the pages

   Input:

   - Hypervisor-specific representation of the reference info for the
     shared pages
   - Number of shared pages

   Output:

   - 0 (success) or a standard kernel error code

4. map_shared_pages

   Map shared pages locally using a hypervisor-specific method

   Input:

   - Reference number that represents all of the shared pages
   - ID of the exporting VM
   - Number of pages
   - Hypervisor-specific representation of the reference info for the
     shared pages

   Output:

   - An array of shared pages (struct page **)

5. unmap_shared_pages

   Unmap shared pages

   Input:

   - Hypervisor-specific representation of the reference info for the
     shared pages
   - Number of shared pages

   Output:

   - 0 (success) or a standard kernel error code

6. init_comm_env

   Set up the infrastructure needed for the communication channel

   Input:

   - None

   Output:

   - 0 (success) or a standard kernel error code

7. destroy_comm

   Clean up everything done via init_comm_env

   Input:

   - None

   Output:

   - None

8. init_rx_ch

   Configure the receive channel

   Input:

   - ID of the VM on the other side of the channel

   Output:

   - 0 (success) or a standard kernel error code

9. init_tx_ch

   Configure the transmit channel

   Input:

   - ID of the VM on the other side of the channel

   Output:

   - 0 (success) or a standard kernel error code

10. send_req

   Send a message to the other VM

   Input:

   - ID of the VM that receives the message
   - Message

   Output:

   - 0 (success) or a standard kernel error code
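
To make the role of this structure more concrete, the fragment below sketches
how a backend for a hypothetical hypervisor (here called “my_hv”) might
populate it. Everything prefixed with my_hv_, as well as the parameter names,
is an assumption made for illustration only; just the layout of
struct hyper_dmabuf_backend_ops comes from the interface described above, and
the fragment assumes the driver headers defining that structure and
struct hyper_dmabuf_req have been included.

#include <linux/mm.h>
#include <linux/types.h>

/* Local domain ID; a real backend would query the hypervisor instead. */
static int my_hv_get_vm_id(void)
{
	return 0;
}

/* The remaining callbacks are assumed to be implemented elsewhere in the
 * backend; only their prototypes are shown here.
 */
extern int my_hv_share_pages(struct page **pages, int peer_vm, int nents,
			     void **refs_info);
extern int my_hv_unshare_pages(void **refs_info, int nents);
extern struct page **my_hv_map_shared_pages(int ref, int peer_vm, int nents,
					    void **refs_info);
extern int my_hv_unmap_shared_pages(void **refs_info, int nents);
extern int my_hv_init_comm_env(void);
extern void my_hv_destroy_comm(void);
extern int my_hv_init_rx_ch(int peer_vm);
extern int my_hv_init_tx_ch(int peer_vm);
extern int my_hv_send_req(int peer_vm, struct hyper_dmabuf_req *req, int wait);

struct hyper_dmabuf_backend_ops my_hv_backend_ops = {
	.get_vm_id          = my_hv_get_vm_id,
	.share_pages        = my_hv_share_pages,
	.unshare_pages      = my_hv_unshare_pages,
	.map_shared_pages   = my_hv_map_shared_pages,
	.unmap_shared_pages = my_hv_unmap_shared_pages,
	.init_comm_env      = my_hv_init_comm_env,
	.destroy_comm       = my_hv_destroy_comm,
	.init_rx_ch         = my_hv_init_rx_ch,
	.init_tx_ch         = my_hv_init_tx_ch,
	.send_req           = my_hv_send_req,
};

The driver core would then call these hooks whenever it needs to share or map
pages and whenever a hyper_dmabuf_req message has to be sent, so porting the
driver to a new hypervisor is confined to providing such a backend structure.

------------------------------------------------------------------------------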