| \documentclass{article} |
| \usepackage{palatino} |
| |
| \author{Kristian Høgsberg\\ |
| \texttt{krh@bitplanet.net} |
| } |
| |
| \title{The Wayland Display Server} |
| |
| \begin{document} |
| |
| \maketitle |
| |
| \section{Wayland Overview} |
| |
| \begin{itemize} |
| \item wayland is a protocol for a new display server. |
| \item wayland is an implementation |
| \end{itemize} |
| |
| \subsection{Replacing X11} |
| |
| Over time, a lot of functionality have slowly moved out of the X |
| server and into client-side libraries or kernel drivers. One of the |
| first components to move out was font rendering, with freetype and |
| fontconfig providing an alternative to the core X fonts. Direct |
| rendering OpenGL as a graphics driver in a client side library. Then |
| cairo came along and provided a modern 2D rendering library |
| independent of X and compositing managers took over control of the |
| rendering of the desktop. Recently with GEM and KMS in the Linux |
| kernel, we can do modesetting outside X and schedule several direct |
| rendering clients. The end result is a highly modular graphics stack. |
| |
| \subsection{Make the compositing manager the display server} |
| |
| Wayland is a new display server building on top of all those |
| components. We are trying to distill out the functionality in the X |
| server that is still used by the modern Linux desktop. This turns out |
| to be not a whole lot. Applications can allocate their own off-screen |
| buffers and render their window contents by themselves. In the end, |
| what’s needed is a way to present the resulting window surface to a |
| compositor and a way to receive input. This is what Wayland provides, |
| by piecing together the components already in the eco-system in a |
| slightly different way. |
| |
| X will always be relevant, in the same way Fortran compilers and VRML |
| browsers are, but it’s time that we think about moving it out of the |
| critical path and provide it as an optional component for legacy |
| applications. |
| |
| |
| \section{Wayland protocol} |
| |
| \subsection{Basic Principles} |
| |
| The wayland protocol is an asynchronous object oriented protocol. All |
| requests are method invocations on some object. The request include |
| an object id that uniquely identifies an object on the server. Each |
| object implements an interface and the requests include an opcode that |
| identifies which method in the interface to invoke. |
| |
| The server sends back events to the client, each event is emitted from |
| an object. Events can be error conditions. The event includes the |
| object id and the event opcode, from which the client can determine |
| the type of event. Events are generated both in response to a request |
| (in which case the request and the event constitutes a round trip) or |
| spontaneously when the server state changes. |
| |
| \begin{itemize} |
| \item state is broadcast on connect, events sent out when state |
| change. client must listen for these changes and cache the state. |
| no need (or mechanism) to query server state. |
| |
| \item server will broadcast presence of a number of global objects, |
| which in turn will broadcast their current state. |
| \end{itemize} |
| |
| \subsection{Code generation} |
| |
| The interfaces, requests and events are defined in protocol/wayland.xml. |
| This xml is used to generate the function prototypes that can be used by |
| clients and compositors. |
| |
| The protocol entry points are generated as inline functions which just |
| wraps the \verb:wl_proxy_*: functions. The inline functions aren't |
| part of the library ABI and language bindings should generate their |
| own stubs for the protocol entry points from the xml. |
| |
| \subsection{Wire format} |
| |
| The protocol is sent over a UNIX domain stream socket. Currently, the |
| endpoint is named \texttt{\textbackslash0wayland}, but it is subject |
| to change. The protocol is message-based. A message sent by a client |
| to the server is called \texttt{request}. A message from the server |
| to a client is called \texttt{event}. Every message is structured as |
| 32-bit words, values are represented in the host's byte-order. |
| |
| The message header has 2 words in it: |
| \begin{itemize} |
| \item The first word is the sender's object id (32-bit). |
| \item The second has 2 parts of 16-bit. The upper 16-bits are the message |
| size in bytes, starting at the header (i.e. it has a minimum value of 8). |
| The lower is the request/event opcode. |
| \end{itemize} |
| |
| The payload describes the request/event arguments. Every argument is always |
| aligned to 32-bit. There is no prefix that describes the type, but it is |
| inferred implicitly from the xml specification. |
| |
| The representation of argument types are as follows: |
| \begin{itemize} |
| \item "int" or "uint": The value is the 32-bit value of the signed/unsigned |
| int. |
| \item "string": Starts with an unsigned 32-bit length, followed by the |
| string contents, including terminating NUL byte, then padding to a |
| 32-bit boundary. |
| \item "object": A 32-bit object ID. |
| \item "new\_id": the 32-bit object ID. On requests, the client |
| decides the ID. The only event with "new\_id" is advertisements of |
| globals, and the server will use IDs below 0x10000. |
| \item "array": Starts with 32-bit array size in bytes, followed by the array |
| contents verbatim, and finally padding to a 32-bit boundary. |
| \item "fd": the file descriptor is not stored in the message buffer, but in |
| the ancillary data of the UNIX domain socket message (msg\_control). |
| \end{itemize} |
| |
| \subsection{Connect Time} |
| |
| \begin{itemize} |
| \item no fixed format connect block, the server emits a bunch of |
| events at connect time |
| \item presence events for global objects: output, compositor, input |
| devices |
| \end{itemize} |
| \subsection{Security and Authentication} |
| |
| \begin{itemize} |
| \item mostly about access to underlying buffers, need new drm auth |
| mechanism (the grant-to ioctl idea), need to check the cmd stream? |
| |
| \item getting the server socket depends on the compositor type, could |
| be a system wide name, through fd passing on the session dbus. or |
| the client is forked by the compositor and the fd is already opened. |
| \end{itemize} |
| |
| \subsection{Creating Objects} |
| |
| \begin{itemize} |
| \item client allocates object ID, uses range protocol |
| \item server tracks how many IDs are left in current range, sends new |
| range when client is about to run out. |
| \end{itemize} |
| |
| \subsection{Compositor} |
| |
| The compositor is a global object, advertised at connect time. |
| |
| \begin{tabular}{l} |
| \hline |
| Interface \texttt{compositor} \\ \hline |
| Requests \\ \hline |
| \texttt{create\_surface(id)} \\ |
| \texttt{commit()} \\ \hline |
| Events \\ \hline |
| \texttt{device(device)} \\ |
| \texttt{acknowledge(key, frame)} \\ |
| \texttt{frame(frame, time)} \\ \hline |
| \end{tabular} |
| |
| |
| \begin{itemize} |
| \item a global object |
| \item broadcasts drm file name, or at least a string like drm:/dev/card0 |
| \item commit/ack/frame protocol |
| \end{itemize} |
| |
| \subsection{Surface} |
| |
| Created by the client. |
| |
| \begin{tabular}{l} |
| \hline |
| Interface \texttt{surface} \\ \hline |
| Requests \\ \hline |
| \texttt{destroy()} \\ |
| \texttt{attach()} \\ |
| \texttt{map()} \\ |
| \texttt{damage()} \\ \hline |
| Events \\ \hline |
| no events \\ \hline |
| \end{tabular} |
| |
| Needs a way to set input region, opaque region. |
| |
| \subsection{Input} |
| |
| Represents a group of input devices, including mice, keyboards. Has a |
| keyboard and pointer focus. Global object. Pointer events are |
| delivered in both screen coordinates and surface local coordinates. |
| |
| \begin{tabular}{l} |
| \hline |
| Interface \texttt{cache} \\ \hline |
| Requests \\ \hline |
| \texttt{attach(buffer, x, y)} \\ |
| Events \\ \hline |
| \texttt{motion(x, y, sx, sy)} \\ |
| \texttt{button(button, state, x, y, sx, sy)} \\ |
| \texttt{key(key, state)} \\ |
| \texttt{pointer\_focus(surface)} \\ |
| \texttt{keyboard\_focus(surface, keys)} \\ \hline |
| \end{tabular} |
| |
| Talk about: |
| |
| \begin{itemize} |
| \item keyboard map, change events |
| \item xkb on wayland |
| \item multi pointer wayland |
| \end{itemize} |
| |
| A surface can change the pointer image when the surface is the pointer |
| focus of the input device. Wayland doesn't automatically change the |
| pointer image when a pointer enters a surface, but expects the |
| application to set the cursor it wants in response the pointer |
| focus and motion events. The rationale is that a client has to manage |
| changing pointer images for UI elements within the surface in response |
| to motion events anyway, so we'll make that the only mechanism for |
| setting changing the pointer image. If the server receives a request |
| to set the pointer image after the surface loses pointer focus, the |
| request is ignored. To the client this will look like it successfully |
| set the pointer image. |
| |
| The compositor will revert the pointer image back to a default image |
| when no surface has the pointer focus for that device. Clients can |
| revert the pointer image back to the default image by setting a NULL |
| image. |
| |
| What if the pointer moves from one window which has set a special |
| pointer image to a surface that doesn't set an image in response to |
| the motion event? The new surface will be stuck with the special |
| pointer image. We can't just revert the pointer image on leaving a |
| surface, since if we immediately enter a surface that sets a different |
| image, the image will flicker. Broken app, I suppose. |
| |
| \subsection{Output} |
| |
| A output is a global object, advertised at connect time or as they |
| come and go. |
| |
| \begin{tabular}{l} |
| \hline |
| Interface \texttt{output} \\ \hline |
| Requests \\ \hline |
| no requests \\ \hline |
| Events \\ \hline |
| \texttt{geometry(width, height)} \\ \hline |
| \end{tabular} |
| |
| \begin{itemize} |
| \item laid out in a big (compositor) coordinate system |
| \item basically xrandr over wayland |
| \item geometry needs position in compositor coordinate system\ |
| \item events to advertise available modes, requests to move and change |
| modes |
| \end{itemize} |
| |
| \subsection{Shared object cache} |
| |
| Cache for sharing glyphs, icons, cursors across clients. Lets clients |
| share identical objects. The cache is a global object, advertised at |
| connect time. |
| |
| \begin{tabular}{l} |
| \hline |
| Interface \texttt{cache} \\ \hline |
| Requests \\ \hline |
| \texttt{upload(key, visual, bo, stride, width, height)} \\ \hline |
| Events \\ \hline |
| \texttt{item(key, bo, x, y, stride)} \\ |
| \texttt{retire(bo)} \\ \hline |
| \end{tabular} |
| |
| \begin{itemize} |
| |
| \item Upload by passing a visual, bo, stride, width, height to the |
| cache. |
| |
| \item Upload returns a bo name, stride, and x, y location of object in |
| the buffer. Clients take a reference on the atlas bo. |
| |
| \item Shared objects are refcounted, freed by client (when purging |
| glyphs from the local cache) or when a client exits. |
| |
| \item Server can't delete individual items from an atlas, but it can |
| throw out an entire atlas bo if it becomes too sparse. The server |
| sends out an \texttt{retire} event when this happens, and clients |
| must throw away any objects from that bo and reupload. Between the |
| server dropping the atlas and the client receiving the retire event, |
| clients can still legally use the old atlas since they have a ref on |
| the bo. |
| |
| \item cairo needs to hook into the glyph cache, and maybe also a way |
| to create a read-only surface based on an object form the cache |
| (icons). |
| |
| \texttt{cairo\_wayland\_create\_cached\_surface(surface-data)}. |
| |
| \end{itemize} |
| |
| |
| \subsection{Drag and Drop} |
| |
| Multi-device aware. Orthogonal to rest of wayland, as it is its own |
| toplevel object. Since the compositor determines the drag target, it |
| works with transformed surfaces (dragging to a scaled down window in |
| expose mode, for example). |
| |
| Issues: |
| |
| \begin{itemize} |
| \item we can set the cursor image to the current cursor + dragged |
| object, which will last as long as the drag, but maybe an request to |
| attach an image to the cursor will be more convenient? |
| |
| \item Should drag.send() destroy the object? There's nothing to do |
| after the data has been transferred. |
| |
| \item How do we marshal several mime-types? We could make the drag |
| setup a multi-step operation: dnd.create, drag.offer(mime-type1), |
| drag.offer(mime-type2), drag.activate(). The drag object could send |
| multiple offer events on each motion event. Or we could just |
| implement an array type, but that's a pain to work with. |
| |
| \item Middle-click drag to pop up menu? Ctrl/Shift/Alt drag? |
| |
| \item Send a file descriptor over the protocol to let initiator and |
| source exchange data out of band? |
| |
| \item Action? Specify action when creating the drag object? Ask |
| action? |
| \end{itemize} |
| |
| New objects, requests and events: |
| |
| \begin{itemize} |
| \item New toplevel dnd global. One method, creates a drag object: |
| \texttt{dnd.start(new object id, surface, input device, mime |
| types)}. Starts drag for the device, if it's grabbed by the |
| surface. drag ends when button is released. Caller is responsible |
| for destroying the drag object. |
| |
| \item Drag object methods: |
| |
| \texttt{drag.destroy(id)}, destroy drag object. |
| |
| \texttt{drag.send(id, data)}, send drag data. |
| |
| \texttt{drag.accept(id, mime type)}, accept drag offer, called by |
| target surface. |
| |
| \item Drag object events: |
| |
| \texttt{drag.offer(id, mime-types)}, sent to potential destination |
| surfaces to offer drag data. If the device leaves the window or the |
| originator cancels the drag, this event is sent with mime-types = |
| NULL. |
| |
| \texttt{drag.target(id, mime-type)}, sent to drag originator when a |
| target surface has accepted the offer. if a previous target goes |
| away, this event is sent with mime-type = NULL. |
| |
| \texttt{drag.data(id, data)}, sent to target, contains dragged data. |
| ends transaction on the target side. |
| \end{itemize} |
| |
| Sequence of events: |
| |
| \begin{itemize} |
| \item The initiator surface receives a click (which grabs the input |
| device to that surface) and then enough motion to decide that a drag |
| is starting. Wayland has no subwindows, so it's entirely up to the |
| application to decide whether or not a draggable object within the |
| surface was clicked. |
| |
| \item The initiator creates a drag object by calling the |
| \texttt{create\_drag} method on the dnd global object. As for any |
| client created object, the client allocates the id. The |
| \texttt{create\_drag} method also takes the originating surface, the |
| device that's dragging and the mime-types supported. If the surface |
| has indeed grabbed the device passed in, the server will create an |
| active drag object for the device. If the grab was released in the |
| meantime, the drag object will be in-active, that is, the same state |
| as when the grab is released. In that case, the client will receive |
| a button up event, which will let it know that the drag finished. |
| To the client it will look like the drag was immediately cancelled |
| by the grab ending. |
| |
| The special mime-type application/x-root-target indicates that the |
| initiator is looking for drag events to the root window as well. |
| |
| \item To indicate the object being dragged, the initiator can replace |
| the pointer image with an larger image representing the data being |
| dragged with the cursor image overlaid. The pointer image will |
| remain in place as long as the grab is in effect, since the |
| initiating surface keeps pointer focus, and no other surface |
| receives enter events. |
| |
| \item As long as the grab is active (or until the initiator cancels |
| the drag by destroying the drag object), the drag object will send |
| \texttt{offer} events to surfaces it moves across. As for motion |
| events, these events contain the surface local coordinates of the |
| device as well as the list of mime-types offered. When a device |
| leaves a surface, it will send an \texttt{offer} event with an empty |
| list of mime-types to indicate that the device left the surface. |
| |
| \item If a surface receives an offer event and decides that it's in an |
| area that can accept a drag event, it should call the |
| \texttt{accept} method on the drag object in the event. The surface |
| passes a mime-type in the request, picked from the list in the offer |
| event, to indicate which of the types it wants. At this point, the |
| surface can update the appearance of the drop target to give |
| feedback to the user that the drag has a valid target. If the |
| \texttt{offer} event moves to a different drop target (the surface |
| decides the offer coordinates is outside the drop target) or leaves |
| the surface (the offer event has an empty list of mime-types) it |
| should revert the appearance of the drop target to the inactive |
| state. A surface can also decide to retract its drop target (if the |
| drop target disappears or moves, for example), by calling the accept |
| method with a NULL mime-type. |
| |
| \item When a target surface sends an \texttt{accept} request, the drag |
| object will send a \texttt{target} event to the initiator surface. |
| This tells the initiator that the drag currently has a potential |
| target and which of the offered mime-types the target wants. The |
| initiator can change the pointer image or drag source appearance to |
| reflect this new state. If the target surface retracts its drop |
| target of if the surface disappears, a \texttt{target} event with a |
| NULL mime-type will be sent. |
| |
| If the initiator listed application/x-root-target as a valid |
| mime-type, dragging into the root window will make the drag object |
| send a \texttt{target} event with the application/x-root-target |
| mime-type. |
| |
| \item When the grab is released (indicated by the button release |
| event), if the drag has an active target, the initiator calls the |
| \texttt{send} method on the drag object to send the data to be |
| transferred by the drag operation, in the format requested by the |
| target. The initiator can then destroy the drag object by calling |
| the \texttt{destroy} method. |
| |
| \item The drop target receives a \texttt{data} event from the drag |
| object with the requested data. |
| \end{itemize} |
| |
| MIME is defined in RFC's 2045-2049. A registry of MIME types is |
| maintained by the Internet Assigned Numbers Authority (IANA). |
| |
| ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/ |
| |
| |
| \section{Types of compositors} |
| |
| \subsection{System Compositor} |
| |
| \begin{itemize} |
| \item ties in with graphical boot |
| \item hosts different types of session compositors |
| \item lets us switch between multiple sessions (fast user switching, |
| secure/personal desktop switching) |
| \item multiseat |
| \item linux implementation using libudev, egl, kms, evdev, cairo |
| \item for fullscreen clients, the system compositor can reprogram the |
| video scanout address to source from the client provided buffer. |
| \end{itemize} |
| |
| \subsection{Session Compositor} |
| |
| \begin{itemize} |
| \item nested under the system compositor. nesting is feasible because |
| protocol is async, roundtrip would break nesting |
| \item gnome-shell |
| \item moblin |
| \item compiz? |
| \item kde compositor? |
| \item text mode using vte |
| \item rdp session |
| \item fullscreen X session under wayland |
| \item can run without system compositor, on the hw where it makes |
| sense |
| \item root window less X server, bridging X windows into a wayland |
| session compositor |
| \end{itemize} |
| |
| \subsection{Embbedding Compositor} |
| |
| X11 lets clients embed windows from other clients, or lets client copy |
| pixmap contents rendered by another client into their window. This is |
| often used for applets in a panel, browser plugins and similar. |
| Wayland doesn't directly allow this, but clients can communicate GEM |
| buffer names out-of-band, for example, using d-bus or as command line |
| arguments when the panel launches the applet. Another option is to |
| use a nested wayland instance. For this, the wayland server will have |
| to be a library that the host application links to. The host |
| application will then pass the wayland server socket name to the |
| embedded application, and will need to implement the wayland |
| compositor interface. The host application composites the client |
| surfaces as part of it's window, that is, in the web page or in the |
| panel. The benefit of nesting the wayland server is that it provides |
| the requests the embedded client needs to inform the host about buffer |
| updates and a mechanism for forwarding input events from the host |
| application. |
| |
| \begin{itemize} |
| \item firefox embedding flash by being a special purpose compositor to |
| the plugin |
| \end{itemize} |
| |
| \section{Implementation} |
| |
| what's currently implemented |
| |
| \subsection{Wayland Server Library} |
| |
| \texttt{libwayland-server.so} |
| |
| \begin{itemize} |
| \item implements protocol side of a compositor |
| \item minimal, doesn't include any rendering or input device handling |
| \item helpers for running on egl and evdev, and for nested wayland |
| \end{itemize} |
| |
| \subsection{Wayland Client Library} |
| |
| \texttt{libwayland.so} |
| |
| \begin{itemize} |
| \item minimal, designed to support integration with real toolkits such as |
| Qt, GTK+ or Clutter. |
| |
| \item doesn't cache state, but lets the toolkits cache server state in |
| native objects (GObject or QObject or whatever). |
| \end{itemize} |
| |
| \subsection{Wayland System Compositor} |
| |
| \begin{itemize} |
| \item implementation of the system compositor |
| |
| \item uses libudev, eagle (egl), evdev and drm |
| |
| \item integrates with ConsoleKit, can create new sessions |
| |
| \item allows multi seat setups |
| |
| \item configurable through udev rules and maybe /etc/wayland.d type thing |
| \end{itemize} |
| |
| \subsection{X Server Session} |
| |
| \begin{itemize} |
| \item xserver module and driver support |
| |
| \item uses wayland client library |
| |
| \item same X.org server as we normally run, the front buffer is a wayland |
| surface but all accel code, 3d and extensions are there |
| |
| \item when full screen the session compositor will scan out from the X |
| server wayland surface, at which point X is running pretty much as it |
| does natively. |
| \end{itemize} |
| |
| \end{document} |