… here I’am :)
personal news
•May 17, 2009 • 2 CommentsBeing quick about my last months: finished my Master course, moving to Helsinki tomorrow (!), got my first real job. Yessh, three great steps in my crazy life and I’m very happy with it! Hope to stay closer with the desktop communities again ;)
Cursor’s update inside kernel only
•January 14, 2009 • 3 CommentsOne thing that I learned with open communities is when you send a proposal idea and no one replies. This seems to be odd at first but it _isn’t_. If you did something strange or wrong then at least someone will reply. OTOH if you did something that can be promiser _and_ no one objects, then great! 99% that you did something interesting :)
Given that I’ll not work towards this cursor’s shortcut idea soon, I’m posting a mail that I recently sent to dri list and didn’t see much excitement there. So the bits are preserved here for future references.
—
Under KMS, we can build a feature to update the cursor directly to screen without the continuous intervention of the userspace application (X server, wayland, etc). It’s a fastpath for DRM based cursors obtained by short-circuit the kernel input layer and DRM module. This would solve all cursor’s latency issues that we have in our current model.
This series of patches implement such feature using Xorg as the application. Through an ioctl, Xorg tells which devices are responsible for cursors’ updates and the kernel evdev driver will spit the events to DRM. DRM will take care about the event informations and also screen limits, and then will draw the cursor on screen. Very intuitive.
Right now a thing that is annoying me is how others cursors, sw rendered, could be implemented. I want to avoid two different sets of the same code in different contexts. IMHO conceptually all these cursor update things must be in-kernel. Painting a cursor image seems to be quite hard as we have to save/restore areas under the cursor. I remember that anholt had an idea concerning this, but I do not remember details.
Well, the patches are far from ready to go upstream, but it deploys a system working on this way. So, for now, this mail has two goals:
- to people comment on and see in what kind of world we can move.
- get a feedback how we can proceed in the case of sw cursors.
Please, comment on. The patches can be found here.
multiseat – documentation and references
•January 9, 2009 • Leave a CommentTrying to put some order with all the multiseat documentation found on Web, today I updated the following:
Multiseat’s article, in X.Org Foundation wiki:
http://wiki.x.org/wiki/Development/Documentation/Multiseat
Multiseat’s history, in Wikipedia:
http://en.wikipedia.org/wiki/Multiseat#History
So if you are trying to setup or help in development, probably the foundation’s wiki has the most up to date and centralized references. Also, please, if you have more helpful information, don’t be afraid to update the wikis. Our future grandsons will be thankful :)
multiseat – roadmap
•September 23, 2008 • 5 CommentsThis week our laboratory at university released the MDM utility to ease the process of installation and configuration of a multiseat box. The idea is that the end-user should not use some boring and hard howtos anymore to deploy it. Just installing a distro package must be enough now. Try it, use it, report the bugs and send the patches! :)
But I would like to call attention here because we’re still relying on the ugly Xephyr solution to build the multiseat on a simple PC machine (there are people selling this solution. Sigh). A lot of cool stuffs arriving in the linux graphics stack are lacking with this solution. So lets try trace the roadmap here that we can follow in the short/medium-term to build a good one solution:
- Vga Arbiter
We should as fast as we can forget the Xephyr hack. Definitely several instances of Xorg – one for each user session – is what we want. If someone wants to use several graphics cards to deploy a multiseat, then (s)he would probably face the problem of the vga legacy address. The vga arbitration is the solution.
Jesse seems that will work towards this in 2.6.28. There’s also a “minor” problem here that the X server still not posting secondary cards (after pci-rework).
- xrandr 1.3
To deploy a multiseat with one video card/multi-crtc, the randr extension is enough to cover the hotplug of output devices. For a multi-card configuration, xrandr must be GPU aware. So we must done some work here as well to do the automagically configuration of output devices.
- input hotplug
So far MDM is not using the last input features of X to plug or re-plug a device in the machine. It is using its own ugly method to poll for devices. Some work here is needed.
- ConsoleKit integration
Device ownership (e.g. audio, pen drive, cameras, usb toys, output and input devices) when multiple users are in the same machine could be a mess. Moreover, the security problems implied by this could be harmful. ConsoleKit seems to solve this all. It is able to keep track of all the users currently logged in.
Honestly I never took a closer look at ConsoleKit. I’m not sure if it’s prepared enough to support multiseat. So we need to take care of this as well eventually putting some hook inside MDM to work with it.
- DRI + modesetting
Support DRI in multiple X servers in parallel is not ready yet. Some redesign must be done to achieve this.
- tools for auto configuration
After all this work, some easy tools and very user-friendly would be awesome to setup on-the-fly the multiseat in the desktop environments.
Parallel events (panic) with X
•August 18, 2008 • Leave a CommentUnfortunately that model which I described some weeks ago to put the input event delivery of the X server in a separate thread wouldn’t be an advantage. I precipitated myself thinking that it could be feasible. Sorry :(
I started to implement all this but it showed a very boring task to grab all the globals variables which both threads touch and to lock it. So I decided to stop going in this way. It’s hard to program thinking in parallel. It’s even harder to debug a program with severals flows. More, the tools don’t help you (if you have lucky, gdb will work).
But the main reason I can argue to stop with this model is that the “main” event flow of execution (i.e. basically all the functions in {Swapped,}ProcVector) and the input delivery flow (ProcessInputEvents()) are very very tied. Both deal a lot with clients and we’d need to lock several globals, thus spending a lot of time in the management of the threads. It’s easy to see this acting: just put a breakpoint in TryClientEvents(). Every single request to deliver a given event to a given client involves this function. And both input and main event flow will call TryClientEvents(). So you will see a zillion of times this function being called. The contention of the eventual processing and main threads would be even greater if the client choose to receive MotionNotify event.
So yeah, it’s far from be clear how to put processing of input events inside another thread.
== Next ==
In the next days I’ll be traveling to CESol, Fortaleza here in Brazil. I was invited to talk about my work in X land. Latin America has a lot of promising countries concerning FOSS development however for some reason no one actively participate and contribute for the X development (why?). I’ll try to motivate people there somehow :)
In the next week I’ll put the generation thread in a shape good enough to eventually push this to upstream. Also I’ll try to write a good sumary of all my work given that GSoC is in the end.
Priorities and scheduling hints for X server threads
•August 7, 2008 • 3 CommentsInput events routed through another thread/process can have bad effects on latency because we can’t guarantee that it will get scheduled at the right moment. Although this is hard to see happening with the current X server threaded implementation, we must design something to avoid it. One way to improve the responsiveness is to give a high priority to the input thread and also adjust the CPU scheduling. (Note that this will not avoid problems related with page faults which usually happen in the X input flow.)
Linux uses 1:1 thread model and the scheduler handles every thread as a process. For now I don’t care about others systems. Both input generation and processing threads was designed to sleep after a relatively short CPU run. So we can give a priority to processes that are trusted to not hog the CPU. And given they are special time-critical applications I have no doubt in what policy to use: I set both input threads to use the real-time FIFO policy and to get the maximum priority (sched_get_priority_max()).
—
I’m sure that someone will complain telling that this would decrease a bit the main thread when used together with both input threads. In GUI we’re talking about better user experience. Latency variability must be avoided whenever possible in interactive situations. What the user see is what matters. For non-interactive processes (server scheduling workloads) the situation is totally different.
Xorg’s philosophy is to be portable so we have to take care when setting this kind of parameters. It is a complex issue and different systems do it in wildly different ways. I was using my Linux box (2.6.24) to design it all.
keep it going…
•August 7, 2008 • 1 CommentGiven that GSoC ‘08 is getting close to the end, strategy number 2 showed more feasible to proceed my work. Strategy #3 would be a lot of fun but would imply a hell massive codification as well (also a little out of our scope). Unfortunately no-no for now.
Improving input latency
•July 29, 2008 • 10 CommentsGSoC summary #1 – July 29
The current implementation of X Window System relies in a signal scheme to manage the input event coming from hardware devices. This scheme frequently get blocked when lot of IO is occurring (for instance, when the process is swapping in/out). Get blocked means for instance a jumping cursor on the screen and in GUI is always desirable to prioritize the system responsiveness for end users. The human/computer interface should be smooth and this is the most user visible aspect of a system.
Besides the need for improvement in system responsiveness, the current design of the event stream has some oddities, probably due historical reasons, such as the cursor update done in user-space or the huge path that takes to draw the cursor instead just connect the mouse hardware directly with the cursor position update in-kernel. Moreover there is no fundamental reason to input drivers be dependent of DDX part of the X server. Therefore a design of the input subsystem must be carefully redone to improve such issues.
Our project try to solve all this problems. In summary the goal is: to get a path from hardware input event to client delivery that cannot be blocked by rendering or IO operations, meaning we always have very low latency on input events. Moreover, a redesign of such event stream could improve the overall X graphics stack, which must be considered as well.
So far three strategies were explored to achieve the goal:
1. put X input generation stage in a separate thread
2. put X input generation and processing stages others threads
3. shortcut the kernel input layer with drm to decrease the cursor update latency
Basically 1. and 2. tries to solve the issue of blocking signals and 3. would be a completely redesign in input infrastructure. Anyway, the 3. strategy would impact in 1. and 2. but these could be implemented in parallel with the third strategy. The following sections details each strategy.
== strategy #1 ==
Strategy 1 does not uses a signal handler anymore to wake up the event generation code. It simply poll for device’s socket and giving that this code is under a separate thread this is a win for the CPUs.
With the separate thread taking care only the input code, it was expected that the cursor footprint always lived on resident memory when the mouse stills in movement. Unfortunately this was not true. For some reason it swaps back to disk. Maybe some scheduler adjusts would help here. A memory lock scheme was tried to do lock the cursor footprint always in physical memory without success.
This strategy is basically what we’ve been done is the first GSoC. This is pretty much implemented. It would not require much trouble to push it to X server from upstream. The code is here:
http://cgit.freedesktop.org/~vignatti/xserver/
== strategy #2 ==
This strategy can be thought as an improvement of #1. It can be separated in two models of implementation:
Model one:
thread #1 deals with
- injection and processing of input events
thread #2 deals with
- requests from known clients
- new client that tries to connect
It would be very very nice to let both threads totally independents. But we cannot. The event delivery depends on window structure and the first thread must always wake up the second. Also, sometimes the processing of events take a while and the injection of events stays stucked in this model. So we came with this another:
Model two:
thread #1 deals with
- injection of input events from devices
thread #2 deals with
- processing of input events to clients
thread #3 deals with
- requests from known clients
- new client that tries to connect
With this model the first and the second thread become not so tied and given that we’re using non blocking fds to wake up each thread (through a pipe), CPU “enjoys” the effect of threads. For instance, under heavy drawing primitives only thread #3 would wake up.
We had a proof-of-concept of this last model and it workish (occasionally seeing some segfaults probably due of some critical regions we forgot to lock – now the only mutex that exists is inside the server queue of events).
It’s hard to imagine other threaded models mainly because the way X deals with clients are very tied in every piece of the server and it would require a lot of mutexes.
== strategy #3 ==
For sure this strategy is the most shocking one :) The idea is to connect the mouse hardware directly to the cursor position update function, all inside kernel. We’d then rewrite the event stream from the pointer device to an absolute position. Transform the relative mouse motion into an absolute screen position seems to be not that complicated, but this strategy would involve acceleration and cursor limits inside kernel as well (the current implementation of accel deals with floats, so we would have to adapt it to live in kernel).
It is a _very_ _large_ amount of codification. It would require changes to the X server, DDX driver and its corresponding kernel DRM drivers, drm library and kernel input drivers. A mini-input driver *inside* drm is also needed. We would add complexities of the connection between input device and output device to the kernel (in my proof-of-concept implementation evdev is dependent of drm. Yeah, really weird world). Moreover, we would have to avoid somehow two differents sets of the exact same code in different contexts in the case of sw cursors (think MPX). It’s a completely redesign. Things would have to go incrementally.
But why this strategy? Well, this would solve all the current issues with input latency. For instance with the current design of the kernel modesetting – which seems the future – the cursor is jumping a lot, much more than with current implementation. Try to call a xrandr instance and move the mouse with kernel modesetting. xrandr will do DDC communication which will blocked X in the kernel. So with the handling and update of the cursor inside the kernel all would work fine (and my proof-of-concept already showed this).
Moreover, I believe the current implementation remained until now due historical reasons. Ultrix systems placed the entire input subsystem in the kernel. What is the problem to do this in Linux (and others) as well (besides massive codification)?
and non-dri drivers? Should we forget them?
EOF
cursor handling and updates inside DRM
•July 10, 2008 • Leave a CommentThe current DRM kernel modesetting tree is already taking care to update the cursor registers and paint it to the screen. Very cool [0].
What I’ve done today is a shortcut between the kernel input layer and DRM to update the cursor directly on screen without the X server be notified always. Of course, a lot of issues raised up together. So let’s try to delegates the tasks again.
userspace app (X server):
- starts all this mechanism telling which is the device responsible for the cursor (input ddx drv)
- responsible for loading new cursor images and push to the DRM (video ddx drv)
kernel input layer (evdev driver):
- notify and send its relative coordinates events to DRM
DRM:
- transform relative motion into absolute
- takes care the cursor limits
- responsible for the acceleration computation
- responsible for the input transformation as well?
- touch the gfx registers.
Seems that a reasonable amount of code in ddx input drv (mainly ReadInput) and dix (mainly GetPointerEvents) would be “swallowed” by the DRM. The “event generation stage” of the server would deal with the event itself + xkb + Xi things (which eventually could be done in a dedicated thread) and will let to DRM the responsibility of paint the cursor on screen.
The communication between kernel input drv can be directly, calling a DRM function; the DRM and userspace can communicate basically using ioctls. Complains?
This would only works with DRM supporting OSes. What about the others?
[0] Not so much. Seems this method to update the cursor is sending _a lot of_ ioctls and sometimes doing cursor jumps. But I have to double check to see if the problem is for sure with context switches.
