I think I should blog more about random troubleshooting sessions. If nothing else, it will remind me what steps I took when the problem inevitably happens again!
Okay, here is the first one – why is my xterm opening so slowly?
Background
I have two similarly specced machines at my desk – my primary workstation running Fedora Linux, and a Windows 11 machine. They share the same monitor and input devices, and I switch between them with an iogear KVM.
I do the bulk of my work in either a browser or a terminal. This is true even on Windows, where I rely heavily on WSL.
This works well for me, and I’m happy enough with the setup.
Issue
I have the shortcut Super+1 bound to xterm on both machines, and I probably use this hundreds of times per day.
Here is how that looks on Fedora:
[Figure: Fedora Terminal]
It takes about 300ms from key activation to a terminal being ready. This is fine; I’ve never noticed any problem.
However, let’s compare that to Windows:
[Figure: Windows Terminal]
That’s about 1600ms before I can type, over 5 times slower! This is slow enough that it bothers me, and I use this shortcut so often that I want to solve it.
I don’t think many people care about xterm performance on Windows, so I guess that means it’s up to me to solve this 🙂
Window Effects
Hey, wait a minute… ENHANCE! 👀
[Figure: Windows Fading Effect]
Why does the window fade in like that? It looks like the window is ready when the effect starts, but I can’t interact with it until it completes. If I count those frames, this animation must be costing me ~200ms! 🤬
I always disable anything like animation or compositing effects, so I’m confused where this is coming from.
[Figure: Windows Performance Settings]
I tested with some native Windows programs like notepad and calc – they just appear instantly… so what is causing that?
I experimented with it a bit. I can see other windows behind it as it fades in, so I think something must be changing the opacity. Searching MSDN, it looks like the function for that is SetLayeredWindowAttributes().
Could something be calling that? Is my X server betraying me?
This looks like the culprit!! I’m using a server called X410. It seems to be adding its own animation effects, and doesn’t have any way to disable them. I’m reluctant to switch to an alternative – that could just replace this issue with a different issue to troubleshoot.
Is it possible I can just stop it from doing that with a debugger?
I’ve added that cdb command into my xinit initialization, and it looks a lot snappier. That saved nearly 300ms, so we’re down to just 4 times slower than Fedora!
Profiling
Okay, let’s try to get some real numbers. I like the tool hyperfine for this; here are the initial results:
[Figure: Hyperfine Windows]
If we run it under optimal conditions, it takes about 900ms on Windows, and about 100ms on Fedora.
Now that I can reproduce the delay reliably, I can start exploring some theories…
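For readers who have not used hyperfine, here is a rough sketch of the kind of measurement it reports: run a command repeatedly, time each run, and summarize. This is my own minimal stand-in and not hyperfine's implementation. The function name `benchmark` and the run counts are illustrative assumptions, and the demo times a trivial Python process rather than xterm so it runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def benchmark(argv, runs=10, warmup=2):
    """Run argv repeatedly and return wall-clock statistics in milliseconds."""
    # Warm-up runs let caches settle before anything is recorded.
    for _ in range(warmup):
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
        "runs": len(samples),
    }


if __name__ == "__main__":
    # Stand-in command (assumption). For the measurement in this post the
    # command would be something like ["xterm", "-e", "true"].
    cmd = [sys.executable, "-c", "pass"]
    r = benchmark(cmd)
    print(f"mean {r['mean']:.1f} ms ± {r['stdev']:.1f} ms, "
          f"range {r['min']:.1f} to {r['max']:.1f} ms, {r['runs']} runs")
```

Using `-e true` makes xterm exit as soon as it has started, so each sample covers startup and teardown only.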
Filesystem
My first thought is that filesystem performance under WSL can be very slow. Could that explain the difference?
Nope, it’s actually a bit faster on Windows! If I browse the logs, it looks related to fonts, and I do have fewer fonts installed on Windows. I suspect that causes fontconfig to query fewer files on initialization.
Whatever the reason, I concluded it wasn’t a big proportion of startup time, so it doesn’t seem worth worrying about.
X Server
The issue must be the X server. How fast is a very simple X client?
taviso@fedora:~$ hyperfine xdpyinfo
Benchmark 1: xdpyinfo
Time (mean ± σ): 4.6 ms ± 0.8 ms [User: 1.9 ms, System: 1.6 ms]
Range (min … max): 3.1 ms … 9.4 ms 317 runs
That does run slower on Windows, but not significantly slower – perhaps I actually need to create a window to see a difference?
That is also slower on Windows – this is understandable, it has to translate from X11 to win32 – but not so slow that it adequately explains the problem alone.
This XtRealizeWidget call does seem slow, and I don’t see that on Fedora. What is calling that?
$ gdb --args ./xterm -e true
Reading symbols from ./xterm...
(gdb) b XtRealizeWidget
Breakpoint 1 at 0x2db20
(gdb) r
Breakpoint 1, 0x00007fffff43d940 in XtRealizeWidget () from /lib/x86_64-linux-gnu/libXt.so.6
(gdb) bt
#0 0x00007fffff43d940 in XtRealizeWidget () from /lib/x86_64-linux-gnu/libXt.so.6
...
#4 0x00007fffff44c4f6 in XtSetValues () from /lib/x86_64-linux-gnu/libXt.so.6
#5 0x0000000008086683 in UpdateMenuItem (menu=0x811c5c0 <mainMenuEntries>, which=0, val=1)
at ./menu.c:1026
#6 0x000000000808a763 in update_toolbar () at ./menu.c:3366
Ah-ha – it’s the toolbar feature. It’s disabled at compile time on Fedora, but I quite like it and enable it on Windows.
If I disable that, startup is a little faster. I wonder if there are any other features that are slowing down initialization…?
Parameter Scanning
The hyperfine utility has a feature called parameter scan, where it will try a bunch of settings for you and tell you which one is fastest.
Let’s ask XTerm what features are available, and toggle each one on and off.
Note: xterm has a lot of features, so I’m truncating the list for brevity!
Now we can give each of those to hyperfine, and let it figure out which settings have the most noticeable effect.
That took about 20 minutes to run, and reports:
$ hyperfine --parameter-list res allowBoldFonts,allow... \
--parameter-list bool true,false \
"xterm -xrm 'XTerm*{res}: {bool}' -e true"
...
Benchmark 240: xterm -xrm 'XTerm*xftTrackMemUsage: false' -e true
Time (mean ± σ): 140.1 ms ± 7.4 ms [User: 30.8 ms, System: 23.4 ms]
Range (min … max): 129.2 ms … 153.4 ms 21 runs
Summary
xterm -xrm 'XTerm*tekInhibit: true' -e true ran
1.01 ± 0.08 times faster than xterm -xrm 'XTerm*allowSendEvents: false' -e true
...
This helped a little. I found a combination of options that saved around 200ms total. One example was tekInhibit, which disables the Tektronix emulation. That’s usually used as a graphing mode – it’s actually pretty cool.
Still, it isn’t a big enough difference, and this is as far as I was able to get through tweaking settings.
I’m starting to think that this is just death by a thousand cuts: everything has some small overhead on Windows, and it adds up…
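To make the parameter scan above more concrete: two `--parameter-list` options produce one benchmark per combination of values. The sketch below builds that list of command lines from the same template. It is only an illustration of the expansion and not how hyperfine is implemented. The helper name `expand_commands` is hypothetical, the resource list is a tiny subset taken from the output above, and the ordering is just whatever `itertools.product` yields.

```python
import itertools

# The command template from the hyperfine invocation above.
TEMPLATE = "xterm -xrm 'XTerm*{res}: {bool}' -e true"


def expand_commands(template, **params):
    """Yield one command line per combination of the given parameter values."""
    names = list(params)
    for combo in itertools.product(*(params[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


if __name__ == "__main__":
    # Illustrative subset; the real scan covered many more resources.
    resources = ["allowBoldFonts", "allowSendEvents", "tekInhibit"]
    for command in expand_commands(TEMPLATE, res=resources, bool=["true", "false"]):
        print(command)
```

Each setting is toggled in isolation here, so the scan finds individually slow features but says nothing about how settings interact.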
Caching
There’s a simple generic solution to slow startup performance: server mode.
The idea is to cache a few processes in the background, then all the slow stuff will already be done, ready for you to start working immediately.
XTerm doesn’t have this feature natively, but it’s not complicated, so I can add it.
To do this, I will use deferred mapping – that just means that a program is running, but the window is not visible yet.
I tried a few solutions and found one that works well, an LD_PRELOAD library. All it does is intercept any toplevel XMapWindow() calls, then pause execution until it receives a signal.
It’s a bit hacky, but my code is here, if you’re interested.
To use it, you need something to manage the cache for you in the background. xargs will work!
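If the server-mode idea is new to you, here is a toy model of it in Python. To be clear, this is not my LD_PRELOAD library or the xargs setup: it is a simplified sketch that assumes a Unix system, uses SIGUSR1 as the wake-up signal, and uses a small Python child as a stand-in for an xterm whose window has not been mapped yet. The names `WarmPool` and `WORKER` are made up for this example.

```python
import signal
import subprocess
import sys
from collections import deque

# Stand-in for a pre-started xterm: finish start-up, then park until signalled.
WORKER = """
import signal
# Block SIGUSR1 so that it stays pending until sigwait() collects it.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
print("ready", flush=True)          # the slow start-up work would be done by now
signal.sigwait({signal.SIGUSR1})    # parked here, like a window that is not mapped yet
print("mapped", flush=True)
"""


class WarmPool:
    """Keep a fixed number of already-started processes waiting in the background."""

    def __init__(self, size=2):
        self.size = size
        self.idle = deque()
        self.refill()

    def _spawn(self):
        proc = subprocess.Popen([sys.executable, "-c", WORKER],
                                stdout=subprocess.PIPE, text=True)
        # Only count the process as warm once it reports that start-up is done.
        if proc.stdout.readline().strip() != "ready":
            raise RuntimeError("worker failed to start")
        return proc

    def refill(self):
        while len(self.idle) < self.size:
            self.idle.append(self._spawn())

    def activate(self):
        """Wake one warm process, then start a replacement for next time."""
        proc = self.idle.popleft()
        proc.send_signal(signal.SIGUSR1)
        self.refill()
        return proc

    def close(self):
        for proc in self.idle:
            proc.kill()
            proc.wait()
        self.idle.clear()


if __name__ == "__main__":
    pool = WarmPool(size=2)
    woken = pool.activate()
    print(woken.stdout.readline().strip())
    woken.wait()
    pool.close()
```

The replacement is started only after the wake-up signal has been sent, so the cost of starting a fresh process is paid in the background and not by the terminal you are waiting for.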