Troubleshooting: Terminal Lag

Tavis Ormandy

$Id: a07cf90837a3c4373b82d6724b97593810766af7 $

I think I should blog more about random troubleshooting sessions. If nothing else, it will remind me what steps I took when the problem inevitably happens again!

Okay, here is the first one – why is my xterm opening so slowly?

Background

I have two similarly specced machines at my desk – my primary workstation running Fedora Linux, and a Windows 11 machine. They share the same monitor and input devices, and I switch between them with an iogear KVM.

I do the bulk of my work in either a browser or a terminal. This is true even on Windows, where I rely heavily on WSL.

This works well for me, and I’m happy enough with the setup.

Issue

I have the shortcut Super+1 bound to xterm on both machines, and I probably use this hundreds of times per day.

Here is how that looks on Fedora:

[Image: Fedora Terminal]

It takes about 300ms from key activation to a terminal being ready. This is fine; I’ve never noticed any problem.

However, let’s compare that to Windows:

[Image: Windows Terminal]

That’s about 1600ms before I can type, over 5 times slower! This is slow enough that it bothers me, and I use this shortcut so often that I want to solve it.

I don’t think many people care about xterm performance on Windows, so I guess that means it’s up to me to solve this 🙂

Window Effects

Hey, wait a minute… ENHANCE! 👀

[Image: Windows Fading Effect]

Why does the window fade in like that? It looks like the window is ready when the effect starts, but I can’t interact with it until the effect completes. If I count those frames, this animation must be costing me ~200ms! 🤬
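Turning a frame count into a time is simple arithmetic, but it depends on the capture rate. The recording's frame rate isn't stated, so this small Python sketch assumes 60 frames per second. At that rate each frame lasts about 16.7ms, and twelve frames come to 200ms.

```python
def frames_to_ms(frames: int, fps: float = 60.0) -> float:
    """Convert a count of recorded frames into milliseconds.

    fps is an assumption: use whatever rate the screen recorder captured at.
    """
    return frames * 1000.0 / fps


if __name__ == "__main__":
    for n in (12, 18, 96):
        print(f"{n:3d} frames @ 60fps = {frames_to_ms(n):6.1f} ms")
```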

I always disable anything like animations or compositing effects, so I’m confused about where this is coming from.

[Image: Windows Performance Settings]

I tested some native Windows programs like notepad and calc, and they just appear instantly… so what is causing this?

I experimented a bit and noticed that I can see other windows behind it as it fades in, so something must be changing its opacity. A search of MSDN suggests that the function for that is SetLayeredWindowAttributes().
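It helps to picture how a fade-in built on SetLayeredWindowAttributes() would work. A program marks the window as layered, then calls SetLayeredWindowAttributes(hwnd, 0, alpha, LWA_ALPHA) once per frame with alpha rising from 0 (transparent) to 255 (opaque). This Python sketch only computes such a schedule. It is a hypothetical model, not X410's actual code. The 200ms duration and 60Hz frame time are assumptions.

```python
LWA_ALPHA = 0x00000002  # Win32 flag: treat bAlpha as the window's opacity


def fade_in_alphas(duration_ms: float, frame_ms: float = 1000.0 / 60.0) -> list[int]:
    """Per-frame bAlpha values for a linear fade from transparent to opaque.

    A real fade would pass each value to
    SetLayeredWindowAttributes(hwnd, 0, alpha, LWA_ALPHA), one call per frame.
    """
    steps = max(1, round(duration_ms / frame_ms))
    return [round(255 * i / steps) for i in range(1, steps + 1)]


if __name__ == "__main__":
    print(fade_in_alphas(200))
```

Every frame before the last one draws the window partly transparent. That would match being able to see other windows through it while it fades in.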

Could something be calling that? Is my X server betraying me?

$ dumpbin /imports X410.exe  | grep SetLayeredWindowAttributes
                         33A SetLayeredWindowAttributes

This looks like the culprit!! I’m using a server called X410, and it seems to add its own animation effects without any way to disable them. I’m reluctant to switch to an alternative, because that could just replace this issue with a different one to troubleshoot.
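Without dumpbin (it ships with Visual Studio), a rough check is possible from WSL. PE import tables store imported names as NUL-terminated ASCII strings, so this Python sketch simply searches the file for one. This is only a heuristic. It can give false positives, for example a longer name that ends in the same text or an unrelated string. It also misses imports by ordinal. dumpbin or a proper PE parser remains the reliable way to check.

```python
import sys
from pathlib import Path


def has_import_name(image: bytes, name: str) -> bool:
    """Heuristic: look for `name` as a NUL-terminated ASCII string.

    PE hint/name entries store imported names this way, so a hit is a strong
    hint, but not proof, that the binary imports the function.
    """
    return name.encode("ascii") + b"\x00" in image


if __name__ == "__main__":
    # usage (path is a placeholder): python3 has_import.py path/to/X410.exe SetLayeredWindowAttributes
    exe, symbol = sys.argv[1], sys.argv[2]
    print(has_import_name(Path(exe).read_bytes(), symbol))
```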

Is it possible I can just stop it from doing that with a debugger?

$ cdb -p 6624
(19e0.2ad0): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ff9`1f9b3c90 cc              int     3
0:014> eb win32u!NtUserSetLayeredWindowAttributes c3
0:014> .detach
Detached
NoTarget> q
quit:

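Ah-ha, that actually worked!!! The eb command wrote the byte c3, the x86 ret instruction, over the start of win32u!NtUserSetLayeredWindowAttributes, so every call now returns immediately without doing anything.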

I’ve added that cdb command into my xinit initialization, and it looks a lot snappier. That saved nearly 300ms, so we’re down to just 4 times slower than Fedora!

Profiling

Okay, let’s try to get some real numbers. I like the tool hyperfine for this; here are the initial results:

[Image: Hyperfine Windows]

If we run it under optimal conditions, it takes about 900ms on Windows, and about 100ms on Fedora.
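For anyone without hyperfine, the core measurement is easy to approximate. This Python sketch runs a command repeatedly and reports wall-clock statistics in milliseconds, in the same spirit as `hyperfine 'xterm -e true'`. One difference worth knowing: hyperfine by default launches commands through a shell and corrects for that overhead. This sketch runs the command directly, so it has no shell overhead to correct for. The function name and defaults are illustrative.

```python
import statistics
import subprocess
import sys
import time


def benchmark(cmd: list[str], runs: int = 10, warmup: int = 1) -> dict[str, float]:
    """Time `cmd` end to end and return mean/stdev/min/max in milliseconds."""
    for _ in range(warmup):  # warm caches before measuring
        subprocess.run(cmd, check=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # e.g. python3 bench.py xterm -e true
    stats = benchmark(sys.argv[1:])
    print("Time (mean ± σ): {mean:.1f} ms ± {stdev:.1f} ms, "
          "range {min:.1f} … {max:.1f} ms".format(**stats))
```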

Now that I can reproduce the delay reliably, I can start exploring some theories…

Filesystem

My first thought was that filesystem performance under WSL can be very slow. Could that explain the difference?

taviso@WORKSTATION:~$ strace -wc -efile xterm -e true
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 43.93    0.014870          56       261        56 openat
 23.79    0.008053          31       257        33 access
 20.48    0.006932          32       211        12 newfstatat
  7.07    0.002394          23       100        54 readlink
  4.72    0.001596        1596         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.033845          40       830       155 total

Nope, it’s actually a bit faster on Windows! If I browse the logs, the file accesses look font-related, and I do have fewer fonts installed on Windows. I suspect that causes fontconfig to query fewer files on initialization.

Whatever the reason, I concluded it wasn’t a big proportion of startup time, so it doesn’t seem worth worrying about.
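To put a number on "not a big proportion": with -w, strace -c sums the wall-clock time spent inside the traced syscalls. This Python sketch reads the total row and divides it by the roughly 900ms Windows startup measured with hyperfine. The two figures come from separate runs, and strace adds its own overhead, so treat the ratio as rough. It comes out under 4%.

```python
def strace_total_seconds(summary: str) -> float:
    """Return the 'seconds' column of the total row in `strace -c` output."""
    for line in summary.splitlines():
        fields = line.split()
        if fields and fields[-1] == "total":
            return float(fields[1])
    raise ValueError("no 'total' row in strace summary")


def share_of_startup(summary: str, startup_seconds: float) -> float:
    """Fraction of a measured startup time spent inside the traced syscalls."""
    return strace_total_seconds(summary) / startup_seconds


SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 43.93    0.014870          56       261        56 openat
 23.79    0.008053          31       257        33 access
 20.48    0.006932          32       211        12 newfstatat
  7.07    0.002394          23       100        54 readlink
  4.72    0.001596        1596         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.033845          40       830       155 total
"""

if __name__ == "__main__":
    print(f"{share_of_startup(SUMMARY, 0.9):.1%} of a 900ms startup")
```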

X Server

Perhaps the issue is the X server? How fast is a very simple X client?

taviso@fedora:~$ hyperfine xdpyinfo
Benchmark 1: xdpyinfo
  Time (mean ± σ):       4.6 ms ±   0.8 ms    [User: 1.9 ms, System: 1.6 ms]
  Range (min … max):     3.1 ms …   9.4 ms    317 runs

That does run slower on Windows, but not significantly slower – perhaps I actually need to create a window to see a difference?

taviso@fedora:~$ x11perf -repeat 1 -subs 8 -popup
    5600000 reps @   0.0010 msec (1040000.0/sec): Hide/expose window via popup (8 kids)

That is also slower on Windows. This is understandable, since it has to translate from X11 to Win32, but it’s not so slow that it adequately explains the problem on its own.

FreeType

Could it be a FreeType or Fontconfig issue?

taviso@WORKSTATION:~$ ftbench -p consola.ttf
executing tests:
  Load                           2.491 us/op     809010 done
  Load_Advances (Normal)         2.437 us/op     827190 done
  Load_Advances (Fast)           0.022 us/op   88575990 done
  Load_Advances (Unscaled)       0.013 us/op  147448890 done
  Render                         2.039 us/op     390870 done
  Get_Glyph                      0.921 us/op     509040 done
  Get_Char_Index                 0.018 us/op  111551968 done
  Iterate CMap                  21.539 us/op      88503 done
  New_Face                       6.271 us/op     284029 done
  Embolden                       2.491 us/op     357540 done
  Stroke                        24.663 us/op      69690 done
  Get_BBox                       0.865 us/op     487830 done
  Get_CBox                       0.679 us/op     506010 done

Loading fonts is slightly slower, but not by much, and the other numbers seem fine. I don’t think it’s this.

Features

Maybe I can get some clues from ltrace?

$ ltrace -c xterm -e true
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 26.74    1.977389      988694         2 XtRealizeWidget
 22.18    1.640132       14139       116 XtSetValues
 13.39    0.989966      197993         5 XtVaCreateManagedWidget
  6.42    0.474410         200      2361 strlen
  5.51    0.407617      203808         2 read
  4.21    0.311086         197      1572 FcCharSetHasChar
  2.92    0.215975        2571        84 XtCreateManagedWidget
  2.80    0.207397       69132         3 XtVaCreatePopupShell
  2.39    0.176458         218       808 XftTextExtents32
  1.51    0.111306      111306         1 XpmReadFileToPixmap
  1.48    0.109486      109486         1 XtOpenApplication
  1.44    0.106786         203       524 malloc

This XtRealizeWidget call does seem slow, and I don’t see it on Fedora. What is calling it?

$ gdb --args ./xterm -e true
Reading symbols from ./xterm...
(gdb) b XtRealizeWidget
Breakpoint 1 at 0x2db20
(gdb) r
Breakpoint 1, 0x00007fffff43d940 in XtRealizeWidget () from /lib/x86_64-linux-gnu/libXt.so.6
(gdb) bt
#0  0x00007fffff43d940 in XtRealizeWidget () from /lib/x86_64-linux-gnu/libXt.so.6
...
#4  0x00007fffff44c4f6 in XtSetValues () from /lib/x86_64-linux-gnu/libXt.so.6
#5  0x0000000008086683 in UpdateMenuItem (menu=0x811c5c0 <mainMenuEntries>, which=0, val=1)
    at ./menu.c:1026
#6  0x000000000808a763 in update_toolbar () at ./menu.c:3366

Ah-ha – it’s the toolbar feature. It’s disabled at compile time on Fedora, but I quite like it and enable it on Windows.

If I disable it, startup is a little faster. I wonder if there are any other features slowing down initialization…?

Parameter Scanning

The hyperfine utility has a feature called parameter scan, where it will try a bunch of settings for you and tell you which one is fastest.

Let’s ask XTerm what features are available, and toggle each one on and off.

$ xterm -report-xres -e true
activeIcon              : default
allowBoldFonts          : true
allowC1Printable        : false
allowColorOps           : true
allowFontOps            : false
allowMouseOps           : true
...

I’ll start by extracting all the settings that are booleans.

$ xterm -report-xres -e true | grep -Po '^\S+(?=\s+: (true|false))' | tr '\n' ','
allowBoldFonts,allowColorOps,allowMouseOps...

Note: xterm has a lot of features, I’m truncating the list for brevity!
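The same extraction can be done in Python. This sketch anchors the match to the whole value, so only lines ending in exactly true or false count. It joins names with commas, which avoids the trailing comma that `tr '\n' ','` leaves behind. Like the grep pipeline, it assumes xterm prints the report on standard output.

```python
import re
import subprocess

# A resource line looks like "allowBoldFonts          : true".
BOOL_RESOURCE = re.compile(r"^(\S+)[ \t]+:[ \t]+(?:true|false)[ \t]*$", re.MULTILINE)


def boolean_resources(report: str) -> list[str]:
    """Names of all resources whose reported value is true or false."""
    return BOOL_RESOURCE.findall(report)


def xterm_report() -> str:
    """Run `xterm -report-xres -e true` and capture what it prints."""
    result = subprocess.run(["xterm", "-report-xres", "-e", "true"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(",".join(boolean_resources(xterm_report())))
```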

Now we can give each of those to hyperfine and let it figure out which settings have the most noticeable effect.

That took about 20 minutes to run, and reports:

$ hyperfine --parameter-list res allowBoldFonts,allow... \
            --parameter-list bool true,false \
            "xterm -xrm 'XTerm*{res}: {bool}' -e true"
...
Benchmark 240: xterm -xrm 'XTerm*xftTrackMemUsage: false' -e true
  Time (mean ± σ):     140.1 ms ±   7.4 ms    [User: 30.8 ms, System: 23.4 ms]
  Range (min … max):   129.2 ms … 153.4 ms    21 runs
 
Summary
  xterm -xrm 'XTerm*tekInhibit: true' -e true ran
    1.01 ± 0.08 times faster than xterm -xrm 'XTerm*allowSendEvents: false' -e true
    ...
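With two --parameter-list options, hyperfine benchmarks every combination of the two lists. The number of benchmarks is therefore twice the number of boolean resources, which is why a scan like this takes a while. This Python sketch just enumerates the same command lines. It is a cheap way to count them before committing to a long run.

```python
from itertools import product


def scan_commands(resources: list[str], values=("true", "false")) -> list[str]:
    """Every resource/value combination, like two hyperfine --parameter-list options."""
    return [
        f"xterm -xrm 'XTerm*{res}: {val}' -e true"
        for res, val in product(resources, values)
    ]


if __name__ == "__main__":
    for cmd in scan_commands(["tekInhibit", "allowSendEvents"]):
        print(cmd)
```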

This helped a little: I found a combination of options that saved around 200ms in total. One example was tekInhibit, which disables the Tektronix emulation. Tektronix mode is usually used for graphing, and it’s actually pretty cool.

Still, it isn’t a big enough difference, and this is as far as I was able to get by tweaking settings.

I’m starting to think that this is just death by a thousand cuts: everything has some small overhead on Windows, and it adds up…

Caching

There’s a simple generic solution to slow startup performance: server mode.

The idea is to keep a few processes cached in the background, so all the slow stuff is already done and a terminal is ready for you to start working immediately.

XTerm doesn’t have this feature natively, but it’s not complicated, so I can add it.

To do this, I will use deferred mapping – that just means that a program is running, but the window is not visible yet.

I tried a few solutions and found one that works well: an LD_PRELOAD library. All it does is intercept any toplevel XMapWindow() calls, then pause execution until it receives a signal.

It’s a bit hacky, but my code is here, if you’re interested.
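The LD_PRELOAD library described above is written in C and isn’t reproduced here. As a rough illustration of its control flow, here is a hypothetical Python model. map_window stands in for XMapWindow(), and the caller passes a flag instead of asking X whether the window is toplevel. One detail matters in any version: SIGUSR1’s default action is to terminate the process. Blocking the signal early means a wake-up that arrives before the window is ready simply stays pending. This sketch waits only for the first toplevel map, an assumption made so that later toplevel windows aren’t stalled.

```python
import os
import signal


def arm_deferral() -> None:
    """Block SIGUSR1 as early as possible (a preload library would do this at
    load time) so an early wake-up stays pending instead of killing us."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})


def defer_first_toplevel_map(map_window):
    """Wrap a hypothetical map_window(window) so the first toplevel map waits
    for SIGUSR1; all the slow startup work has already run by then."""
    waited = False

    def wrapper(window, toplevel=True):
        nonlocal waited
        if toplevel and not waited:
            signal.sigwait({signal.SIGUSR1})  # park here until someone wakes us
            waited = True
        return map_window(window)

    return wrapper


if __name__ == "__main__":
    arm_deferral()
    show = defer_first_toplevel_map(lambda w: print(f"mapped {w}"))
    print(f"ready, wake me with: kill -USR1 {os.getpid()}")
    show("main window")
```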

To use it, you need something to manage the cache for you in the background, and xargs will work!

$ xargs --null --arg-file=/dev/zero --max-procs=3 --replace -- \
     env LD_PRELOAD=defermap.so xterm -display :0 [PARAMS...]

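This will keep three xterms running in the background. xargs reads an endless stream of empty, NUL-separated items from /dev/zero and runs one xterm per item, with at most three running at once. Whenever one exits, another takes its place.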
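The cache manager is easy to model without xargs. In this Python sketch, keep_warm is a hypothetical helper, not part of any tool. It keeps up to size copies of a command running and starts a replacement whenever one exits. It polls instead of blocking on a child, which keeps it short at the cost of up to 50ms before a replacement starts.

```python
import subprocess
import sys
import time


def keep_warm(cmd: list[str], size: int = 3, launches=None) -> int:
    """Keep up to `size` copies of `cmd` running, replacing any that exit.

    With launches=None this never returns, like xargs reading /dev/zero;
    otherwise it stops after `launches` starts and waits for the rest.
    """
    running = []
    started = 0
    while True:
        running = [p for p in running if p.poll() is None]  # drop exited children
        while len(running) < size and (launches is None or started < launches):
            running.append(subprocess.Popen(cmd))
            started += 1
        if not running:
            return started
        time.sleep(0.05)  # simple polling keeps the sketch short


if __name__ == "__main__":
    # e.g. python3 keep_warm.py env LD_PRELOAD=defermap.so xterm -display :0
    keep_warm(sys.argv[1:])
```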

Note: If you often start several terminals in quick succession, increase --max-procs.

When you want a new terminal, instead of running xterm as you normally would, do this instead:

$ pkill --oldest --signal SIGUSR1 xtermserver

A terminal should appear almost instantly. You can now run that command instead of xterm, and the startup delay should be gone.

Conclusion

This whole process took a while! Now I just need to adjust my shortcuts to run pkill instead of xterm, and then I can compare the results.

[Image: Windows Terminal]

Counting the frames in that video, it’s down to about 366ms, just 60ms slower than Fedora. This is totally acceptable!

I’ve been using this configuration for a few days, and so far it’s working great. I haven’t noticed any issues running it this way.

I highly doubt anyone else will find this useful; who else is using XTerm on Windows? 😆

Nevertheless, if you have a better solution, or can think of something else I can try, let me know!