Recently we decided to add Windows 10 testers to our Windows unit test cluster, which had been using Windows 7 Pro.
We immediately noticed performance problems. A test suite that previously took about 100 minutes to run now took 300 or even 360 minutes. We tried fixing the problem by tweaking the OS configuration, replacing drivers, and adjusting the virtual machine’s configuration. Nothing helped, and we went back to the old Windows 7 Pro instance.
Keep digging
However, I wanted to know what was causing the problem – the VM environment, Windows 10, or something in the tests and the tested Chromium or Vivaldi code. At the time I was replacing my six-year-old home PC (which, incidentally, was the first machine used to compile the source code that became Vivaldi). I decided to remove my own disks from the machine, take it down to the office, install a new SSD with Windows 10 on it, and run some tests.
I also installed Win 7 Pro back into the machine.
First result: it was definitely not the VM environment. One of the tests that took 100 minutes when run on Windows 10 on this machine took 20 (twenty!) minutes on Windows 7.
I also ran tests with “raw” Chromium builds to discover whether the problem was in our code or also existed upstream. I even used a six-month-old build to check whether the problem was new or long-standing. (I had also used a raw Chromium build when bisecting the Mac issue described further down.)
After a bit more testing I contacted the Chromium Ops group on their Slack channel to find out if they were seeing similar issues, even if not of the same magnitude.
There were signs of some differences between Windows 7 and Windows 10, but the differences weren’t big and their test system is quite different from ours. They suggested filing a bug report, which I did.
After more testing, including more detailed logging of how long the tests took to run, I was finally able to pin the extra time down to a CreateProcess call. CreateProcess is the Windows function used to start new processes. The test executables use it to run smaller groups of tests in processes separate from the main test executable, so that a crashing test does not cause the entire test suite to fail.
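The kind of measurement described above can be sketched in a few lines. This is only an illustration, not the logging that was added to the Chromium test code: it assumes Python is available, the function name time_process_launches is made up for this example, and it launches a trivial Python child rather than a large test executable, so it will not reproduce the slowdown itself. On Windows, Python's subprocess.Popen starts the child through CreateProcess.

```python
import subprocess
import sys
import time


def time_process_launches(count=5):
    """Return how long each process launch took, in milliseconds.

    Hypothetical helper for illustration. Only the launch call is timed;
    the child's own run time is excluded by stopping the clock before wait().
    """
    durations_ms = []
    for _ in range(count):
        start = time.perf_counter()
        # Launch a child that does nothing; on Windows this goes through CreateProcess.
        child = subprocess.Popen([sys.executable, "-c", "pass"])
        durations_ms.append((time.perf_counter() - start) * 1000.0)
        child.wait()  # reap the child before starting the next one
    return durations_ms


if __name__ == "__main__":
    for i, ms in enumerate(time_process_launches(), start=1):
        print(f"launch {i}: {ms:.1f} ms")
```

Timing the launch call separately from the run time of the child is what makes it possible to tell a slow process start apart from slow tests.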
A big reduction in performance
In the test suite I was using for my investigation, there were almost 11,000 tests run in separate processes in groups of 10, so the test suite would spawn about 1100 processes during one test run.
What I saw was that on Windows 7 this step took at most 2-3 milliseconds, while on Windows 10 it took at least 300 milliseconds, and possibly as much as 600. Because of how the Chromium test code starts these processes, only one test process could be started at a time (and the end-of-group handling also had to wait in the same queue). Starting 8 processes would therefore take at least 2.4 seconds, and only then could the results of a test group that usually needed just 100 milliseconds to complete be processed. In other words, the first test group, after running for 100 milliseconds, had to wait at least 2300 milliseconds before its slot could start the next group. That is a big reduction in performance.
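The arithmetic behind these numbers can be written out as a small sketch. The function names are made up for this example, and the model is deliberately simplified: it assumes launches are strictly serialized and ignores everything except the launch time and the run time of a group.

```python
def total_launch_seconds(num_tests, tests_per_group, launch_ms):
    """Total time spent only on starting processes, in seconds."""
    groups = -(-num_tests // tests_per_group)  # ceiling division
    return groups * launch_ms / 1000.0


def idle_wait_ms(slots, launch_ms, group_run_ms):
    """How long the first slot sits idle while the other slots are started.

    Launches are serialized, so filling all slots takes slots * launch_ms;
    the first group finished after group_run_ms and waits for the rest.
    """
    return max(0, slots * launch_ms - group_run_ms)


if __name__ == "__main__":
    # About 11,000 tests in groups of 10, with the launch times from the text.
    print(total_launch_seconds(11000, 10, 3))    # Windows 7, about 3 ms per launch
    print(total_launch_seconds(11000, 10, 300))  # Windows 10, at least 300 ms per launch
    print(idle_wait_ms(8, 300, 100))             # wait before the first slot is reused
```

With these inputs the launches alone go from roughly 3 seconds to at least 330 seconds per run, and each slot spends far longer waiting than testing.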
CreateProcess had O(n^2) performance for CFG data. Now it doesn't.
Timeline of this Windows performance bug:
April 15: Initial private report
April 21: Isolated repro and blog post
April 23: Fix built (flighting in a few weeks)
https://t.co/PLsWMqeier
Bruce Dawson (@BruceDawson0xB), April 24, 2019
Around this time, Bruce Dawson on the Chromium team started looking at the issue. Although he initially suspected a different problem, he quickly discovered that the cause was a Windows security feature called Control Flow Guard (CFG), and that the time Windows spent initializing it apparently increased with the *square* of the number of functions in the executable (double the number of functions, and the time increases fourfold).
This feature is very useful for protecting applications like a browser, but not really necessary for test executables, so he worked around the issue by disabling CFG for the test executables, although it had previously been enabled for all Windows executables.
He then passed the ball to Microsoft: an important feature taking “quadratic” time is not a good thing, so he suspected it was a bug. Within a couple of days, devs at MS reported back that they had fixed the issue.
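One way to recognize this kind of growth is to compare two measurements and estimate the exponent. The sketch below is only an illustration with hypothetical numbers and made-up function names; it is not how the CFG problem was actually diagnosed.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def predicted_time(n, n_ref, t_ref, exponent=2.0):
    """Extrapolate a time from a reference measurement, assuming t ~ n**exponent."""
    return t_ref * (n / n_ref) ** exponent


if __name__ == "__main__":
    # Hypothetical measurements: doubling the function count quadruples the time.
    print(scaling_exponent(1000, 1.0, 2000, 4.0))  # close to 2 means quadratic
    # Hypothetical extrapolation: ten times the functions, a hundred times the cost.
    print(predicted_time(10000, 1000, 3.0))
```

An exponent near 1 would mean the cost grows in step with the number of functions; an exponent near 2 is what makes very large executables, such as test binaries, disproportionately slow to start.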
Perhaps an issue for normal browser usage
It could be that this issue affects normal browser usage, too, since both Chrome and Vivaldi start new processes for each tab, but as much of the actual code is located in DLLs shared among the processes, and the Windows CFG configuration is reused for DLLs, it might not be as noticeable in normal use.
Even if it does cause some performance issues, disabling the security feature is not an option.
Performance issues like this can affect both product performance and development turnaround time. The Chromium team compiles all updates and runs them through the test suite before they are accepted into the main code base, so delays here would slow down how quickly updates can be accepted. That is why locating and fixing performance issues is important.
The patch from Microsoft should hopefully eliminate the issue. The bug was fixed last week, so we expect that it will be in an upcoming Patch Tuesday.
Not the first performance issue reported
This is the second performance issue we have reported recently. The previous case was a regression on some of our Mac testers. That turned out to be due to the database code being diligent about making sure all data is written to the disk, and the Mac OS using a very slow method for this. It was especially slow on machines with slow disks, like our tester. After we contacted the Chromium team, they discovered that there had been a slowdown of their tests, too, but not of the same magnitude as we had observed. Finding the cause did suggest that the database code was perhaps too aggressive in performing this kind of disk syncing, and they are investigating this. However, this is a ticklish issue, since ensuring that the database is properly stored on the disk is important in case of loss of power. In the meantime we were able to work around the issue when running our tests.
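The cost of this kind of diligence can be illustrated with a small sketch that times the same writes with and without forcing them to disk. This is a generic illustration using Python's os.fsync, not the database code in question, and the function name time_writes is made up for this example; the actual numbers depend heavily on the OS and the disk.

```python
import os
import tempfile
import time


def time_writes(sync, count=50, payload=b"x" * 4096):
    """Return the seconds needed to write `count` blocks to a temporary file.

    With sync=True every write is followed by os.fsync, which asks the OS
    to push the data to the storage device before continuing.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(f"buffered writes: {time_writes(sync=False):.4f} s")
    print(f"synced writes:   {time_writes(sync=True):.4f} s")
```

Skipping the sync is faster but leaves recent writes at risk if power is lost, which is the trade-off that makes this a ticklish issue.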
Investigation of the Windows 10 issue goes on
The Windows 10 performance issue is still being investigated. Bruce Dawson found at least one major issue in a group of tests while he was working on the CFG problem, and he did notice a few other issues, too. I have also observed that Windows 10 performance still does not quite match Windows 7, so there are probably more issues to be discovered. Though they may not be as much of a problem as the CFG issue, they could still indicate problems that also affect the browser. We won’t know until we find them.
So, there may well be more performance bugs being filed to Chromium from us in the future.