Blog posts tagged: perf
News and other things I find interesting
Snappy optimizations for faster Firefox startup
Last modified: Wednesday, February 08, 2012
Snappy is a project that aims to improve is improving Firefox responsiveness. As part of this project I've been working on Firefox startup optimizations.
You can spend days, months, or even years trying to optimize code, but if you don't understand where to optimize, you won't be making a difference. Likewise, searching for an optimizations in a competitor's product is usually not the best way to get results.
It's nearly impossible to optimize code by guessing what is slow, you need to profile the code to understand the problems. Once you understand the problems you can then fix them, and then finally test to make sure they are fixed.
Here are some initial Firefox startup optimizations I've done with some lite profiling on Windows over the period of a few days:
- bug 724177 - 30-50ms (5%) Firefox startup speed optimization on Windows in nsLocalFileWin
- bug 724256 - Optimize move file calls on Windows, saving about 2ms per call (1 call on startup)
- bug 722225 - Firefox startup speed by ~5% (-70ms) on Windows by optimizing D3D10CreateDevice1
- bug 722315 - Firefox startup speed by by ~5% (-76ms) on Windows by lazy loading CLSID_DragDropHelper
- bug 724203 - Optimize nsLocalFile::IsDirectory on Windows by 50% giving 5ms startup improvement
- bug 724207 - Save 15-20ms on startup from unused file attributes fetch when opening files
- bug 692255 - Find a way to get rid of prefetch files on Windows for faster startup
- bug 725444 - 10-15ms main thread startup optimization in Windows AudioSession
I plan to keep doing more startup optimizations for a while in between silent update work.
So how are optimizations found? There are many ways, the first I will talk about is Xperf.
Xperf
One way to identify optimization bottlenecks on Windows is to use Xperf. Xperf is a great way to find both IO bottlenecks and CPU usage bottlenecks.
It can tell you which files use the most IO, what the IO patterns are, which functions take the most time, and allow you to group based on different criteria to view the data in different ways. It is the tool that made Windows 7 so much faster than Windows Vista. It actually does a ton more than that but I won't focus on it in the context of this blog post.
Built in Mozilla profiler
Another way to find optimizations is to use Benoit's profiler (SPS). Currently it works well on Mac, and Windows is nearly completed.
I'm very excited to start using it on Windows.
It far surpasses the tools I've been using to find the bugs and fixes above.
Custom profiler
I haven't seen how the Mozilla built-in profiler is implemented, but I suspect that this custom profiler is a simplified version of the Mozilla profiler's Pseudostack mode.
I was able to find all of the above optimizations (with the exception of the prefetch task) with a tiny class with a couple of wrapper macros.
Basically the class is an RAII class built around the Win32 API QueryPerformanceCounter and QueryPerformanceFrequency.
These functions allow you to get very detailed timing results.
For any function you want to profile you simply add the PROFILE_FUNCTION() macro to the start of the function. I usually start by adding this to each non-trivial function in a file.
Once you've found a function that takes a non trivial amount of time you can start to dig deeper into other functions inside of the function until you find the root cause.
But sometimes a function has a lot of code and you aren't sure what the slow part of the function is. For this I use a second macro PROFILE_STR("FunctionName:N") where N is a number I increment evenly spaced out. The string can be anything, I just use that normally.
These macros just create an object. The constructor of the object stores the start time, the destructor of the object gets the end time.
The destructor also uses the function name or string as a lookup in a global map that keeps track of the number of hits, the maximum length of time, the minimum length of time, and the average length of time for each function/string.
There is a third macro I was using to dump the results to a file at particular events I wanted to focus on, like first paint, or when the session is restored.
One thing you have to look out for with this method is if you happen to put the calls in a recursive function.
In that case it will appear to be a bigger bottleneck because the first call will include the full length of time of all calls that contain it.
All of the inner calls will add onto that.
The output of this function is made when the third macro to dump the results is called. It dumps the results to a simple formatted .txt file:
Verifying startup results
If you're working on Firefox, a good way to verify that your speed optimization made a difference is to use the about:startup extension.
This extension times the startup of important events like firstPaint
and sessionRestored and gives you the average up top.
To verify my results I just did 20 startups in sequence with a release build that contained my patches vs a release build that did not contain my patches.
Here is an example after one of the patches above:
Tags: perf snappy firefox mozilla
Add a new comment | 11 comment(s)
|
Some mobile browsers show a screenshot of the browser window while loading the actual program to make cold startup feel faster. Is this something that could be done on the desktop too? I'm sure that the problem is a lot more complicated than I think it is, but are the reasons for not doing this technical, or maybe also philosophical? Because as a user, I know I really don't care what "dirty" tricks have been used, or what the measured startup time is, as long as the startup *feels* faster. |
|
Are there measurements made on the installed base of Firefox like what is done for memory usage ? |
|
> Some mobile browsers show a screenshot of the browser I think early window loading is already done for Firefox on mobile, but ya that might be possible on Desktop as well. I'd prefer to work on real speed opts that do less work or deffer work until after startup on another non-main thread though. But this is something we can look into after those optimizaitons are all done. Thanks for the comment. > Are there measurements made on the installed base of There is a whole project for memory usage called MemShrink here: There are a lot of other measurements as well and telemetry data for those that enable it. This enables us to see problems that aren't on the machines of the developers. |
|
@j: Speaking from personal experience using Firefox from a USB stick on my college's computers, I've found that I care less about how long it takes Firefox to appear, and more about when it becomes usable. There are few things more frustrating than trying to interact with a nearly unresponsive Firefox (or a Firefox that's responsive for a few seconds, then grinds to a halt while more I/O goes on). So I don't think it's a very good idea to create the illusion that the program has finished launching when it hasn't. Mind you, I haven't been in a position to test Firefox like this in a while. Hopefully with all the optimizations that have been made in the past half year or so, launching Firefox on my college's PCs has become much smoother (one of the problems is that their virus scanner seems to want to scan everything on my USB stick the moment I read anything from there; it's very frustrating and I doubt Firefox can do anything about it). |
|
Does SPS work on non-main threads? Last time I tried it (on mobile) it didn't... |
|
Are you measuring cold startup times? I found a nice way, by using the sysinternals RamMap utility. Before launching an exe, just do Empty Standby List in it. |
|
I've been mostly focused on hot startup for the short term. Thanks for the tip! I haven't heard of that util. |
|
> Does SPS work on non-main threads? I'm not sure about its implementation but I'm pretty sure that at least for Pseudostack mode it happens on the same thread as you add the statements on. |
|
Not sure if this is something you would like to work on, but I believe bug 650968 can be a big factor in general Firefox startup slowness. You might want to check it out. |
|
I wish you would finish optimising nsLocalFileWin :-( |
|
Ya I plan to spend about 80% of my time on Metro so I'll still have time for things like optimizations just at a slower pace for those. |



