* Let tests run without bash
* Import pmcstat callgraph format
This eliminates the need for stackcollapse-pmc.pl and its dependency on
Perl to import FreeBSD callgraphs. Tracing can be done completely with
tools included in the base system. For example:
```
# kldload hwpmc
# pmcstat -S instructions -O sample.out sleep 1
# pmcstat -R sample.out -G sample.pmcstat.graph
```
Steps to reproduce:
1. Open https://www.speedscope.app/
2. Try to import an invalid file such as [invalid.json](https://github.com/jlfwong/speedscope/files/14167326/invalid.json). The page says "Something went wrong. Check the JS console for more details."
3. Now try to import a valid file such as [simple.json](https://github.com/jlfwong/speedscope/files/14167335/simple.json). The page says "Something went wrong. Check the JS console for more details." even though this second file is valid.
Explanation of the fix (copied from the commit message):
> We need to clear the error flag, otherwise once there is an error and we display "Something went wrong" we will keep displaying that forever even when the user imports a valid file.
The previous description ("event values will be relative to this
startValue") was ambiguous.
Suppose the profile starts at wall time 1000ms and the first event is
at wall time 1003ms.
The intention is that startValue should be set to 1000 and the "at"
value of the event should be set to 1003. The viewer's time axis will
start at 0ms and the first event will be displayed at 3ms.
But the previous description could be incorrectly interpreted as
saying that the "at" value of the first event should be set to 3 (the
time relative to startValue, as opposed to the absolute wall time).
Clarify that "relative" is referring to how the viewer displays the
data, not about which values to store in the file.
In the existing code, if `traceEvents` did not have any importable events, the profile would be marked as empty. This was a bug, as the dev Hermes profiles I was testing with had one X event which made the code work. However we do not need to guarantee (and the spec doesn't seem to) any traceEvents being present as long as we have samples and stack frames.
When I tested a production profile taken from Hermes it did not have any importable events, just a metadata event with a thread name. This PR updates the implementation to iterate over partitioned samples instead of traceEvents so we construct profiles for all thread + process pairs referenced in the samples array.
| Before | After |
| --- | ----- |
| <img width="1454" alt="Screenshot 2024-01-03 at 9 58 56 AM" src="https://github.com/jlfwong/speedscope/assets/9957046/78c098a7-b374-4492-ad13-9ca79afdb40c"> | <img width="1450" alt="Screenshot 2024-01-03 at 9 58 17 AM" src="https://github.com/jlfwong/speedscope/assets/9957046/d2d5e82b-fa3e-43db-bf8a-e8c3b84cd13a"> |
I have been taking a lot of profiles using the Hermes profiler, but I noticed that they sometimes to not show up properly. After debugging what exactly was going on, I realized it was because the logic in `selectQueueToTakeFromNext` only checks for name, instead of the key for the event. I had a bunch of events with the name `anonymous` that were getting improperly exited before they should have been due to this logic.
This fix makes the code more robust if there are added "args" which differentiate an event from another (as is the case in Hermes profiles), however it would still be an issue if they key just defaults to the name.
Example profile before:
<img width="1728" alt="Screenshot 2023-12-15 at 12 54 04 AM" src="https://github.com/jlfwong/speedscope/assets/9957046/345f556e-f944-40f1-b59c-748133acb950">
What it should look like (in Perfetto):
<img width="1051" alt="Screenshot 2023-12-15 at 8 51 38 AM" src="https://github.com/jlfwong/speedscope/assets/9957046/7473cdf8-95f1-49de-a0c7-ef4ac081ff85">
After the fix:
<img width="1728" alt="Screenshot 2023-12-15 at 12 54 29 AM" src="https://github.com/jlfwong/speedscope/assets/9957046/56b0870a-538b-4916-acc8-de2b7dfd78eb">
Wow this was surprising. As reported in #448, the `simple.speedscope.json` file failed import. This was surprising to me because there's a test that covers that file to ensure it imports correctly.
The file provided in #448, however, is from a version of speedscope from 5 years ago before the file had been pretty printed. It turns out that when this *particular* file is not pretty-printed, it's a schematically valid pprof profile.
The fix is to do some bounds checks and return null. After the change, the file imports as you'd expect after realizing its not actually a valid pprof profile.
Fixes#448
The publish, deploy, and release process is annoying enough at the moment that I avoid doing it frequently. Let's automate most of it to reduce the friction
Follow up on this PR #435.
Currently, it took roughly 22 seconds to load my 1.3GB file. After inspecting the profiler, there's a large chunk of time spending in Frame.getOrInsert. I figure we can reduce the number of invocations by half. It reduces the load time to roughly 18 seconds.I also tested with a smaller file (~350MB), and it show similar gains, about 15-20%
Currently, importing files generated by linux perf tool whose some blocks exceed V8 strings limit can crash the application. This issue is similar to the one in #385.
This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first.
Fixes#433
The current behavior of splitLines is to eagerly split all the lines and return an array of strings.
This PR improves this by returning an iterator instead, which will emit lines. This lets callers decide how to best use the splitLines function (i.e. lazily enumerate over lines)
Relates to #433
The previous behavior was to use the StartLine of a function as the line number to show in speedscope. However, the Line object has more precise line information, and we should only fallback to StartLine if we don't have this more detailed information.
Looking at the [documentation for the pprof proto](https://github.com/google/pprof/tree/main/proto#general-structure-of-a-profile), this is more how it intends to interpret line information:
> location: A unique place in the program, commonly mapped to a single instruction address. It has a unique nonzero id, to be referenced from the samples. It contains source information in the form of lines, and a mapping id that points to a binary.
> function: A program function as defined in the program source. It has a unique nonzero id, referenced from the location lines. It contains a human-readable name for the function (eg a C++ demangled name), a system name (eg a C++ mangled name), the name of the corresponding source file, and other function attributes.
Here is a sample profile that had line-level info on the Line object of the profile:
Before:
<img width="449" alt="Screen Shot 2022-11-03 at 11 17 11 AM" src="https://user-images.githubusercontent.com/618615/199760730-712daa70-6cfb-4e90-b037-b571809c26d9.png">
After:
<img width="449" alt="Screen Shot 2022-11-03 at 11 17 22 AM" src="https://user-images.githubusercontent.com/618615/199760777-1c0d5581-7b29-42b7-b642-6035f7d25405.png">
This attempts to improve the quality of the on-CPU profiles stackprof provides. Rather than weighing samples by their timestamp deltas, which, in our opinion, are only valid in wall-clock mode, this weighs callchains by:
```
S = number of samples
P = sample period in nanoseconds
W = S * P
```
The difference after this change is quite substantial, specially in profiles that previously were showing up with heavy IO frames:
* Total profile weight is almost down by 90%, which actually makes sense for an on-CPU profile if the app is relatively idle
* Certain callchains that blocked in syscalls / IO are now much lower weight. This was what I was expecting to find.
Here is an example of the latter point.
In delta mode, we see an io select taking a long time, it is a significant portion of the profile:
<img width="1100" alt="236936508-709bee01-d616-4246-ba74-ab004331dcd3" src="https://github.com/dalehamel/speedscope/assets/4398256/39140f1e-50a9-4f33-8a61-ec98b6273fd4">
But in period scaling mode, it is only a couple of sample periods ultimately:
<img width="206" alt="236936693-9d44304e-a1c2-4906-b3c8-50e19e6f9f27" src="https://github.com/dalehamel/speedscope/assets/4398256/7d19077f-ef25-4d79-980b-cfa1775d928d">
Sometimes, stackprof frames don't get generated with a `name` in the frame.
I think it's probably worth tracking down why that is, but in the mean
time, speedscope simply crashes with a method call on `undefined`. The
crash is bad because it only shows up in the console--there's no visible
message saying that speedscope failed to parse and load a profile.
For more information, see #378
This fixes the crash by simply skipping the logic in demangle if the name
field isn't present on a frame. That's probably a fine tradeoff? Because in
this case, stackprof is generating ruby frames, which means that C++ name
demangling won't apply.
I have tested this by running the scripts/prepare-test-installation.sh
script and verifying that `bin/cli.js` can now successfully load the
included profile. Before these changes, I verified that speedscope failed
with the behavior mentioned in #378.
I've also included a snapshot test case, but it seems that the Jest test
harness only tests the parsing, not the rendering (correct me if I'm wrong).
So I haven't actually been able to create an automated test that would catch
a regression. Please let me know if there's a better way to have written this
test.
I've staged the commits on this branch so that the second commit (dcb9840)
showcases the minimal diff to a stackprof file that reproduces the bug. That is,
rather than look at the thousands of new lines in the stackprof profile, you can
view the second commit to see the salient part of the file.
Fixes#415
* Interprets pprof's defaultSampleType as an index into the string table [as documented in the proto](0414c2f617/src/import/profile.proto (L85)), not as an index into the sample types repeated-field. This allows parsing to succeed when the string-table index is not a valid sampleTypes index, which is common on allocation profiles.
* Update the pprof snapshot. This is necessarily because we were previously interpreting an empty defaultSampleType as truthy-but-zero when long.js is present, ie: the first sample-type. But Speedscope-in-the-browser doesn't seem to include long.js, so our tests were disagreeing with in-browser behavior. With this PR, that should be fixed.