The first time I saw it, I blamed my code.
Frames were missing their deadlines. Input felt delayed. Animations stuttered in a way that couldn’t be reproduced reliably. CPU usage looked fine. Memory was stable. Nothing obvious was broken.
Yet users kept describing the same thing: “The app feels heavy.”
That word, heavy, is usually a clue that threads are runnable but not actually running when they need to.
What I eventually discovered wasn’t a memory leak or inefficient algorithm. It was CPU time theft. My threads weren’t slow. They were starving.
The illusion of “available CPU”
Most developers assume CPU availability is binary. Either the CPU is busy or it isn’t.
In reality, mobile CPUs are time-sliced, frequency-clamped, thermally throttled, and aggressively scheduled. Your thread can be runnable and still not run.
From the app’s point of view, nothing crashes. No exception is thrown. Execution just… pauses.
That pause is where starvation hides.
How starvation actually happens
Thread starvation on mobile rarely comes from a single cause. It’s usually the result of layered pressure:
- Competing foreground apps
- System services with elevated priority
- Thermal throttling reducing effective CPU capacity
- Background limits shrinking time slices
- Excessive concurrency inside your own process
Each factor steals a few milliseconds. Together, they break assumptions.
A render thread that expects 16ms suddenly gets 6ms. A background worker that normally finishes before the next frame now spills over. Jank appears, not because work increased, but because time disappeared.
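One way to make that disappearing time visible in process is to compare wall-clock time against the CPU time the thread actually received. This is a minimal sketch using the JVM's `ThreadMXBean` (from `java.lang.management`, which is not available on Android itself, where `SystemClock.currentThreadTimeMillis()` plays a similar role); `measureSlice` is an illustrative helper, not a real API:

```kotlin
import java.lang.management.ManagementFactory

// Hypothetical starvation probe: compare wall-clock time with the CPU
// time this thread was actually given. A large gap means the thread
// was runnable but not scheduled, or was preempted mid-work.
fun measureSlice(work: () -> Unit): Pair<Double, Double> {
    val bean = ManagementFactory.getThreadMXBean()
    val wall0 = System.nanoTime()
    val cpu0 = bean.currentThreadCpuTime
    work()
    val wallMs = (System.nanoTime() - wall0) / 1e6
    val cpuMs = (bean.currentThreadCpuTime - cpu0) / 1e6
    return wallMs to cpuMs
}

fun main() {
    var acc = 0L
    val (wall, cpu) = measureSlice { repeat(2_000_000) { acc += it } }
    // On an idle machine wall ≈ cpu; under contention wall >> cpu.
    println("wall=%.2fms cpu=%.2fms".format(wall, cpu))
}
```

The gap between the two numbers is exactly the "time that disappeared": it never shows up in a method trace, because no method was running.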
Why traditional profiling didn’t help me
CPU profilers told me where time was spent, not when execution was denied.
Method traces looked clean. Flame graphs didn’t show contention. Nothing explained why a trivial operation sometimes took 10× longer.
That’s when I stopped profiling code and started profiling the scheduler.
Enter Perfetto: seeing stolen time
Perfetto doesn’t just show what your app did. It shows what the system let it do.
The first scheduler trace I captured was uncomfortable to read. My threads were marked runnable, but they weren’t being scheduled. Large gaps appeared between execution slices.
That gap wasn’t my code running slowly. It was my code not running at all.
Once you see that visually, everything changes.
ftrace confirmed the suspicion
Perfetto gave me the overview. ftrace gave me the proof.
With scheduler events enabled, I could see:
- Threads waking up on time
- Threads being ready to run
- Threads repeatedly preempted by higher-priority work
- Long delays before being scheduled again
Nothing was blocked. Nothing was deadlocked. The threads were alive — just ignored.
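The scheduler events above can be captured with a Perfetto trace config along these lines (a sketch: the buffer size and duration are placeholder values, and on a recent Android device the config is piped to `adb shell perfetto --txt -c -`):

```
buffers: {
  size_kb: 65536
  fill_policy: RING_BUFFER
}
data_sources: {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_waking"
    }
  }
}
duration_ms: 10000
```

With `sched_switch` and `sched_waking` enabled, the trace shows when each thread was woken, when it finally got a core, and who held the core in between — which is exactly the gap a CPU profiler can't explain.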
This is CPU time theft in its purest form.
The UI thread is not special enough
One hard lesson: the UI thread is privileged, but not immune.
Under contention, rendering competes with everything else. System compositors, media services, sensors, and other apps all want slices.
If your app spawns too much background work, you can indirectly starve your own UI thread.
I had done exactly that.
Concurrency was my silent enemy
I believed more threads meant smoother performance.
In practice, each extra thread increased scheduling pressure. The CPU didn’t get faster. It just got more crowded.
The scheduler spent more time deciding who runs next than letting meaningful work complete.
Reducing concurrency — fewer workers, tighter lifetimes, clearer ownership — immediately improved responsiveness.
Not because the code changed, but because the scheduler had fewer fights to manage.
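That shift can be sketched with a plain `java.util.concurrent` pool in place of unbounded thread spawning. The pool size of 2 here is an arbitrary stand-in for "the number of cores you realistically expect to get," and `runBounded` is an illustrative helper, not a real API:

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical bounded worker pool: a fixed, small number of threads
// means fewer runnable threads competing for the same time slices.
fun runBounded(tasks: List<() -> Int>, parallelism: Int = 2): List<Int> {
    val pool = Executors.newFixedThreadPool(parallelism)
    val results = ConcurrentLinkedQueue<Int>()
    tasks.forEach { task -> pool.execute { results.add(task()) } }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    return results.sorted()
}

fun main() {
    val squares = runBounded((0 until 8).map { i -> { i * i } })
    println(squares)  // [0, 1, 4, 9, 16, 25, 36, 49]
}
```

The work is identical either way; what changes is how many runnable threads the scheduler has to arbitrate between at any moment.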
Thermal state rewrote the rules mid-session
One thing Perfetto made obvious was how CPU frequency dropped over time.
As the device warmed up, execution windows shrank. Threads that once fit neatly into frame boundaries now overflowed.
This explained why bugs appeared only after long usage sessions and why QA couldn’t reproduce them consistently.
The environment wasn’t stable. My assumptions were.
Designing for starvation instead of fighting it
Once I accepted starvation as a normal condition, my architecture shifted.
- Tasks became smaller and interruptible
- UI work assumed delays and resumed gracefully
- Background work stopped relying on precise timing
- State transitions became resumable, not linear
- Metrics tracked worst-case latency, not averages
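The first and fourth points can be sketched as a task that owns its own progress, so a preempted slice costs nothing but the slice. `ResumableSum` and the slice size are illustrative, not a real pattern from any library:

```kotlin
// Hypothetical resumable task: work is done in small slices and the
// position is recorded, so the task can stop after any slice and
// continue later exactly where it left off.
class ResumableSum(private val data: IntArray) {
    var index = 0
        private set
    var partial = 0L
        private set
    val done: Boolean get() = index == data.size

    // Process at most `slice` items, then return control to the caller.
    fun step(slice: Int) {
        val end = minOf(index + slice, data.size)
        while (index < end) partial += data[index++]
    }
}

fun main() {
    val task = ResumableSum(IntArray(10) { it + 1 })
    while (!task.done) task.step(3) // interleave other work between slices
    println(task.partial) // 55
}
```

A starved thread running this loses at most one small slice of progress; a starved thread running one long monolithic computation loses the frame.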
The app didn’t become faster. It became resilient.
That’s the real optimization.
Why this matters in real teams
I’ve seen the same starvation patterns surface across mobile teams whose apps run alongside navigation, media streaming, and constant background services.
The more capable devices become, the more aggressively the system manages them. Raw CPU power doesn’t eliminate contention — it masks it until production.
The takeaway no profiler teaches you
CPU time is not yours.
It’s borrowed, negotiated, and occasionally revoked without warning.
Perfetto and ftrace didn’t just help me fix a performance issue. They changed how I think about execution itself.
Once you stop assuming your threads will run when they’re ready, you stop being surprised by jank, delays, and “unreproducible” bugs.
You start building software that survives contention instead of pretending it doesn’t exist.
And that’s when starvation stops being mysterious — and starts being manageable.