Android Performance

Android Perfetto Series 10: Binder Scheduling and Lock Contention

Word count: 6.4k · Reading time: 39 min
2025/11/16

The tenth article in the Perfetto series focuses on Binder, the core Inter-Process Communication (IPC) mechanism of Android. Binder carries almost all interactions between system services and applications and is a frequent source of performance bottlenecks. This article, written from a system development and performance tuning perspective, combines data sources like android.binder, sched, thread_state, and android.java_hprof to provide a practical diagnostic workflow that helps both beginners and advanced developers identify issues related to latency, thread pool pressure, and lock contention.

Perfetto Series Catalog

  1. Android Perfetto Series Catalog
  2. Android Perfetto Series 1: Introduction to Perfetto
  3. Android Perfetto Series 2: Capturing Perfetto Traces
  4. Android Perfetto Series 3: Familiarizing with the Perfetto View
  5. Android Perfetto Series 4: Opening Large Traces via Command Line
  6. Android Perfetto Series 5: Choreographer-based Rendering Flow
  7. Android Perfetto Series 6: Why 120Hz? Advantages and Challenges
  8. Android Perfetto Series 7: MainThread and RenderThread Deep Dive
  9. Android Perfetto Series 8: Understanding Vsync and Performance Analysis
  10. Android Perfetto Series 9: Interpreting CPU Information
  11. Android Perfetto Series 10: Binder Scheduling and Lock Contention (this article)
  12. Video (Bilibili) - Android Perfetto Basics and Case Studies

Binder Basics

For readers encountering Binder for the first time, understanding its role and participants is crucial. You can think of Binder roughly as “cross-process function calls”: you write code in one process that looks like calling a local interface, while Binder handles the actual call and data transfer. It is Android’s primary Inter-Process Communication (IPC) mechanism, consisting of four core components:

  1. Client: Application threads initiate calls through IBinder.transact(), writing Parcel-serialized data to the kernel.
  2. Service (Server): Usually runs in SystemServer or other processes, reading Parcel and executing business logic through Binder.onTransact().
  3. Binder Driver: The kernel module /dev/binder responsible for thread pool scheduling, buffer management, priority inheritance, etc., serving as the “messenger” connecting both parties.
  4. Thread Pool: The server typically maintains a pool of Binder threads. Note that the pool is not created at full size up front; threads are spawned on demand. The Java layer defaults to roughly 15 Binder worker threads (excluding the main thread), and the Native layer can also configure the maximum thread count via ProcessState (the default is usually 15 as well). When all Binder threads are busy, new requests queue in the driver layer waiting for an idle thread.

Why is Binder needed?

Android adopts a multi-process architecture to isolate applications and improve security and stability. Each APK runs in its own user space; when it needs to access system capabilities (camera, location, notifications, etc.), it must make cross-process calls into the Framework or SystemServer.

Limitations of traditional IPC solutions:

IPC Method       Problem
Socket           High overhead, lacks identity verification
Pipe             Only supports parent-child processes, one-way communication
Shared Memory    Needs additional synchronization mechanisms, lacks access control

Binder solves these problems at the kernel layer, providing three key capabilities: first, identity and permission (based on UID/PID verification to ensure the caller is legitimate); second, synchronous and asynchronous calls (in synchronous mode, the Client waits for the Server to return, which is the most common mode, while in asynchronous mode, the Client returns immediately after sending, suitable for scenarios like notifications and status reporting); third, priority inheritance (when a high-priority Client calls a low-priority Server, the Server temporarily elevates its priority to avoid priority inversion problems).

Therefore, when we write statements like locationManager.getCurrentLocation() in an application, the underlying layer necessarily relies on Binder to safely and reliably pass the call to SystemServer.

Case from App Developer’s Perspective

Suppose we call LocationManager#getCurrentLocation() in an application. The actual implementation of this API lives in LocationManagerService inside system_server. The call path can be summarized as follows:

  1. Proxy side: the App thread obtains a proxy object (BinderProxy) for ILocationManager through Context.getSystemService.
  2. Serialization: when getCurrentLocation() is called, the proxy writes the parameters into a Parcel and executes transact().
  3. Kernel transfer: the Binder driver queues this transaction into system_server’s Binder thread queue and wakes an idle thread (for example, Binder:1460_5).
  4. Stub side: the thread hosting LocationManagerService (the Stub) wakes up, reads the parameters, and executes the location logic (which may involve HAL-layer interaction).
  5. Return phase: the Service finishes, writes the result into a Parcel, the driver wakes the original App thread, and the App thread returns from waitForResponse() with the data.

In Perfetto, this chain appears as: a service = android.location.ILocationManager transaction on the android.binder track; the App thread is in the S (Sleeping) state in thread_state, and blocked_function usually involves binder_thread_read or epoll; a Binder thread from SystemServer appears with a Running slice; and Flow arrows (Perfetto uses arrows to connect the Client’s transact and the Server’s run).
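
If you prefer to locate such transactions with SQL instead of scrolling the timeline, a minimal trace-processor query like the sketch below works. It assumes only the standard “binder transaction” slice names produced by the ftrace binder events; slice naming can vary slightly across Perfetto versions.

-- List the longest client-side binder transactions with the calling process.
SELECT
  process.name AS client_process,
  thread.name  AS client_thread,
  slice.ts,
  slice.dur / 1e6 AS dur_ms
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING (utid)
JOIN process USING (upid)
WHERE slice.name = 'binder transaction'
ORDER BY slice.dur DESC
LIMIT 20;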

Perfetto Setup and Data Sources

To diagnose Binder in Perfetto, you need to prepare the data sources and Trace configuration in advance.

Data Sources and Track Overview

Perfetto’s Binder-related signals come mainly from three data sources, each operating at a different level and differing in information granularity and applicable scenarios.

linux.ftrace (kernel layer) is the most general and basic data source, compatible with all Android versions. It directly reads kernel ftrace events, including binder_transaction (transaction start), binder_transaction_received (server received the transaction), binder_lock (the kernel Binder lock, which usually needs little attention unless you’re debugging the driver itself), etc. Combined with scheduling-related events (sched_switch, sched_waking), it can fully reconstruct the chain of “Client initiates call → kernel wakes Server thread → Server processes → returns”. On Android 12 or 13, this data source alone is basically sufficient, and the Perfetto UI automatically parses the ftrace binder events into an intuitive “Transactions” view.

android.binder (user layer) is a relatively new data source, mainly improved in Android 14/15 and later versions. It utilizes the kernel’s new interface or tracepoints to provide richer semantic information, such as directly distinguishing reply (response) and transaction (request), providing pre-computed metrics like blocking_dur_ns (client blocking duration), and distinguishing lazy_async (delayed dispatched asynchronous transactions), etc. If your device is Android 14 or above, it’s recommended to enable this data source at the same time to get more detailed analysis dimensions.

android.java_hprof (lock contention) is used to capture Monitor Contention at the Java layer (that is, lock contention generated by the synchronized keyword). Although the name contains hprof, in this scenario it mainly records lock waiting events through the ART virtual machine’s instrumentation mechanism, rather than doing memory snapshots. Note that enabling this data source has some performance overhead (because every lock contention needs to be recorded), so it’s recommended to enable it only when the problem can be stably reproduced, within a short time window, to avoid the Trace file being too large or affecting the authenticity of the reproduction scenario.

Recommended Trace Config

The following configuration balances compatibility with new features and is recommended as a standard Binder analysis template. Save the configuration as binder_config.pbtx for use:

# ============================================================
# Perfetto configuration dedicated to Binder analysis
# Applicable scope: Android 12+ (some data sources require Android 14+)
# ============================================================

# --- Buffer and duration settings ---
buffers {
  size_kb: 65536            # 64MB buffer, suitable for medium-complexity scenarios
  fill_policy: RING_BUFFER
}
duration_ms: 15000          # 15-second capture duration, adjustable as needed

# --- Data source 1: android.binder (Android 14+) ---
# Provides user-layer Binder semantics, including reply/transaction distinction, blocking_dur_ns, etc.
data_sources {
  config {
    name: "android.binder"
    android_binder_config {
      intercept_transactions: true   # Intercept transactions
      intercept_late_reply: true     # Capture delayed replies
      # Optional: filter specific processes to reduce trace size
      # filter { name: "system_server" }
      # filter { name: "com.android.systemui" }
      # filter { name: "your_app_package_name" }
    }
  }
}

# --- Data source 2: linux.ftrace (kernel layer) ---
# Most general data source, compatible with all Android versions
data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      # Binder core events
      ftrace_events: "binder/binder_transaction"           # Transaction start
      ftrace_events: "binder/binder_transaction_received"  # Server received transaction
      ftrace_events: "binder/binder_transaction_alloc_buf" # Buffer allocation (diagnose TransactionTooLarge)
      ftrace_events: "binder/binder_set_priority"          # Priority inheritance
      ftrace_events: "binder/binder_lock"                  # Kernel lock (usually can be omitted)
      ftrace_events: "binder/binder_locked"
      ftrace_events: "binder/binder_unlock"

      # Scheduling events (connect Client/Server threads)
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_waking"
      ftrace_events: "sched/sched_wakeup"
      ftrace_events: "sched/sched_blocked_reason"          # Blocking reason

      # Optional: application-layer trace points (need atrace)
      atrace_categories: "binder_driver"  # Binder driver layer
      atrace_categories: "sched"          # Scheduling
      atrace_categories: "am"             # ActivityManager
      atrace_categories: "wm"             # WindowManager
      # atrace_categories: "view"         # Enable if analyzing UI

      # Symbolize kernel call stacks
      symbolize_ksyms: true

      # Optimize scheduling-event storage, reduce trace size
      compact_sched {
        enabled: true
      }
    }
  }
}

# --- Data source 3: android.java_hprof (Java lock contention) ---
# Captures synchronized lock waiting; has some performance overhead
data_sources {
  config {
    name: "android.java_hprof"
    java_hprof_config {
      track_contended_locks: true        # Enable lock contention tracking
      track_allocation_contexts: false   # Disable memory allocation tracking (reduce overhead)
      track_java_heap: false             # Disable heap sampling
    }
  }
}

# --- Data source 4: linux.process_stats (process information) ---
# Provides basic information like process name and PID
data_sources {
  config {
    name: "linux.process_stats"
    process_stats_config {
      scan_all_processes_on_start: true
    }
  }
}

Configuration Item Description

Data Source                Purpose                                              Android Version   Overhead
android.binder             User-layer Binder semantics (blocking_dur, reply)    14+               Low
linux.ftrace (binder/*)    Kernel-layer Binder events                           All versions      Low
linux.ftrace (sched/*)     Scheduling events, connecting thread wakeups         All versions      Medium
android.java_hprof         Java lock contention (Monitor Contention)            10+               Medium-High
linux.process_stats        Process name and PID mapping                         All versions      Very low

Tip: If the device is Android 12/13, the android.binder data source may have limited functionality, mainly relying on linux.ftrace is sufficient. Perfetto UI will automatically parse ftrace binder events into an intuitive Transactions view.

Quick Start: 3 Steps to Capture and View Binder Trace

  1. Capture Trace:

    # Push configuration
    adb push binder_config.pbtx /data/local/tmp/

    # Start capture
    adb shell perfetto --txt -c /data/local/tmp/binder_config.pbtx \
    -o /data/misc/perfetto-traces/trace.pftrace

    # ... operate phone to reproduce lag ...

    # Pull file out
    adb pull /data/misc/perfetto-traces/trace.pftrace .
  2. Open Trace: Visit ui.perfetto.dev, drag in the trace file.

  3. Add Key Tracks:

    • On the left, click Tracks → Add new track
    • Search “Binder”, add Android Binder / Transactions and Android Binder / Oneway Calls
    • Search “Lock”, add Thread / Lock contention (if data is available)

Other Binder Analysis Tools

Besides Perfetto, there are other tools that can assist with Binder analysis. Here are two practical ones: am trace-ipc (built into Android, no extra installation needed) and binder-trace (open source tool, more powerful but higher configuration threshold).

am trace-ipc: Java Layer Binder Call Tracking

am trace-ipc is a command-line tool built into the Android system for tracking Java-layer Binder call stacks. Its working principle is to instrument BinderProxy.transact(), recording every call that passes through it and counting the occurrences of each call pattern. The biggest advantage of this tool is that it is zero-configuration and ready to use: no root permission and no extra software required.

Basic usage is very simple: “start → operate → stop and export → pull the results”:

# 1. Start tracking (system starts recording all process Binder calls)
adb shell am trace-ipc start

# 2. Execute the operation you want to analyze on the phone (like starting an app, triggering lag scenarios, etc.)

# 3. Stop tracking and export results to file
adb shell am trace-ipc stop --dump-file /data/local/tmp/ipc-trace.txt

# 4. Pull the result file to computer to view
adb pull /data/local/tmp/ipc-trace.txt

The exported file is in plain text format, content looks like this:

Traces for process: com.example.app
Count: 15
java.lang.Throwable
at android.os.BinderProxy.transact(BinderProxy.java:xxx)
at android.app.IActivityManager$Stub$Proxy.startActivity(...)
at android.app.Instrumentation.execStartActivity(...)
...

From the output you can see that it groups by process, listing the complete Java stack for each Binder call pattern and its occurrence count (Count). This directly answers questions like “how many Binder calls did my application make during this operation, and which services did it call?”.

Using with Perfetto: A very useful feature is that after enabling am trace-ipc, Binder call information is also synchronized to the trace events in Perfetto/Systrace. This means you can use Perfetto to see the latency distribution on the timeline while confirming which Java method initiated the call through the trace-ipc output. Combining both gives you both macro time dimensions and micro call stack details.

This tool is particularly suitable for the following scenarios: quickly verifying a suspicion that an ANR or lag is caused by frequent IPC calls; counting the total number of Binder calls during a user operation (such as startup or scrolling); and obtaining the complete Java call stack of a Binder call to locate the code involved.

binder-trace: Real-time Binder Message Parsing

binder-trace is an open-source Binder analysis tool that can intercept and parse Android Binder messages in real-time. Its positioning is similar to “Wireshark for Binder” – just as Wireshark can capture and parse network packets, binder-trace can capture and parse the specific content of Binder transactions, including interface names, method names, and even passed parameter values.

This tool uses Frida for dynamic injection (Frida is a popular dynamic instrumentation framework), so some conditions need to be met before use: the device needs to be rooted (or using an emulator), and frida-server must be deployed on the device first. Also, the local computer needs Python 3.9 or above. After configuring the environment, usage is as follows:

# Track Binder communication for a specific app (-d specifies device, -n specifies process name, -a specifies Android version)
binder-trace -d emulator-5554 -n com.example.app -a 11

After running, an interactive interface opens, displaying all Binder interactions of the target process in real time. You can filter by interface, method, transaction type, etc. through configuration files (to avoid information overload), and use shortcuts to pause/resume recording, clear the screen, and so on. The tool has AIDL structure definitions for Android 9 through 14 built in, automatically parsing most system-service call parameters.

This tool is better suited to security research and reverse-engineering scenarios, such as deeply analyzing what specific data is passed between an app and system services, or understanding the calling details of an unpublished API. For daily performance analysis, however, binder-trace has a high setup threshold (root plus a frida deployment), and it focuses on “message content” rather than “latency distribution”, so Perfetto together with am trace-ipc is usually enough. If you’re doing a security audit or need to reverse-engineer an app’s IPC behavior, binder-trace is a very powerful tool.

Binder Analysis Workflow

After getting the Trace, don’t browse it aimlessly. It’s recommended to proceed in the order “find the target → look at latency → check threads → find locks”.

Step 1: Identify Transaction Latency

The first step is to find the Binder call you care about. In Perfetto there are several common ways to locate it. If you already know which process initiated the call, find your App process’s area as Client in the Transactions track. If you know the interface or method name, press the / key to open the search box and enter the AIDL interface name (like ILocationManager) or the method name to jump straight to it. If you’re troubleshooting UI lag, the most direct way is to start from the main thread’s thread_state track and find long segments in the S (Sleeping) state – if the main thread is executing almost no code during that time, it is very likely waiting for a Binder call to return, and that is the starting point of the analysis.

After selecting a Transaction Slice, the Details panel on the right shows detailed information about this transaction. Among them, three key latency metrics need special attention: latency_ns indicates total latency, which is the complete time from the client sending the request to receiving the response; server_latency_ns indicates server-side processing latency, which is the time the server-side thread actually executes business code; blocking_dur_ns indicates client blocking latency, which is the time the client thread waits in the kernel.

Understanding the relationship between these metrics is very important because it directly determines which direction you should dig deeper next. If latency_ns is very long but server_latency_ns is very short, it means time wasn’t spent on server-side processing, but was consumed in Binder driver scheduling or server-side queuing (usually means the Service’s thread pool is busy, new requests need to wait for idle threads). In this case, you need to check the server-side thread pool status, which is what Step 2 will do. If latency_ns and server_latency_ns are similar and both very long, it means server-side processing itself is slow, at this point you need to jump to the server-side Binder thread and see what it was actually doing during this time – was it running business code, waiting for a lock, or waiting for IO.
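
To turn this comparison into numbers across the whole trace, recent Perfetto builds ship an android.binder standard-library module for trace_processor; the sketch below assumes that module and its android_binder_txns table (column names may vary by version). A large gap between client and server duration is exactly the “queuing/scheduling” signal described above.

INCLUDE PERFETTO MODULE android.binder;

-- Synchronous transactions ranked by scheduling/queuing overhead:
-- the part of the client's wait not explained by server-side work.
SELECT
  aidl_name,
  client_process,
  server_process,
  client_dur / 1e6 AS client_dur_ms,
  server_dur / 1e6 AS server_dur_ms,
  (client_dur - server_dur) / 1e6 AS overhead_ms
FROM android_binder_txns
WHERE is_sync = 1
ORDER BY overhead_ms DESC
LIMIT 20;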

Step 2: Evaluate Thread Pool and Oneway Queue

If Step 1 analysis finds that latency is mainly not in server-side processing, but in “queuing”, then you need to further check the status of the Binder thread pool. Before deep analysis, first answer a frequently asked question: “approximately how many Binder threads does each process have? What’s the scale of system_server’s Binder thread pool? Under what circumstances will it be ‘exhausted’?”

SystemServer’s Binder Thread Pool Scale

In upstream AOSP (Android 14/15), the Binder thread pool design philosophy is: grow on demand, configurable, no single fixed number.

  • Thread pool grows on demand: Each server process maintains a thread pool in the Binder driver, where the actual number of threads increases or decreases according to load on demand, with the upper limit jointly determined by the max_threads field in the kernel and user-space configurations like ProcessState#setThreadPoolMaxThreadCount().
  • Typical upper limit is 15-16 worker threads: In most AOSP versions, the app process’s Java Binder thread pool upper limit is about 15-16 worker threads; core processes like system_server also have their Binder thread pool defaults in this magnitude, in the range of “a dozen threads”.
    Some vendor ROMs or custom kernels will adjust the upper limit up or down based on their own load models (for example, adjusting to dozens of threads), so when you see specific numbers through ps -T system_server, top -H, or counting Binder: threads in Perfetto on different devices, there may be differences.
  • Take actual observation as the standard rather than memorizing a number: In Perfetto, the recommended approach is to expand a process directly and see how many Binder:xxx_y thread tracks there are and how active they were during the capture, to evaluate the pool’s “scale” and “busyness” (a query sketch for this follows the list).
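
To count the pool without expanding tracks by hand, a simple query over the thread table works. This is a sketch; it assumes only that Binder worker threads follow the usual Binder:<pid>_<n> naming convention.

-- Number of named Binder worker threads per process.
SELECT
  process.name AS process_name,
  COUNT(DISTINCT thread.utid) AS binder_threads
FROM thread
JOIN process USING (upid)
WHERE thread.name GLOB 'Binder:*'
GROUP BY process.name
ORDER BY binder_threads DESC;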

Binder Thread Count, Buffer, and “Binder Exhaustion”

In performance analysis, when people mention “Binder count”, they often confuse three different types of resource limits:

Binder thread pool exhaustion means all Binder worker threads in a process are in Running / D / S and other busy states, with no idle threads that can be woken by the driver to process new transactions. Phenomena include Client threads staying in S state in the thread_state track for a long time (call stack stops at ioctl(BINDER_WRITE_READ) / epoll_wait), and queue_len corresponding to the service in the android.binder track remains high (indicating requests are queuing). For key processes like system_server, the thread pool being full means system service response capability drops, easily amplifying into global lag or ANR.

Binder transaction buffer exhaustion involves the limited-size shared buffer (typically on the order of 1MB) that each process owns in the Binder driver, used to carry in-flight Parcel data. Typical scenarios include one transaction transmitting an object that’s too large (like a large Bitmap, an extra-long string, a large array, etc.), or a large number of concurrent transactions not yet consumed, causing too many unreleased Parcels to pile up in the buffer. Possible results include kernel logs showing binder_transaction_alloc_buf failures, the Java layer throwing TransactionTooLargeException, and subsequent transactions queuing for a long time or even failing in the driver layer (it looks like “Binder is used up”). The solution to such problems is not to “open more threads”, but to control the data volume per transmission (split packets, paging, streaming protocols) and to prefer mechanisms like SharedMemory, files, or ParcelFileDescriptor for large blocks of data.

Binder reference table / object count: The Binder driver maintains reference tables and node objects for each process, and these also have upper limits, but in most real scenarios they are rarely the first limit hit. The common risk is holding a large number of Binder references for a long time without releasing them, which manifests more as memory/stability issues than UI lag.

When analyzing in Perfetto, you can carry a judgment framework:
“Is the current slowness because the thread pool is full, or because transactions are too large/buffer is used up?”
The former mainly looks at Binder thread count and their thread_state, as well as queue_len; the latter focuses on single transaction size, concurrent transaction count, and whether accompanied by TransactionTooLargeException / binder_transaction_alloc_buf related logs.
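
For the “thread pool full” side of that framework, the thread_state table gives a quick saturation check. The sketch below is an approximation: it measures Running time only, while a thread blocked inside a nested call is also “busy”.

-- Fraction of the trace each system_server Binder thread spent Running.
-- Values near 1.0 across the whole pool suggest saturation.
SELECT
  thread.name AS thread_name,
  SUM(CASE WHEN thread_state.state = 'Running' THEN thread_state.dur ELSE 0 END) * 1.0
    / SUM(thread_state.dur) AS running_ratio
FROM thread_state
JOIN thread USING (utid)
JOIN process USING (upid)
WHERE process.name = 'system_server'
  AND thread.name GLOB 'Binder:*'
GROUP BY thread.utid
ORDER BY running_ratio DESC;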


Now back to our analysis scenario:

The busyness of the Binder thread pool directly determines the service’s concurrent processing capability. For synchronous transactions, if all Binder threads on the server side are in Running or Uninterruptible Sleep (D) states, new synchronous requests will queue in the kernel waiting, and client threads will be blocked on ioctl(BINDER_WRITE_READ) or epoll_wait for a long time. In Perfetto, this appears as: the main thread stays in S state for a long time, but looking at its Java code execution situation, it’s almost blank – time is spent waiting. For Oneway (asynchronous) transactions, the situation is slightly different: although the client returns immediately after sending and doesn’t block waiting, Oneway requests on the same IBinder object are often consumed serially on the server side (there’s a queue). If an App sends a large number of Oneway requests in a short time, this queue will be extended, not only will subsequent Oneway executions of this App be delayed, but responses from other callers on the same service may also be affected.

When diagnosing thread pool issues in Perfetto, several metrics deserve attention. First is queue length: in the android.binder track you can observe the queue_len metric; if it stays high, requests are being produced far faster than they are consumed and the thread pool is saturated. Second, watch for buffer-related signs (like the TransactionTooLargeException or binder_transaction_alloc_buf failures in kernel logs mentioned earlier), which usually mean a single transaction is too large or there are too many concurrent transactions. The most intuitive way is to observe Binder thread states directly: find the system_server process and expand all threads; if most threads starting with Binder: are busy (dense slices on the timeline with almost no gaps), the pool is saturated.

About identifying Oneway calls in Perfetto: synchronous (two-way) and asynchronous (oneway) calls look clearly different, and learning to distinguish them helps a lot. During a synchronous call the client blocks waiting (thread_state shows S) and Perfetto draws bidirectional Flow arrows (transaction → reply); with a oneway call the client returns immediately after sending, with almost no blocking, and the Flow arrow is one-way – a transaction with no reply coming back. Also, a oneway call’s slice name may carry an [oneway] marker, and its latency_ns represents only the send time, not the round trip.

When analyzing Oneway-related issues, focus on two things: first, the server-side queue depth (if Oneway requests on the same IBinder object pile up, the actual execution timing of subsequent requests will be continuously delayed); second, whether there’s a batch sending pattern (a large number of Oneway calls in a short time will form “spikes”, appearing as densely arranged short Slices on server-side Binder threads in Perfetto).
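
To spot batch-sending spikes numerically, oneway transactions can be bucketed into fixed windows. This sketch again assumes the android_binder_txns table from the android.binder stdlib module; the 100 ms bucket and the >50 threshold are arbitrary starting points, not recommended constants.

INCLUDE PERFETTO MODULE android.binder;

-- Oneway calls per client process per 100 ms window; tall buckets are spikes.
SELECT
  client_process,
  client_ts / 100000000 AS bucket_100ms,
  COUNT(*) AS oneway_calls
FROM android_binder_txns
WHERE is_sync = 0
GROUP BY client_process, bucket_100ms
HAVING oneway_calls > 50
ORDER BY oneway_calls DESC;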

It’s worth mentioning that SystemServer’s Binder threads not only need to handle requests from various Apps, but also handle system internal calls (like AMS calling WMS, WMS calling SurfaceFlinger, etc.). If a “misbehaving” App frantically sends Oneway requests in a short time, it might fill up a certain system service’s Oneway queue, further affecting other Apps’ asynchronous callback latency, causing a global lag feeling.

Step 3: Investigate Lock Contention

If you jump to the server-side Binder thread and find it stays in S (Sleeping) or D (Disk Sleep / Uninterruptible Sleep) state for a long time while processing your request, it usually means it’s waiting for some resource – either waiting for a lock or waiting for IO. Lock contention is a very common source of performance bottlenecks in SystemServer, because SystemServer runs a large number of services that share a lot of global state, and this state is often protected by synchronized locks.

Java locks (Monitor Contention) is the most common situation. There are quite a few global locks in SystemServer, like WindowManagerService’s mGlobalLock, some internal locks of ActivityManagerService, etc. When multiple threads simultaneously need to access resources protected by these locks, contention occurs. In Perfetto, if you see a Binder thread state as S, and the blocked_function field contains symbols related to futex (like futex_wait), you can basically be sure it’s waiting for a Java lock. To further confirm which lock it’s waiting for and who’s holding it, you can check the Lock contention track. Perfetto will visualize the lock contention relationship: marking the Owner (thread holding the lock, like the android.display thread) and Waiter (thread waiting for the lock, like the Binder:123_1 processing your request) with connection lines. Clicking the Contention Slice, you can also see the lock object’s class name (like com.android.server.wm.WindowManagerGlobalLock) in the Details panel, which is very helpful for understanding the root cause of the problem.

Native locks (Mutex / RwLock) are comparatively rarer, but you can encounter them in some scenarios. The manifestations are similar: the thread state is D or S, but the call stack shows Native-layer symbols like __mutex_lock, pthread_mutex_lock, or rwsem rather than Java’s futex_wait. Analyzing such problems usually requires combining sched_blocked_reason events to see what exactly the thread is waiting for; this is relatively advanced, so we won’t expand on it here.

Using SQL to Count SystemServer Lock Contention (Optional)

If you’re already familiar with Perfetto SQL, here’s a query that can be run directly in the Perfetto UI to count Java monitor lock contention (“Lock contention on a monitor lock” slices) in the system_server process.
Here, lock_depth indicates how many threads were contending for the same object lock while the contention lasted.

SELECT
  count(1) AS lock_depth,
  s.slice_id, s.track_id, s.ts, s.dur,
  s.dur / 1e6 AS dur_ms,
  ctn.otid, s.name
FROM slice s,
     (SELECT slice_id, track_id, ts, dur, name,
             substr(name, 46, instr(name, ')') - 46) AS otid
      FROM slice
      WHERE name LIKE 'Lock contention on a monitor lock %'
      ORDER BY dur) ctn
JOIN thread_track ON s.track_id = thread_track.id
JOIN thread USING (utid)
JOIN process USING (upid)
WHERE
  process.name = 'system_server'
  AND s.name LIKE 'Lock contention on a monitor lock %'
  AND substr(s.name, 46, instr(s.name, ')') - 46) = ctn.otid
  AND ctn.slice_id <> s.slice_id
  AND ctn.ts >= s.ts AND (ctn.ts + ctn.dur) <= (s.ts + s.dur)
GROUP BY s.slice_id
ORDER BY s.dur DESC;
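
If the query above feels heavy, a simpler aggregation is often a good first pass: total and worst-case contention wall time per contention slice name, using the same ART slice naming. Note that these slice names embed owner details, so this groups at a finer granularity than “one row per lock class”.

-- Top contended monitor locks in system_server by total wall time.
SELECT
  slice.name,
  COUNT(*) AS events,
  SUM(slice.dur) / 1e6 AS total_ms,
  MAX(slice.dur) / 1e6 AS worst_ms
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING (utid)
JOIN process USING (upid)
WHERE process.name = 'system_server'
  AND slice.name LIKE 'Lock contention on a monitor lock%'
GROUP BY slice.name
ORDER BY total_ms DESC
LIMIT 20;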


Case Study: Window Management Delay

Below is a real case to demonstrate the analysis process introduced earlier. The scenario is: obvious animation stutter occurs during app startup.

1. Discover Anomaly

First, open the Trace in Perfetto and find the App’s UI Thread track. Observation shows one frame’s doFrame duration reached 30ms (normally, a 60Hz screen frame should be within 16.6ms). Zooming in on the thread_state corresponding to this frame, we find the main thread was in S (Sleeping) state for 18ms, indicating that during this time, the main thread wasn’t executing code, but waiting for something.

2. Track Binder Call

Click this 18ms S segment and check the Details panel on the right to see the associated slice information. From it we discover that the main thread was calling IActivityTaskManager.startActivity at the time – a cross-process Binder call whose caller is the App and whose callee is system_server. Perfetto’s Flow arrow points clearly to the Binder:1605_2 thread in the system_server process, indicating that this Binder thread is handling the request.

3. Server-side Analysis

Follow the Flow arrow and jump to system_server’s Binder:1605_2 thread. This thread is indeed running (Running state), but it took 15ms to finish. Observing this Binder thread’s Lock contention track further (some versions show it as small bars above the thread track), we find a red lock contention marker.

4. Pin Down the Culprit

Click this lock contention marker, and the Details panel shows key information: this thread is waiting for the com.android.server.wm.WindowManagerGlobalLock lock, the lock’s Owner is the android.anim thread (system animation thread), and the waiting Duration is 12ms.

Conclusion: The startActivity Binder request initiated during app startup needs to acquire the WindowManagerGlobalLock lock to continue execution on the SystemServer side, but this lock was being held by the system animation thread at the time (for updating window state), causing the Binder thread to wait 12ms to get the lock and complete processing. This 12ms of lock waiting plus other overhead caused the App main thread to be blocked for 18ms, ultimately manifesting as one frame of lag.

Optimization Direction: This situation belongs to system-level lock contention, which is hard for the App side to directly fix. However, the App side can do some avoidance: avoid initiating complex Window operations during intensive system animation execution (like startup animations, transition animations), or find ways to reduce IPC call frequency during cold startup, lowering the probability of hitting lock contention.


Case Study: Binder Thread Pool Saturation

Let’s look at another case. The scenario is: multiple apps start simultaneously, the overall system becomes laggy.

1. Discover Anomaly

Observing in Perfetto, we find multiple Apps’ main threads in the S (Sleeping) state during the same time period, and from their call stacks they are all waiting for their respective Binder calls to return. User feedback was “nothing responds when tapped; it takes a few seconds to react” – the whole system was sluggish.

2. Check SystemServer’s Thread Pool Status

Since multiple Apps are all waiting for Binder returns, the problem likely lies on the server side. Expand the system_server process and observe all threads starting with Binder:. The situation is bad: almost all Binder threads (from Binder:1460_1 to Binder:1460_15) are in the Running state, each thread’s timeline is packed with dense slices and almost no gaps, and no idle thread at all is available to process new requests. This is the typical thread-pool saturation pattern.

3. Analyze Queuing

Further observing in the android.binder track, we find the queue_len (queue length) metric remains high (greater than 5, meaning requests are always queuing), and multiple Clients’ blocking_dur_ns (blocking duration) is much greater than server_latency_ns (server processing duration) – this indicates requests spend most time queuing waiting, not actually processing.

4. Locate Root Cause

Further checking what transactions each Binder thread is specifically processing, we find an interesting phenomenon: a background app initiated a large number of IPackageManager query requests in a short time. Each query itself doesn’t take long (about 5ms), but there are too many (hundreds), and these requests filled up the thread pool, causing other apps’ (including foreground app) requests to queue and wait.

Conclusion: A “misbehaving” app “crowded out” system_server’s thread pool resources through batch Binder calls, causing normal requests from other apps to be significantly delayed, manifesting as global lag.

Optimization Direction: On the app side, this batch-loop calling pattern should be avoided – if a batch query is needed, use the batch interfaces the system provides (like getPackagesForUid instead of calling getPackageInfo in a loop), or spread the requests over time and execute them asynchronously. On the system side, rate limiting (throttling) could be added for specific services, or hot-spot service handling could be optimized to reduce per-request duration.
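
To find this kind of “batch caller” in your own trace, ranking client processes by call count into system_server is a quick check. The sketch assumes the android_binder_txns table from the android.binder stdlib module described earlier.

INCLUDE PERFETTO MODULE android.binder;

-- Which clients hammer system_server the hardest, and via which interface?
SELECT
  client_process,
  aidl_name,
  COUNT(*) AS calls,
  SUM(client_dur) / 1e6 AS total_client_ms
FROM android_binder_txns
WHERE server_process = 'system_server'
GROUP BY client_process, aidl_name
ORDER BY calls DESC
LIMIT 20;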

Platform Features and Best Practices

As Android versions iterate, the Binder mechanism itself is constantly evolving, introducing some new features to improve performance and stability. Understanding these features helps understand the causes of certain phenomena in Perfetto, and can also help you write more “friendly” code.

Binder Freeze (Android 12+) is a feature that helps a lot in reducing ANRs. When a process is judged to be Cached and is “frozen” by the system, its Binder interfaces are frozen too. If another process then attempts a synchronous call into the frozen process’s Binder, the system fails the call quickly (logcat shows FROZEN-related error messages) instead of letting the caller hang and eventually head toward an ANR. The rationale: rather than letting the caller hang with no idea when it might return, it’s better to fail fast and give the caller a chance to handle the error.

Lazy Async (Android 14/15) optimizes the dispatch strategy for asynchronous (oneway) transactions. Previously the system would try to wake the target thread immediately upon receiving a oneway request, which caused a “wakeup storm” when a large number of oneways flooded in within a short time, bringing unnecessary power overhead. Lazy Async instead batches briefly, dispatching according to system load, which smooths the oneway processing rhythm. In Perfetto you’ll find that with this feature enabled, oneway queue length fluctuates less and server-side threads wake at a more uniform frequency.

Binder Heavy Hitter Watcher is a system-level monitoring mechanism that automatically detects those “problem processes” that overuse Binder. If a process initiates too many Binder calls in a short time, the system will print warning information in logcat. This is very helpful for discovering potential performance problems – if you see your app being named in the logs, you should check if some IPC call is too frequent.

Some suggestions for developers:

About Oneway usage: particular caution is needed. Oneway calls do “look” faster (they return immediately after sending and don’t wait for a result), but they are not suitable for every scenario. Use oneway only when you clearly don’t care about the return value and don’t need to know when the operation completes – for example, log reporting or status notifications, where “fire and forget” is enough. If you turn a call that should be synchronous into oneway just to “make the main thread return faster”, you may introduce hard-to-debug timing issues: you can’t tell when the server actually finishes processing, and because oneway requests are queued and processed serially on the server side, a large number of them can delay one another.

About transmitting large data: never pass large objects (especially Bitmaps) directly via Binder. As mentioned earlier, Binder’s per-process shared buffer is only about 1MB; passing a slightly larger image can overflow it and trigger a TransactionTooLargeException. The correct approach is to use SharedMemory (usually backed by ashmem or memfd, which can share large blocks of memory efficiently between processes), pass data through files, or hand over a ParcelFileDescriptor and let the other side read the data itself.
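
Since our config enables the binder_transaction_alloc_buf ftrace event, transaction payload sizes can be pulled from the trace. The sketch below assumes the trace_processor ftrace_event table and a data_size field matching the kernel event format; both are assumptions that may differ across kernel and Perfetto versions.

-- Largest binder transaction payloads observed in the trace.
SELECT
  ts,
  EXTRACT_ARG(arg_set_id, 'data_size') AS data_size_bytes
FROM ftrace_event
WHERE name = 'binder_transaction_alloc_buf'
ORDER BY data_size_bytes DESC
LIMIT 20;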

About main-thread Binder calls, the basic principle is: don’t call Binder services whose latency you can’t estimate on the main thread. Many system services have uncertain response times – they may depend on the network, IO, or even other processes’ states. Once the other side stalls, your main thread stalls with it, and a few seconds later you have an ANR. If you must call such services, do so on a background thread, then switch back to the main thread to update the UI once the result arrives.
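
A trace makes this rule auditable: the sketch below lists synchronous calls issued from main threads, longest first, again assuming the android_binder_txns table and its is_main_thread column from the android.binder stdlib module.

INCLUDE PERFETTO MODULE android.binder;

-- Synchronous binder calls made on a main thread, slowest first.
SELECT
  client_process,
  aidl_name,
  client_dur / 1e6 AS client_dur_ms
FROM android_binder_txns
WHERE is_sync = 1
  AND is_main_thread = 1
ORDER BY client_dur DESC
LIMIT 20;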

Summary

Perfetto is an important tool for analyzing Binder problems. Through this article you should now have a basic grasp of the following: how to configure the ftrace and android.binder data sources to capture Binder-related events; how to use Flow arrows in the Perfetto UI to connect the Client’s request with the Server’s processing into a complete call chain; and how to distinguish the common bottlenecks of “slow queuing” (thread pool saturation), “slow processing” (time-consuming server-side code), and “waiting on a lock” (lock contention) by observing latency_ns, server_latency_ns, thread states, and lock contention.

In actual development, if you hit unexplained UI lag or an ANR, try capturing a trace with Perfetto and look: is the main thread waiting for a Binder call to return? If so, what is the server side doing – queuing for a thread, executing business logic, or waiting on a lock? Digging down step by step along this line of thought will usually surface the root cause. Of course, Binder analysis is only one part of Perfetto’s functionality; combined with the CPU, scheduling, and rendering knowledge from previous articles, you can understand the system’s running state more comprehensively and locate all kinds of performance problems.

References

  1. Understanding Android Binder Mechanism 1/3: Driver Part
  2. Perfetto Documentation - Android Binder
  3. Perfetto Documentation - Ftrace
  4. Android Source - Binder
  5. Android Developers - Parcel and Bundle
  6. binder-trace - Wireshark for Binder
  7. am trace-ipc Source Analysis

Attachments

About the Author & Blog

  1. Blogger Introduction
  2. Blog Content Navigation
  3. Android Performance Optimization Knowledge Planet

“If you want to go fast, go alone. If you want to go far, go together.”

