The tenth article in the Perfetto series focuses on Binder, the core Inter-Process Communication (IPC) mechanism of Android. Binder carries almost all interactions between system services and applications and is a frequent source of performance bottlenecks. This article, written from a system development and performance tuning perspective, combines data sources like android.binder, sched, thread_state, and android.java_hprof to provide a practical diagnostic workflow that helps both beginners and advanced developers identify issues related to latency, thread pool pressure, and lock contention.
Table of Contents
- Perfetto Series Catalog
- Binder Basics
- Perfetto Setup and Data Sources
- Binder Analysis Workflow
- Case Studies
- Platform Features and Best Practices
- Summary
- References
- Attachments
- About the Author & Blog
Perfetto Series Catalog
- Android Perfetto Series Catalog
- Android Perfetto Series 1: Introduction to Perfetto
- Android Perfetto Series 2: Capturing Perfetto Traces
- Android Perfetto Series 3: Familiarizing with the Perfetto View
- Android Perfetto Series 4: Opening Large Traces via Command Line
- Android Perfetto Series 5: Choreographer-based Rendering Flow
- Android Perfetto Series 6: Why 120Hz? Advantages and Challenges
- Android Perfetto Series 7: MainThread and RenderThread Deep Dive
- Android Perfetto Series 8: Understanding Vsync and Performance Analysis
- Android Perfetto Series 9: Interpreting CPU Information
- Android Perfetto Series 10: Binder Scheduling and Lock Contention (this article)
- Video (Bilibili) - Android Perfetto Basics and Case Studies
Binder Basics
For readers encountering Binder for the first time, understanding its role and participants is crucial. You can roughly understand Binder as “cross-process function calls”: you write code in one process that looks like calling a local interface, while Binder handles the actual call and data transfer. Overall, it is Android’s primary Inter-Process Communication (IPC) mechanism, consisting of four core components:
- Client: application threads initiate calls through `IBinder.transact()`, writing `Parcel`-serialized data to the kernel.
- Service (Server): usually runs in SystemServer or another process, reading the `Parcel` and executing business logic in `Binder.onTransact()`.
- Binder Driver: the kernel module `/dev/binder`, responsible for thread-pool scheduling, buffer management, priority inheritance, and so on; it is the "messenger" connecting the two sides.
- Thread Pool: the server typically maintains a set of Binder threads. Note that the pool is not created at full size up front; threads are spawned on demand. The Java layer defaults to roughly 15 Binder worker threads (excluding the main thread), and the Native layer can also configure the maximum thread count via `ProcessState` (the default is usually also 15). When all Binder threads are busy, new requests queue in the driver layer waiting for an idle thread.
Why is Binder needed?
Android adopts a multi-process architecture to isolate applications and improve security and stability. Each APK runs in an independent user space; when it needs access to system capabilities (camera, location, notifications, etc.), it must make a cross-process call into the Framework or SystemServer.
Limitations of traditional IPC solutions:
| IPC Method | Problem |
|---|---|
| Socket | High overhead, lacks identity verification |
| Pipe | Only supports parent-child processes, one-way communication |
| Shared Memory | Needs additional synchronization mechanisms, lacks access control |
Binder solves these problems at the kernel layer, providing three key capabilities:
- Identity and permissions: calls carry the caller's UID/PID, so the server can verify that the caller is legitimate.
- Synchronous and asynchronous calls: in synchronous mode the Client waits for the Server to return (the most common mode); in asynchronous mode the Client returns immediately after sending, which suits scenarios like notifications and status reporting.
- Priority inheritance: when a high-priority Client calls a low-priority Server, the Server temporarily elevates its priority to avoid priority inversion.
Therefore, when we write a statement like locationManager.getCurrentLocation() in an application, the call underneath inevitably relies on Binder to reach SystemServer safely and reliably.
Case from App Developer’s Perspective
Suppose we call LocationManager#getCurrentLocation() in an application. The actual implementation of this API lives in LocationManagerService inside system_server. The call path breaks down as follows. On the Proxy side, the App thread obtains a proxy object (BinderProxy) for ILocationManager via Context.getSystemService; when getCurrentLocation() is called, the proxy serializes the parameters into a Parcel and executes transact(). In the kernel, the Binder driver queues this transaction onto system_server's Binder thread queue and wakes an idle thread (for example, Binder:1460_5). On the Stub side, the thread hosting LocationManagerService (the Stub) wakes up, reads the parameters, and executes the location logic (which may involve HAL-layer interaction). Finally, in the return phase, the Service writes the result into a Parcel, the driver wakes the original App thread, and the App thread returns from waitForResponse() with the data.
In Perfetto, this chain appears as: a service = android.location.ILocationManager transaction on the android.binder track; the App thread in the S (Sleeping) state on the thread_state track, with blocked_function usually involving binder_thread_read or epoll; a SystemServer Binder thread showing a Running slice; and flow arrows connecting the Client's transact to the Server's handling (Perfetto draws these arrows between the two sides).
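The proxy → driver → stub → reply round trip above can be sketched as a toy model. This is purely illustrative Python (FakeDriver, FakeProxy, and FakeStub are invented names; real Binder lives in the kernel and in generated AIDL classes): the client call blocks on an event while a server worker thread consumes the transaction from a queue and posts the reply.

```python
import queue
import threading

class FakeDriver:
    """Stand-in for /dev/binder: holds the pending transaction queue."""
    def __init__(self):
        self.transactions = queue.Queue()

class FakeStub:
    """Server side: a worker thread consumes transactions and replies."""
    def __init__(self, driver):
        self.driver = driver
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            args, reply_box, done = self.driver.transactions.get()
            reply_box.append(f"location for {args}")  # "business logic"
            done.set()  # wake the blocked client thread

class FakeProxy:
    """Client side: looks like a local call, actually crosses the queue."""
    def __init__(self, driver):
        self.driver = driver

    def get_current_location(self, provider):
        reply_box, done = [], threading.Event()
        self.driver.transactions.put((provider, reply_box, done))  # transact()
        done.wait()  # client blocks, like waitForResponse()
        return reply_box[0]

driver = FakeDriver()
FakeStub(driver)
proxy = FakeProxy(driver)
print(proxy.get_current_location("gps"))  # prints "location for gps"
```

The blocking `done.wait()` is exactly the stretch that shows up as the App main thread sitting in the S state in a trace.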
Perfetto Setup and Data Sources
To diagnose Binder in Perfetto, you need to prepare the data sources and Trace configuration in advance.
Data Sources and Track Overview
Perfetto’s Binder-related signals come mainly from three data sources, each operating at a different level and offering different granularity and applicable scenarios.
linux.ftrace (kernel layer) is the most general and fundamental data source, compatible with all Android versions. It reads kernel ftrace events directly, including binder_transaction (transaction start), binder_transaction_received (server received the transaction), binder_lock (the kernel Binder lock, usually not worth attention unless you are debugging the driver itself), and so on. Combined with the scheduling events (sched_switch, sched_waking), it can fully reconstruct the chain of "Client initiates call → kernel wakes Server thread → Server processes → returns". On Android 12 or 13, this data source alone is usually enough, and the Perfetto UI automatically parses the ftrace binder events into an intuitive "Transactions" view.
android.binder (user layer) is a relatively new data source, mainly improved in Android 14/15 and later versions. It utilizes the kernel’s new interface or tracepoints to provide richer semantic information, such as directly distinguishing reply (response) and transaction (request), providing pre-computed metrics like blocking_dur_ns (client blocking duration), and distinguishing lazy_async (delayed dispatched asynchronous transactions), etc. If your device is Android 14 or above, it’s recommended to enable this data source at the same time to get more detailed analysis dimensions.
android.java_hprof (lock contention) is used to capture Java-layer Monitor Contention (that is, lock contention produced by the synchronized keyword). Although the name contains hprof, in this scenario it mainly records lock-wait events through the ART runtime's instrumentation rather than taking memory snapshots. Note that enabling this data source carries some overhead (every contention event must be recorded), so enable it only in a short window while the problem reproduces reliably, to keep the trace file small and avoid distorting the reproduction scenario.
Recommended Trace Config
The following configuration balances compatibility with new features and is recommended as a standard Binder analysis template. Save the configuration as binder_config.pbtx for use:
```
# ============================================================
# Binder analysis template (illustrative sketch; verify data
# source availability and tune buffer sizes / duration for
# your device before use)
# ============================================================
buffers {
  size_kb: 65536
  fill_policy: RING_BUFFER
}

data_sources {
  config {
    # User-layer Binder semantics (Android 14+, see table below)
    name: "android.binder"
  }
}

data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      # Kernel-layer binder events (all Android versions)
      ftrace_events: "binder/binder_transaction"
      ftrace_events: "binder/binder_transaction_received"
      # Scheduling events to connect thread wakeups
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_waking"
      ftrace_events: "sched/sched_blocked_reason"
    }
  }
}

data_sources {
  config {
    # Process name and PID mapping
    name: "linux.process_stats"
  }
}

duration_ms: 10000
```
Configuration Item Description
| Data Source | Purpose | Android Version Requirement | Overhead |
|---|---|---|---|
| `android.binder` | User-layer Binder semantics (blocking_dur, reply, etc.) | 14+ | Low |
| `linux.ftrace` (binder/*) | Kernel-layer Binder events | All versions | Low |
| `linux.ftrace` (sched/*) | Scheduling events, connecting thread wakeups | All versions | Medium |
| `android.java_hprof` | Java lock contention (Monitor Contention) | 10+ | Medium-High |
| `linux.process_stats` | Process name and PID mapping | All versions | Very low |
Tip: If the device is Android 12/13, the `android.binder` data source may have limited functionality; relying mainly on `linux.ftrace` is sufficient. The Perfetto UI will automatically parse ftrace binder events into an intuitive Transactions view.
Quick Start: 3 Steps to Capture and View Binder Trace
Capture Trace:
```
# Push configuration
adb push binder_config.pbtx /data/local/tmp/

# Start capture
adb shell perfetto --txt -c /data/local/tmp/binder_config.pbtx \
    -o /data/misc/perfetto-traces/trace.pftrace

# ... operate the phone to reproduce the lag ...

# Pull the file out
adb pull /data/misc/perfetto-traces/trace.pftrace .
```

Open Trace: visit ui.perfetto.dev and drag the trace file in.
Add Key Tracks:
- Left side click Tracks → Add new track
- Search “Binder”, add Android Binder / Transactions and Android Binder / Oneway Calls
- Search “Lock”, add Thread / Lock contention (if data available)
Other Binder Analysis Tools
Besides Perfetto, there are other tools that can assist with Binder analysis. Here are two practical ones: am trace-ipc (built into Android, no extra installation needed) and binder-trace (open source tool, more powerful but higher configuration threshold).
am trace-ipc: Java Layer Binder Call Tracking
am trace-ipc is a command-line tool built into Android for tracking Java-layer Binder call stacks. It works by instrumenting BinderProxy.transact(), recording every call that passes through and counting the occurrences of each call pattern. Its biggest advantage is zero setup: it works out of the box, with no root permission and no extra software required.
Basic usage is very simple, just three steps: “start → operate → stop and export”:
```
# 1. Start tracking (the system begins recording Binder calls from all processes)
adb shell am trace-ipc start

# 2. ... perform the operation you want to measure ...

# 3. Stop and export
adb shell am trace-ipc stop --dump-file /data/local/tmp/ipc-trace.txt
adb pull /data/local/tmp/ipc-trace.txt .
```
The exported file is plain text; its shape is roughly as follows (schematic excerpt, with stack frames abbreviated):

```
Traces for process: com.example.app
Count: 5
  <Java stack frames of the Binder call site>
  ...
```
From the output, you can see that it groups by process, listing the complete Java stack for each Binder call and the occurrence count (Count). This is very direct for answering questions like “how many Binder calls did my application make during this operation, and what services did it call”.
Using with Perfetto: A very useful feature is that after enabling am trace-ipc, Binder call information is also synchronized to the trace events in Perfetto/Systrace. This means you can use Perfetto to see the latency distribution on the timeline while confirming which Java method initiated the call through the trace-ipc output. Combining both gives you both macro time dimensions and micro call stack details.
This tool is particularly suitable for these scenarios: quickly verifying a suspicion that an ANR or jank is caused by frequent IPC; counting the total number of Binder calls during a user operation (startup, scrolling, and the like); and obtaining the complete Java call stack of a Binder call to locate the responsible code.
binder-trace: Real-time Binder Message Parsing
binder-trace is an open-source Binder analysis tool that can intercept and parse Android Binder messages in real-time. Its positioning is similar to “Wireshark for Binder” – just as Wireshark can capture and parse network packets, binder-trace can capture and parse the specific content of Binder transactions, including interface names, method names, and even passed parameter values.
This tool uses Frida for dynamic injection (Frida is a popular dynamic instrumentation framework), so some conditions need to be met before use: the device needs to be rooted (or using an emulator), and frida-server must be deployed on the device first. Also, the local computer needs Python 3.9 or above. After configuring the environment, usage is as follows:
```
# Track Binder communication for a specific app
# (-d specifies the device, -n the process name, -a the Android version)
binder-trace -d <device-id> -n com.example.app -a 14
```
After running, an interactive interface opens, displaying all Binder interactions of the target process in real-time. You can filter by interface, method, transaction type, etc. through configuration files (avoiding information overload), and also use shortcuts to pause/continue recording, clear screen, etc. The tool has AIDL structure definitions for Android 9 to 14 built-in, automatically parsing most system service call parameters.
This tool is more suitable for security research and reverse engineering scenarios, like if you want to deeply analyze what specific data is passed between an app and system services, or want to understand the calling details of some unpublished API. However, for daily performance analysis, binder-trace has a high configuration threshold (needs root, needs frida deployment), and it focuses on “message content” rather than “latency distribution”, so usually Perfetto with am trace-ipc is enough. If you’re doing security auditing or need to reverse analyze an app’s IPC behavior, binder-trace will be a very powerful tool.
Binder Analysis Workflow
After capturing the trace, don't browse aimlessly. Proceed in the order "find the target → examine latency → check threads → find locks".
Step 1: Identify Transaction Latency
The first step is to locate the Binder call you care about. Perfetto offers several common ways to do this. If you already know which process initiated the call, find the region where your App acts as Client in the Transactions track. If you know the interface or method name, press the / key to open the search box and type the AIDL interface name (such as ILocationManager) or the method name. If you are chasing UI jank, the most direct route is the main thread's thread_state track: look for long stretches in the S (Sleeping) state. If the main thread executes almost no code during that window, it is very likely waiting for a Binder call to return, and that is your starting point.
After selecting a transaction slice, the Details panel on the right shows the transaction's details. Three latency metrics deserve special attention: latency_ns, the total latency from the client sending the request to receiving the response; server_latency_ns, the time the server-side thread actually spends executing business code; and blocking_dur_ns, the time the client thread spends waiting in the kernel.
Understanding how these metrics relate matters because it determines where to dig next. If latency_ns is long but server_latency_ns is short, the time was not spent in server-side processing; it was consumed by Binder driver scheduling or server-side queuing (usually meaning the Service's thread pool is busy and new requests must wait for an idle thread). In that case, examine the server's thread pool, which is what Step 2 does. If latency_ns and server_latency_ns are similar and both long, the server-side processing itself is slow: jump to the server's Binder thread and see what it was actually doing during that time, whether running business code, waiting on a lock, or waiting on IO.
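The triage logic of this step can be condensed into a small helper. This is a sketch with made-up thresholds (5 ms "slow" cutoff and the 0.5 server-to-total ratio are illustrative, not anything Perfetto defines):

```python
def classify_binder_latency(latency_ns, server_latency_ns,
                            slow_threshold_ns=5_000_000, ratio=0.5):
    """Rough triage of one Binder transaction, mirroring the reasoning above.

    Thresholds are illustrative: a call counts as 'slow' past 5 ms,
    and as 'queuing-dominated' when the server ran for less than
    half of the total latency.
    """
    if latency_ns < slow_threshold_ns:
        return "fast: nothing to dig into"
    if server_latency_ns < latency_ns * ratio:
        return "queuing/scheduling dominated: check the server thread pool"
    return "server-side slow: inspect the Binder thread (locks, IO, CPU)"

# 20 ms total but only 1 ms of server work: the time went to queuing.
print(classify_binder_latency(20_000_000, 1_000_000))
# 20 ms total with 18 ms of server work: the handler itself is slow.
print(classify_binder_latency(20_000_000, 18_000_000))
```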
Step 2: Evaluate Thread Pool and Oneway Queue
If Step 1 shows that the latency lies mainly in queuing rather than server-side processing, the next step is to examine the Binder thread pool. Before digging in, let's answer a frequently asked question: roughly how many Binder threads does a process have, what is the scale of system_server's Binder thread pool, and under what circumstances does it become "exhausted"?
SystemServer’s Binder Thread Pool Scale
In upstream AOSP (Android 14/15), the Binder thread pool design philosophy is: grow on demand, configurable, no single fixed number.
- The thread pool grows on demand: each server process maintains a thread pool in the Binder driver, where the actual thread count grows and shrinks with load. The upper limit is jointly determined by the kernel's `max_threads` field and user-space configuration such as `ProcessState#setThreadPoolMaxThreadCount()`.
- The typical upper limit is 15-16 worker threads: in most AOSP versions, an app process's Java Binder thread pool tops out at about 15-16 worker threads; core processes like `system_server` also default to this magnitude, in the range of "a dozen or so threads". Some vendor ROMs or custom kernels raise or lower the limit based on their own load models (for example, to several dozen threads), so the numbers you see via `ps -T`, `top -H`, or by counting `Binder:` threads in Perfetto may differ across devices.
- Trust observation, not a memorized number: in Perfetto, the recommended approach is to expand a process, count its `Binder:xxx_y` thread tracks, and judge the pool's "scale" and "busyness" from their activity during the trace.
Binder Thread Count, Buffer, and “Binder Exhaustion”
In performance analysis, when people mention “Binder count”, they often confuse three different types of resource limits:
Binder thread pool exhaustion means all of a process's Binder worker threads are occupied (Running, in D, or blocked in S on some resource), leaving no idle thread for the driver to wake for new transactions. Symptoms include client threads sitting in the S state on the thread_state track for a long time (the call stack stops at ioctl(BINDER_WRITE_READ) / epoll_wait), and the service's queue_len in the android.binder track staying high (requests are queuing). For a key process like system_server, a full thread pool means degraded system-service responsiveness, which easily amplifies into global jank or ANRs.
Binder transaction buffer exhaustion involves the fixed-size shared buffer (typically on the order of 1 MB) that each process owns in the Binder driver, which carries the Parcel data in flight. Typical triggers: a single transaction carrying an oversized object (a large Bitmap, an extra-long string, a big array, etc.), or a large number of concurrent transactions whose Parcels pile up unreleased in the buffer. Possible symptoms: kernel logs showing binder_transaction_alloc_buf failures, the Java layer throwing TransactionTooLargeException, and subsequent transactions queuing for a long time or even failing in the driver layer (it looks as if "Binder is used up"). The fix for such problems is not to open more threads but to limit the data per transmission (split into packets, page, or use a streaming protocol), and to prefer mechanisms like SharedMemory, files, or ParcelFileDescriptor for large blocks of data.
Binder reference table / object count: The Binder driver maintains reference tables and node objects for each process, and these also have upper limits, but in most actual scenarios, they rarely hit this first. Common risk is holding a large number of Binder references for a long time without releasing, more manifesting as memory/stability issues, not UI lag.
When analyzing in Perfetto, you can carry a judgment framework:
“Is the current slowness because the thread pool is full, or because transactions are too large/buffer is used up?”
For the former, look at the Binder thread count, their thread_state, and queue_len; for the latter, look at single-transaction size, the number of concurrent transactions, and whether TransactionTooLargeException / binder_transaction_alloc_buf entries appear in the logs alongside.
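As a sketch of the "control the data amount per transmission" advice above: the helper below pages a large result set so that no single Parcel approaches the shared buffer limit. The `paginate` name and the item-count page size are illustrative; real code would budget by serialized bytes, not item count.

```python
def paginate(items, page_size):
    """Split one oversized result into several smaller transactions.

    Illustrative only: real code should measure the serialized Parcel
    size, not the number of items, and leave headroom well below the
    ~1 MB shared transaction buffer.
    """
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

records = list(range(2500))
pages = paginate(records, 1000)
print(len(pages))       # 3 transactions instead of one oversized one
print(len(pages[-1]))   # the last page carries the remainder: 500
```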
Now back to our analysis scenario:
The busyness of the Binder thread pool directly determines the service's concurrent processing capability. For synchronous transactions, if all server-side Binder threads are Running or in Uninterruptible Sleep (D), new synchronous requests queue in the kernel and client threads block for a long time on ioctl(BINDER_WRITE_READ) or epoll_wait. In Perfetto, this shows up as the main thread staying in S for a long time while executing almost no Java code; the time is spent waiting. For Oneway (asynchronous) transactions, the situation differs slightly: although the client returns immediately after sending and does not block, Oneway requests on the same IBinder object are usually consumed serially on the server side (there is a queue). If an App fires a large number of Oneway requests in a short time, that queue stretches: not only are the App's own subsequent Oneway calls delayed, but responses to other callers of the same service may be affected too.
When diagnosing thread-pool issues in Perfetto, several signals are worth checking. First, queue length: in the android.binder track, watch the queue_len metric; if it stays high, requests are being produced far faster than they are consumed and the pool is saturated. Second, watch for buffer-related signs (the TransactionTooLargeException or binder_transaction_alloc_buf failures in kernel logs mentioned earlier), which usually mean a single transaction is too large or there are too many concurrent transactions. The most direct check is the Binder thread states themselves: expand all threads of the system_server process; if most threads named Binder: are busy (dense slices with almost no gaps on the timeline), the pool is already saturated.
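The queue_len signature described above, production outpacing consumption, can be illustrated with a tiny model. The `simulate_queue_len` function and its per-millisecond rates are invented for illustration; real queue depth comes from the android.binder track:

```python
def simulate_queue_len(arrivals_per_ms, serviced_per_ms, duration_ms):
    """Millisecond-by-millisecond queue depth under a constant load.

    Illustrative model: each tick, new requests arrive and the worker
    threads drain at a fixed rate; the queue never goes negative.
    """
    depth, samples = 0, []
    for _ in range(duration_ms):
        depth += arrivals_per_ms
        depth = max(0, depth - serviced_per_ms)
        samples.append(depth)
    return samples

# Production (4/ms) outpaces consumption (3/ms): queue_len climbs steadily,
# the saturation signature described above.
print(simulate_queue_len(4, 3, 5))   # [1, 2, 3, 4, 5]
# Balanced load drains back to zero every tick.
print(simulate_queue_len(2, 3, 5))   # [0, 0, 0, 0, 0]
```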
About identifying Oneway calls in Perfetto: synchronous (two-way) and asynchronous (Oneway) calls look clearly different, and learning to tell them apart helps analysis. During a synchronous call, the client blocks waiting (thread_state shows S) and Perfetto draws flow arrows in both directions (transaction → reply); a Oneway call returns immediately after sending with almost no blocking, and the flow arrow goes only one way, with no reply coming back. A Oneway slice's name may also carry an [oneway] marker, and its latency_ns covers only the send, not a round trip.
When analyzing Oneway-related issues, focus on two things: first, the server-side queue depth (if Oneway requests on the same IBinder object pile up, the actual execution timing of subsequent requests will be continuously delayed); second, whether there’s a batch sending pattern (a large number of Oneway calls in a short time will form “spikes”, appearing as densely arranged short Slices on server-side Binder threads in Perfetto).
It’s worth mentioning that SystemServer’s Binder threads not only need to handle requests from various Apps, but also handle system internal calls (like AMS calling WMS, WMS calling SurfaceFlinger, etc.). If a “misbehaving” App frantically sends Oneway requests in a short time, it might fill up a certain system service’s Oneway queue, further affecting other Apps’ asynchronous callback latency, causing a global lag feeling.
Step 3: Investigate Lock Contention
If you jump to the server-side Binder thread and find it stays in S (Sleeping) or D (Disk Sleep / Uninterruptible Sleep) state for a long time while processing your request, it usually means it’s waiting for some resource – either waiting for a lock or waiting for IO. Lock contention is a very common source of performance bottlenecks in SystemServer, because SystemServer runs a large number of services that share a lot of global state, and this state is often protected by synchronized locks.
Java locks (Monitor Contention) are the most common case. SystemServer contains quite a few global locks, such as WindowManagerService's mGlobalLock and several internal locks of ActivityManagerService, and contention occurs whenever multiple threads need the resources they protect. In Perfetto, if a Binder thread is in S and its blocked_function field contains futex-related symbols (like futex_wait), you can be fairly sure it is waiting on a Java lock. To confirm which lock and who holds it, check the Lock contention track: Perfetto visualizes the relationship, marking the Owner (the thread holding the lock, such as the android.display thread) and the Waiter (the thread waiting, such as the Binder:123_1 handling your request) with connecting lines. Clicking the contention slice also shows the lock object's class name (such as com.android.server.wm.WindowManagerGlobalLock) in the Details panel, which is very helpful for understanding the root cause.
Native locks (Mutex / RwLock) situations are relatively rarer, but can be encountered in some scenarios. Manifestations are similar: thread state is D or S, but the call stack shows symbols from the Native layer like __mutex_lock, pthread_mutex_lock, rwsem, not Java’s futex_wait. Analyzing such problems usually needs to combine sched_blocked_reason events to see what the thread is specifically waiting for, belonging to relatively advanced content, so we won’t expand on it here.
Counting SystemServer Lock Contention with SQL (Optional)
If you’re already familiar with Perfetto SQL, here’s a query that can be run directly in Perfetto UI to count Java monitor lock contention (Lock contention on a monitor lock) in the system_server process.
Here lock_depth is the number of threads contending for the same object lock at the moment contention occurs.
```sql
select count(1) as lock_depth, s.slice_id, s.track_id, s.ts, s.dur,
       s.dur/1e6 as dur_ms, ctn.otid, s.name
```
Case Study: Window Management Delay
Below is a real case to demonstrate the analysis process introduced earlier. The scenario is: obvious animation stutter occurs during app startup.
1. Discover Anomaly
First, open the Trace in Perfetto and find the App’s UI Thread track. Observation shows one frame’s doFrame duration reached 30ms (normally, a 60Hz screen frame should be within 16.6ms). Zooming in on the thread_state corresponding to this frame, we find the main thread was in S (Sleeping) state for 18ms, indicating that during this time, the main thread wasn’t executing code, but waiting for something.
2. Track Binder Call
Click this 18ms S segment and check the Details panel on the right, you can see associated Slice information. From it, we discover the main thread was calling IActivityTaskManager.startActivity at the time – this is a cross-process Binder call, the caller is App, and the callee is system_server. Perfetto’s Flow arrow clearly points to the Binder:1605_2 thread in the system_server process, indicating that this request is being processed by this Binder thread.
3. Server-side Analysis
Follow the Flow arrow and jump to system_server‘s Binder:1605_2 thread. Observation shows that this thread is indeed running (Running state), but it took 15ms to finish. Further observing this Binder thread’s Lock contention track (some versions show it in small bars above the thread track), we find a red lock contention marker.
4. Pin Down the Culprit
Click this lock contention marker, and the Details panel shows key information: this thread is waiting for the com.android.server.wm.WindowManagerGlobalLock lock, the lock’s Owner is the android.anim thread (system animation thread), and the waiting Duration is 12ms.
Conclusion: The startActivity Binder request initiated during app startup needs to acquire the WindowManagerGlobalLock lock to continue execution on the SystemServer side, but this lock was being held by the system animation thread at the time (for updating window state), causing the Binder thread to wait 12ms to get the lock and complete processing. This 12ms of lock waiting plus other overhead caused the App main thread to be blocked for 18ms, ultimately manifesting as one frame of lag.
Optimization Direction: This situation belongs to system-level lock contention, which is hard for the App side to directly fix. However, the App side can do some avoidance: avoid initiating complex Window operations during intensive system animation execution (like startup animations, transition animations), or find ways to reduce IPC call frequency during cold startup, lowering the probability of hitting lock contention.
Case Study: Binder Thread Pool Saturation
Let’s look at another case. The scenario is: multiple apps start simultaneously, the overall system becomes laggy.
1. Discover Anomaly
Observing in Perfetto, we find that multiple Apps’ main threads are in S (Sleeping) state during the same time period, and from the call stack, they’re all waiting for their respective Binder calls to return. User feedback is “clicking anything has no response, takes a few seconds to react”, the whole system response is sluggish.
2. Check SystemServer’s Thread Pool Status
Since multiple Apps are all waiting for Binder return, the problem likely lies on the server side. Expand the system_server process and observe all threads starting with Binder:. The situation is bad: almost all Binder threads (from Binder:1460_1 to Binder:1460_15) are in Running state, each thread’s timeline has dense slices with almost no gaps, completely no idle threads can process new requests. This is a typical thread pool saturation phenomenon.
3. Analyze Queuing
Further observing in the android.binder track, we find the queue_len (queue length) metric remains high (greater than 5, meaning requests are always queuing), and multiple Clients’ blocking_dur_ns (blocking duration) is much greater than server_latency_ns (server processing duration) – this indicates requests spend most time queuing waiting, not actually processing.
4. Locate Root Cause
Further checking what transactions each Binder thread is specifically processing, we find an interesting phenomenon: a background app initiated a large number of IPackageManager query requests in a short time. Each query itself doesn’t take long (about 5ms), but there are too many (hundreds), and these requests filled up the thread pool, causing other apps’ (including foreground app) requests to queue and wait.
Conclusion: A “misbehaving” app “crowded out” system_server’s thread pool resources through batch Binder calls, causing normal requests from other apps to be significantly delayed, manifesting as global lag.
Optimization Direction: From the app side, should avoid this batch loop calling pattern – if batch query is needed, should use batch interfaces provided by the system (like using getPackagesForUid instead of loop calling getPackageInfo), or scatter requests to different time points and execute asynchronously. From the system side, can consider adding rate limiting (throttling) for specific services, or optimize hot spot service processing efficiency, reducing single request duration.
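The difference between the loop-call anti-pattern and batching in the optimization advice above can be sketched in a few lines. `query_each` and `query_batched` are hypothetical helpers (not PackageManager APIs); only the call-count arithmetic matters:

```python
def query_each(package_names):
    """Anti-pattern: one IPC round trip per package (loop of getPackageInfo)."""
    calls = [("getPackageInfo", name) for name in package_names]
    return len(calls)

def query_batched(package_names, batch_size=100):
    """Preferred: group names and use a batch interface, one IPC per chunk.

    batch_size is an illustrative knob; a real batch must also respect
    the transaction buffer limit discussed earlier.
    """
    batches = [package_names[i:i + batch_size]
               for i in range(0, len(package_names), batch_size)]
    return len(batches)

apps = [f"com.example.app{i}" for i in range(300)]
print(query_each(apps))      # 300 Binder transactions
print(query_batched(apps))   # 3 Binder transactions
```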
Platform Features and Best Practices
As Android versions iterate, the Binder mechanism itself keeps evolving, adding features that improve performance and stability. Knowing these features helps explain certain phenomena you will see in Perfetto, and also helps you write code that is friendlier to the system.
Binder Freeze (Android 12+) is a feature that is very helpful for reducing ANRs. When a process is judged to be in the Cached state and gets frozen by the system, its Binder interfaces are frozen as well. If another process then attempts a synchronous call into the frozen process's Binder, the system fails the call quickly (logcat will show FROZEN-related error messages) instead of letting the caller hang and eventually run into an ANR. The rationale: rather than leaving the caller blocked with no idea when the call might return, it is better to fail fast and give the caller a chance to handle the error.
Lazy Async (Android 14/15) optimizes the dispatch strategy for asynchronous (Oneway) transactions. Previously, the system tried to wake the target thread as soon as an Oneway request arrived, which caused a "wakeup storm" when a burst of Oneway calls came in, with unnecessary power overhead. Lazy Async instead batches requests briefly, based on system load, before dispatching, smoothing out the processing rhythm. In Perfetto you will see that with this feature enabled, the Oneway queue length fluctuates less and server-side thread wakeups are more evenly spaced.
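The "batch before waking" idea can be illustrated with a minimal dispatcher sketch (this is not the actual binder driver logic, just the general technique): enqueue without waking the worker, then let one wakeup drain everything queued so far, so a burst of N items costs far fewer wakeups than N.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual sketch of batched wakeups: postAsync() only enqueues (the
// "lazy" part); wakeWorker() models one worker wakeup draining the whole
// backlog. A burst of 100 posts then needs a single wakeup.
public class BatchedDispatcher {
    private final Deque<Runnable> queue = new ArrayDeque<>();
    final AtomicInteger wakeups = new AtomicInteger();
    final AtomicInteger processed = new AtomicInteger();

    // Enqueue without signaling the worker.
    public synchronized void postAsync(Runnable r) {
        queue.add(r);
    }

    // One worker wakeup drains everything queued so far.
    public synchronized void wakeWorker() {
        wakeups.incrementAndGet();
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
            processed.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        BatchedDispatcher d = new BatchedDispatcher();
        for (int i = 0; i < 100; i++) d.postAsync(() -> {}); // burst of oneway calls
        d.wakeWorker(); // a single wakeup handles the entire burst
        System.out.println("wakeups=" + d.wakeups.get()
                + " processed=" + d.processed.get());
    }
}
```

The trade-off is the one the feature name implies: slightly higher latency for any individual Oneway item in exchange for far fewer wakeups under load.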
Binder Heavy Hitter Watcher is a system-level monitoring mechanism that automatically detects processes that overuse Binder. If a process initiates too many Binder calls within a short window, the system prints a warning in logcat. This is very useful for surfacing latent performance problems: if you see your app named in these logs, check whether some IPC call is being made too frequently.
Some suggestions for developers:
Be especially careful with Oneway. Oneway calls do look faster (they return as soon as the request is sent, without waiting for a result), but they are not suitable everywhere. Use Oneway only when you genuinely do not care about the return value and do not need to know when the operation completes, for example fire-and-forget scenarios such as log reporting or status notifications. If you turn a call that should be synchronous into a Oneway call just to make the main thread return faster, you may introduce hard-to-debug timing issues: you cannot tell when the server actually finishes, and since Oneway requests to the same binder object are queued and processed serially on the server side, a flood of Oneway calls can delay one another.
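Both pitfalls, serial processing and the missing completion signal, can be modeled with a single-threaded executor (a plain-Java sketch, not Binder itself): fire-and-forget submissions return immediately, are processed strictly in order, and the caller learns nothing about completion unless it adds its own signal.

```java
import java.util.*;
import java.util.concurrent.*;

// Conceptual sketch: fire-and-forget calls into a single-threaded executor
// behave like oneway transactions handled by one server-side queue. The
// explicit latch below is the extra machinery a caller would need to learn
// about completion -- without it, "done" is unknowable.
public class OnewayOrdering {
    public static List<Integer> run(int n) throws Exception {
        ExecutorService server = Executors.newSingleThreadExecutor();
        List<Integer> processedOrder = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            final int id = i;
            server.execute(() -> {            // "oneway": returns immediately
                processedOrder.add(id);       // server processes serially, in order
                done.countDown();
            });
        }
        // Without this explicit latch the caller has no idea when (or
        // whether) the server finished -- exactly the oneway pitfall.
        done.await();
        server.shutdown();
        return processedOrder;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(5));   // processed strictly in submission order
    }
}
```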
For large data, never pass big objects (especially Bitmaps) through Binder directly. As mentioned earlier, each process's shared Binder buffer is only about 1 MB, so even a moderately large image can overflow it and trigger a TransactionTooLargeException. The right approach is to use SharedMemory (typically backed by ashmem or memfd, which can share large memory regions across processes efficiently), pass the data through files, or hand over a ParcelFileDescriptor and let the other side read the data itself.
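The underlying idea, transact only a tiny handle and let the receiver map the payload directly, can be sketched in desktop Java with a memory-mapped file (an analogy, not the Android SharedMemory or ParcelFileDescriptor API):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Conceptual sketch of "don't copy big payloads through the transaction
// buffer": the sender writes the payload once, only the small path/handle
// is "transacted", and the receiver maps the data instead of receiving it
// through IPC. On Android the equivalents are SharedMemory or
// ParcelFileDescriptor.
public class ShareViaFile {
    // "Sender": writes a payload far bigger than a 1 MB transaction buffer.
    public static Path write(byte[] payload) throws IOException {
        Path p = Files.createTempFile("shared", ".bin");
        Files.write(p, payload);
        return p;                              // only this tiny handle crosses "IPC"
    }

    // "Receiver": maps the file instead of copying bytes through a buffer.
    public static long sum(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long total = 0;
            while (buf.hasRemaining()) total += buf.get() & 0xFF;
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] big = new byte[4 * 1024 * 1024]; // 4 MB: would overflow a binder buffer
        java.util.Arrays.fill(big, (byte) 1);
        Path handle = write(big);
        System.out.println("sum=" + sum(handle));
        Files.delete(handle);
    }
}
```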
For main-thread Binder calls, the basic rule is: never call a Binder service from the main thread if you cannot bound its latency. Many system services have unpredictable response times; they may depend on network, I/O, or even the state of other processes. Once the far side stalls, your main thread stalls with it, and a few seconds later you have an ANR. If you must call such a service, do it on a background thread and switch back to the main thread to update the UI once the result arrives.
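A minimal sketch of this pattern, assuming a hypothetical slowServiceCall() standing in for an unpredictable service (not a real Android API): run the blocking call on a background executor with a deadline, so the caller never hangs indefinitely.

```java
import java.util.concurrent.*;

// Conceptual sketch: wrap an unpredictable blocking "service call" in a
// background task with a deadline, so the calling thread (think: main
// thread) is never blocked open-endedly. slowServiceCall is a stand-in.
public class OffMainThreadCall {
    static String slowServiceCall(long serviceMs) throws InterruptedException {
        Thread.sleep(serviceMs);              // the latency you cannot estimate
        return "result";
    }

    // Returns the result, or a fallback if the service misses the deadline.
    public static String callWithDeadline(long serviceMs, long deadlineMs) {
        ExecutorService bg = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = bg.submit(() -> slowServiceCall(serviceMs));
            return f.get(deadlineMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "fallback";                // fail fast instead of hanging toward ANR
        } catch (Exception e) {
            return "error";
        } finally {
            bg.shutdownNow();                 // also interrupts a stuck call
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithDeadline(10, 500));   // fast service
        System.out.println(callWithDeadline(5000, 100)); // stuck service
    }
}
```

On Android the same shape is usually expressed with a Handler, a coroutine, or an AsyncTask-style helper, but the principle is identical: the unbounded wait happens off the main thread, with an explicit deadline.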
Summary
Perfetto is an essential tool for analyzing Binder problems. After this article, you should have a working understanding of: how to configure the ftrace and android.binder data sources to capture Binder-related events; how to use the Flow arrows in the Perfetto UI to connect a client's request with the server's processing into a complete call chain; and how to distinguish the common bottlenecks of "slow because queued" (thread pool saturation), "slow because processing" (expensive server-side code), and "slow because waiting on a lock" (lock contention) by examining latency_ns, server_latency_ns, thread states, and lock contention data.
In day-to-day development, when you hit unexplained UI lag or an ANR, capture a trace with Perfetto and look: is the main thread waiting on a Binder call to return? If so, what is the server side doing: queuing for a free thread, executing business logic, or waiting on a lock? Drilling down along this line of questioning usually leads to the root cause. Binder analysis is, of course, only one part of what Perfetto offers; combined with the CPU, scheduling, and rendering topics from earlier articles in this series, it lets you understand the system's runtime behavior more comprehensively and diagnose all kinds of performance problems.
References
- Understanding Android Binder Mechanism 1/3: Driver Part
- Perfetto Documentation - Android Binder
- Perfetto Documentation - Ftrace
- Android Source - Binder
- Android Developers - Parcel and Bundle
- binder-trace - Wireshark for Binder
- am trace-ipc Source Analysis
Attachments
- Download Perfetto Trace (SystemServer Binder Case)
(Trace data contains sensitive information, please keep it confidential after downloading.)
About the Author & Blog
“If you want to go fast, go alone. If you want to go far, go together.”
