SmartPerfetto Architecture Q&A
This article collects technical questions received after publishing *From Trace to Insight: Harness Engineering in SmartPerfetto AI Agent* and discusses them in Q&A format.
Q1: Why build a custom YAML Skill system instead of using Claude Code’s standard Skills?
Question context: Claude Code’s Skill system supports placing deterministic scripts in a scripts/ directory to avoid LLM generalization. Since you can use scripts/ to execute fixed SQL, why build a separate YAML Skill system? Isn’t a YAML Skill essentially a tool that lets performance engineers execute SQL according to predefined rules?
Key distinction: The two Skill systems operate at different layers
Claude Code Skills and SmartPerfetto YAML Skills solve problems at different stages:
```
Development stage (when I write code):    Claude Code Skills (CLI, developer terminal)
Runtime stage (when the Agent analyzes):  SmartPerfetto YAML Skills (Skill Engine, Express backend)
```
Claude Code’s Skills run in the developer’s terminal as CLI tool extensions. SmartPerfetto’s YAML Skills run in the Skill Engine within the Express backend, invoked by the Agent at runtime through the MCP tool invoke_skill. The execution environments, invocation methods, and data flows are completely different.
Even focusing only on “deterministic execution,” YAML Skills have several targeted designs
1. Parameterized SQL, not fixed scripts
Performance analysis SQL isn’t hardcoded – the same Skill needs to accept different parameters (process names, time ranges, frame ID lists):
```yaml
steps:
```
${main_thread_utid} and ${start_ts} are parameters passed in when Claude calls invoke_skill. The YAML Skill Engine performs parameter substitution before executing the SQL. With scripts/, you’d either write shell scripts that accept parameters and concatenate SQL (prone to injection issues) or write full Python/Node scripts – far more complex than YAML.
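As a sketch of what that substitution step could look like: the function below is illustrative (its name, signature, and quoting rules are assumptions, not SmartPerfetto's actual Skill Engine code).

```typescript
// Hypothetical sketch of YAML Skill parameter substitution (not the actual engine code).
// Replaces ${name} placeholders in a SQL template; numbers are inlined as-is, strings
// are single-quoted with embedded quotes doubled to reduce injection risk.
function substituteParams(
  sqlTemplate: string,
  params: Record<string, string | number>
): string {
  return sqlTemplate.replace(/\$\{(\w+)\}/g, (_match, name: string) => {
    const value = params[name];
    if (value === undefined) throw new Error(`Missing parameter: ${name}`);
    if (typeof value === "number") return String(value);
    return `'${value.replace(/'/g, "''")}'`; // SQL-escape single quotes
  });
}
```

Doing this in the engine, rather than in each script, is what keeps the YAML files down to query text plus parameter names.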
2. Self-describing output format (DataEnvelope)
```yaml
display:
```
Each step declares the output columns’ names and types. The frontend automatically renders tables based on this schema – duration types are automatically formatted as ms, timestamp types support click-to-navigate to the Perfetto timeline. With scripts/, the output is free-form text that the frontend can’t automatically render.
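A hypothetical type sketch of such an envelope; the actual DataEnvelope fields are not shown in this article, so every name below is an assumption:

```typescript
// Illustrative sketch of a schema-driven output envelope (field names are assumptions).
type ColumnType = "string" | "number" | "duration" | "timestamp";

interface DataEnvelope {
  columns: { name: string; type: ColumnType }[];
  rows: (string | number)[][];
}

// The frontend can pick a renderer per declared column type instead of parsing free text.
function formatCell(type: ColumnType, value: string | number): string {
  switch (type) {
    case "duration":
      return `${Number(value) / 1e6} ms`; // assume nanoseconds in, render as ms
    case "timestamp":
      return `ts:${value}`; // placeholder for a click-to-navigate timeline link
    default:
      return String(value);
  }
}
```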
3. Composable (composite + iterator)
A composite Skill can reference multiple atomic Skills, and iterators can traverse data rows for per-frame analysis. This composition is declarative in YAML, with the Skill Engine handling orchestration. The scripts/ approach would require writing your own orchestration logic for the same composition.
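The iterator half of that orchestration can be sketched as follows (illustrative; the real Skill Engine's YAML schema and internals are not shown here):

```typescript
// Illustrative sketch: an iterator runs a child (atomic) skill once per row of a
// previous step's output, e.g. per-frame deep drill over a list of jank frames.
type SkillFn = (params: Record<string, number>) => string;

function runIterator(
  rows: Record<string, number>[], // e.g. one row per jank frame
  bindParam: string,              // child-skill parameter name to bind
  column: string,                 // source column in each row
  childSkill: SkillFn
): string[] {
  return rows.map((row) => childSkill({ [bindParam]: row[column] }));
}
```

In the declarative YAML form, the engine owns this loop; skill authors only name the source step and the parameter binding.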
4. Designed for performance engineers, not developers
The questioner got it right: YAML Skills are essentially a tool that lets performance engineers contribute analysis logic through predefined rules. Performance engineers know which SQL to query and which metrics to examine, but they don’t necessarily know TypeScript. The YAML format lets them directly define SQL queries and output formats without touching backend code. After modifications, changes take effect by simply refreshing the browser in DEV mode.
Comparison summary
| Dimension | Claude Code scripts/ | SmartPerfetto YAML Skill |
|---|---|---|
| Runtime environment | Developer terminal (CLI) | Express backend (runtime) |
| Caller | Developer via `/skill` command | Agent via `invoke_skill` MCP tool |
| Parameterization | Must handle yourself | `${param}` substitution built in |
| Output format | Free-form text | DataEnvelope (schema-driven) |
| Frontend rendering | Not involved | Automatic tables/charts |
| Composition | Manual orchestration | composite / iterator / conditional |
| Contribution barrier | Must write scripts | Just YAML + SQL |
The two are not alternatives but solve different problems at different layers.
Q2: How exactly is “deterministic + flexible” implemented?
Question context: The article says “known scenarios use Strategy files to constrain mandatory checks, but within each phase, the specific queries and deep drill directions are autonomously decided by Claude.” Where is the boundary between constraint and autonomy? How exactly is this achieved?
Three-layer mechanism working together
This hybrid design relies on three layers working in concert: Strategy files define “what must be done,” Planning Gate enforces “plan before acting,” and Verifier performs post-hoc checks on “whether it was actually done.”
Layer 1: Strategy files – hard constraints alongside soft guidance
Taking the scrolling analysis scrolling.strategy.md as an example, it defines multiple analysis phases, but the constraint strength differs across phases:
Hard constraints (must execute; skipping triggers verification errors):
Phase 1.9 root cause deep drill is the most strictly constrained phase, with the strategy file using red circle markers and “prohibited” language:
```markdown
**Phase 1.9 -- Root Cause Deep Drill (Red circle mandatory, cannot skip):**
```
Soft guidance (suggested but skippable):
Phase 1.5 (architecture-aware branching) and Phase 1.7 (root cause branching) use suggestive language like “switch to” and “note,” allowing Claude to decide whether to execute based on actual data:
```markdown
**Phase 1.5 -- Architecture-Aware Branching:**
```
The entire content of the Strategy file is injected verbatim into the System Prompt, with a hard constraint statement added at injection time:
```
Scene Strategy (must be strictly followed)
```
Claude sees these phase definitions, red circle markers, and “prohibited” language directly in the System Prompt.
Layer 2: Planning Gate – forces planning first, but doesn’t limit plan content
Before executing any SQL queries or Skill invocations, Claude must first call submit_plan to submit an analysis plan. Calling execute_sql or invoke_skill without submitting a plan is directly rejected:
```typescript
function requirePlan(toolName: string): string | null {
```
The key point is: the Gate only requires a plan to exist, not that it precisely matches the Strategy’s phases. Claude can submit any plan structure – it can merge Phase 1 and 1.5, add extra steps not mentioned in the Strategy, or adjust deep drill directions based on preliminary data.
When submitting a plan, the system performs scene-aware keyword checking (e.g., for scrolling scenarios, checking whether the plan mentions “frames,” “jank,” etc.), but this is only at the warning level – the plan is accepted even without these keywords.
The purpose of this design is: force Claude to think clearly about what it wants to do before acting (planning discipline), without restricting how it thinks (planning freedom).
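Put together, the gate and the warning-level keyword check could look roughly like this (an illustrative reconstruction of the behavior described above, not the actual source):

```typescript
// Sketch of a Planning Gate consistent with the described behavior (illustrative).
let planSubmitted = false;
const GATED_TOOLS = new Set(["execute_sql", "invoke_skill"]);

// Returns an error string to reject the tool call, or null to let it through.
function requirePlan(toolName: string): string | null {
  if (!planSubmitted && GATED_TOOLS.has(toolName)) {
    return `Call submit_plan before ${toolName}: an analysis plan is required first.`;
  }
  return null;
}

// Scene-aware keyword check at plan submission: warning level only, never rejects.
function checkPlanKeywords(plan: string, expected: string[]): string[] {
  return expected.filter((kw) => !plan.toLowerCase().includes(kw));
}
```

Note the asymmetry: `requirePlan` hard-blocks, `checkPlanKeywords` only reports what is missing, which is exactly the "planning discipline without planning restriction" split.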
Layer 3: Verifier – multi-dimensional post-hoc checking
There can be gaps between planning and execution – Claude might submit a plan but actually skip a critical step. The Verifier performs multi-dimensional post-hoc checks after analysis completes, primarily using heuristic behavioral checks while supplementing with plan/hypothesis/scene completeness validation:
a) Scene completeness check – whether the analysis output covers the scene’s core content:
```typescript
// Scrolling scenario: check if significant jank exists but Phase 1.9 deep drill was skipped
```
b) Hypothesis closure check – whether all submit_hypothesis calls have corresponding resolve_hypothesis calls.
c) Causal chain depth check – whether CRITICAL/HIGH severity findings contain sufficient causal connectors and mechanistic terminology (heuristic text matching).
d) Optional LLM review – using an independent Haiku model for evidence support verification (can be disabled).
If the check finds ERROR-level issues, it triggers a Correction Prompt for Claude to fill in the gaps.
Note that the Verifier does not check whether Claude’s plan phases match the Strategy’s phase numbers – it checks whether “critical analysis actions are reflected in the output,” not whether “the plan format is correct.”
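A sketch of what such a behavioral check might look like; the record shape and the deep-drill skill names below are assumptions:

```typescript
// Illustrative scene-completeness check: ERROR if significant jank exists but no
// deep-drill tool call was observed. It checks behavior, not plan format.
interface AnalysisRecord {
  jankFrameCount: number;
  toolCalls: string[]; // names of tools/skills actually invoked during analysis
}

// Hypothetical deep-drill skill names, for illustration only.
const DEEP_DRILL_SKILLS = ["frame_deep_drill", "thread_state_analysis"];

function checkScrollingCompleteness(
  rec: AnalysisRecord
): { level: "OK" | "ERROR"; msg: string } {
  const didDeepDrill = rec.toolCalls.some((t) => DEEP_DRILL_SKILLS.includes(t));
  if (rec.jankFrameCount > 0 && !didDeepDrill) {
    return { level: "ERROR", msg: "Significant jank found but Phase 1.9 deep drill was skipped" };
  }
  return { level: "OK", msg: "" };
}
```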
The complete constraint spectrum
Layering all three mechanisms together, different phases form a spectrum of constraint strength:
| Phase | Strategy tone | Planning Gate | Verifier check | Constraint strength |
|---|---|---|---|---|
| Phase 1 (overview) | Suggestive | Plan required | Not individually checked | Medium |
| Phase 1.5 (architecture branch) | Suggestive | – | Not checked | Low |
| Phase 1.7 (root cause branch) | Suggestive + conditional | – | Not checked | Low |
| Phase 1.9 (root cause deep drill) | Must/Prohibited | – | Checks if deep drill tools were called | High |
| Phase 2 (supplementary deep drill) | Optional | – | Not checked | None |
| Phase 3 (comprehensive conclusion) | Must cover distribution | – | Checks conclusion completeness | Medium |
Meanwhile, general.strategy.md (the fallback when no scene matches) is entirely soft guidance: it only provides a routing decision tree based on the user’s focus direction (CPU -> cpu_analysis, memory -> memory_analysis), with no mandatory phases. Claude has complete autonomy in the general scenario.
One-sentence summary
Strategy files tell Claude “analyzing scrolling issues requires at least these steps,” Planning Gate ensures it thinks before acting, and Verifier post-checks whether critical steps were actually performed. But within this framework, what specific data to query, which tools to use, and in what order – all of these are autonomously decided by Claude based on actual data.
Q3: What’s the biggest difference between Agent and Workflow? Where are the Agent’s capability boundaries and what determines them?
Question context: In building our Agent, we went from initially assuming the Agent could understand and make decisions about every Skill given to it, to now having essentially hardcoded a decision tree in the Skills. The stumbling blocks along the way were always the same: "we assumed the Agent had capability X, but it didn't," causing its output to deviate from our expectations, so we kept adding boundaries to the Skills until it became a hardcoded Workflow.
The essential difference: Who holds decision-making authority
Agent and Workflow aren’t two tools or two frameworks – they’re two ends of the same spectrum:
```
Hardcoded Workflow <------------------------------------> Fully Autonomous Agent
```
| Dimension | Workflow | Agent |
|---|---|---|
| Control flow | Developer hardcodes if/else in code | LLM autonomously selects next step |
| Tool selection | Predefined execution order | LLM selects on-demand based on data |
| Branch conditions | Conditional logic in code | LLM reasoning |
| Failure handling | try/catch + retry logic | LLM self-reflection + direction change |
| Predictability | Highly deterministic | Highly uncertain |
| Adapting to new scenarios | Developer must add branches | Can explore autonomously |
But in engineering practice, almost no one operates at either extreme of the spectrum. Pure Workflows can’t handle unknown scenarios; pure Agents are unreliable on critical steps. Real-world production systems sit somewhere in the middle.
The root cause of your pitfalls: Making global assumptions about Agent capabilities
“Defaulting to ‘Agent can understand everything’ -> discovering it can’t -> constantly adding constraints -> becoming a hardcoded Workflow” – the fundamental problem with this path is: making a one-size-fits-all judgment about Agent capabilities.
But Agent capabilities vary enormously across different tasks:
| Capability dimension | LLM reliability | Who should handle it |
|---|---|---|
| Intent understanding (what the user wants) | High | Agent (though simple scenarios can use keyword matching instead) |
| Plan formulation (how many steps, what order) | Medium | Needs a constraint framework: Strategy files provide structure, LLM fills in details |
| Data collection (what to query) | Medium | Semi-autonomous: Skills define what to query, Agent decides order and parameters |
| Data reasoning (attribution after seeing data) | High | Agent – this is LLM’s greatest value |
| Precise computation (numerical statistics) | Very low | Tool system (SQL / Skill Engine) |
| Self-evaluation (knowing if it’s right) | Low | External Verifier; don’t trust Agent self-assessment |
The correct approach isn’t “choose Agent globally or choose Workflow globally,” but assign by task:
```
Scene recognition -> Workflow (deterministic logic, no LLM needed)
```
SmartPerfetto’s approach: A constraint strength spectrum
SmartPerfetto doesn’t choose between Agent and Workflow; instead, it sets different constraint strengths for different analysis phases (detailed in Q2). Here we re-examine this design from the “capability boundary” perspective:
High constraint (Phase 1.9 root cause deep drill) – because the Agent is unreliable at “deciding whether to deep drill”:
```markdown
# Phase 1.9 in scrolling.strategy.md
```
Why the hard constraint? Because we found the Agent has a systematic bias: it tends to jump straight to conclusions after getting overview data, skipping the deep drill. This isn’t because the model isn’t smart enough – Claude is perfectly capable of root cause deep drilling – but because the model exhibits “path dependency”: overview data already contains statistical classifications (reason_code), and for the model, “using classification labels directly for conclusions” has much lower cognitive cost than “spending 3 tool-call rounds doing per-frame deep drilling.”
Low constraint (Phase 1.5 architecture branch) – because the Agent is reliable enough at “selecting tools based on data”:
```markdown
# Phase 1.5 in scrolling.strategy.md
```
This uses suggestive language like “switch to” and “note” without enforcement. Because the architecture detection result (Flutter/WebView/Standard) has already been placed in the system prompt by deterministic code, the Agent has a high probability of selecting the correct Skill after seeing this information.
Zero constraint (general scenario) – because the Agent’s autonomous exploration is the only option:
```markdown
# general.strategy.md -- only a routing decision tree, no mandatory steps
```
The general scenario has zero hard constraints because entering general means the user’s question exceeds predefined scenarios, and a Workflow can’t handle it. At this point, the only option is to trust the Agent’s autonomous exploration capability.
What determines Agent capability boundaries
Agent capability boundaries don’t depend on model parameter count or benchmark scores, but on three engineering factors:
1. Observation capability – what data can the Agent “see”
The same model, given structured L2 per-frame data from the scrolling_analysis Skill vs. writing SQL to query raw tables itself, produces significantly different analysis quality. The Agent’s ceiling is determined by the data tools you provide. SmartPerfetto uses 164 YAML Skills to encapsulate domain experts’ query logic; the Agent gets processed, structured analysis data through invoke_skill, not raw millions of trace events.
2. Constraint framework – within what bounds does the Agent make decisions
An unconstrained Agent is like an intern without a task checklist – knowledgeable enough but unsure what to do first. Strategy files, Planning Gate, and Verifier together define the Agent’s decision boundaries: Strategy tells it “at minimum, what must be done,” Planning Gate forces it to “think before acting,” and Verifier post-checks “whether the analysis is sufficient” (heuristic checks + hypothesis closure + scene completeness + optional LLM review).
3. Feedback quality – can the Agent be corrected when wrong
A significant proportion of Agent findings have issues (shallow attribution, false positives, missing critical steps). Relying solely on model self-correction has limited effectiveness. SmartPerfetto uses multi-layer verification + external correction prompts to close the loop:
```
Verifier finds ERROR -> Generates Correction Prompt -> Triggers SDK retry
```
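That loop can be sketched as follows; the function names and the retry budget are assumptions:

```typescript
// Illustrative verify-and-correct loop (names and retry budget are assumptions).
type Issue = { level: "ERROR" | "WARN"; msg: string };

function verifyAndCorrect(
  runAnalysis: (correction?: string) => string, // one SDK analysis pass
  verify: (output: string) => Issue[],          // post-hoc Verifier
  maxRetries = 2                                // retry budget: an assumption
): string {
  let output = runAnalysis();
  for (let i = 0; i < maxRetries; i++) {
    const errors = verify(output).filter((x) => x.level === "ERROR");
    if (errors.length === 0) break;
    // Correction Prompt tells the Agent exactly which gaps to fill.
    const correction = "Correction needed: " + errors.map((e) => e.msg).join("; ");
    output = runAnalysis(correction);
  }
  return output;
}
```

The key property is that correction comes from an external checker, not from asking the model to grade itself.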
Addendum: Strategy files are SOPs, and that’s fine
Some might point out that scrolling.strategy.md reads like an SOP (Standard Operating Procedure) – with numbered Phases, condition tables, mandatory items, and even explicit invoke_skill("scrolling_analysis", {...}). How is this different from “hardcoding a decision tree in Skills”?
Let’s be direct: in the data collection phase, SmartPerfetto’s scrolling analysis is a Workflow. Strategy files are SOPs that encode domain experts’ analysis experience into deterministic steps. This is intentional.
The key is understanding what the SOP covers and what it doesn’t:
What the SOP can cover (data collection) – what Strategy files do:
```
scrolling.strategy.md:
```
What the SOP can’t cover (reasoning/attribution) – where the Agent’s value lies:
```
- Among 47 jank frames, which share the same root cause? (Data clustering)
```
Different scenarios have different SOP levels:
| Strategy file | SOP level | Reason |
|---|---|---|
| `scrolling.strategy.md` | High – Phase numbers + condition tables + mandatory items | Scrolling analysis methodology is most mature; optimal data collection paths are known |
| `startup.strategy.md` | Medium-high – Has Phase structure, but deep drill directions are more open | Startup scenarios are more diverse (cold/warm/hot, different bottlenecks) |
| `anr.strategy.md` | Medium – 2-skill pipeline, but root cause analysis relies entirely on reasoning | ANR root causes are highly diverse |
| `general.strategy.md` | Low – Only a routing decision tree, no mandatory items | Unknown scenarios, impossible to turn into an SOP |
scrolling.strategy.md has the highest SOP level because scrolling analysis methodology is the most mature. general.strategy.md has almost no SOP because user questions are completely unpredictable.
So the correct understanding is: SmartPerfetto = “SOP-driven data collection + Agent-driven reasoning/attribution.”
The SOP addresses “what data to look at, at minimum, when analyzing scrolling issues” – this question has a deterministic answer, and using an SOP is correct. The Agent addresses “how to reason about causality and organize conclusions after getting the data” – this question differs for every trace and can’t be turned into an SOP.
Back to your pitfalls: The problem isn’t “Skills becoming SOPs” – the data collection phase should use SOPs. The problem is “the SOP consuming the reasoning” – if the SOP hardcodes even the conclusions (“if you see X, output Y”), the Agent truly degrades into a Workflow. The key is letting the SOP stop at data collection and leaving reasoning to the Agent.
One-sentence summary
The difference between Agent and Workflow isn’t “intelligent vs. hardcoded,” but “how decision-making authority is allocated.” Agent capability boundaries are jointly determined by “observation capability x constraint framework x feedback quality.” The correct approach is to allocate decision-making authority by task – granting autonomy where the Agent is reliable, adding constraints where it isn’t – rather than making a one-size-fits-all choice.
Q4: Does the Agent architecture need improvement from a business perspective?
Question context: Agent architecture has gone through different designs and evolutions – from the initial ReAct architecture to LangGraph’s node-based architecture. How do different Agent architecture designs affect it? When building your own business Agent, should you consider the architecture’s impact on Agent performance? For example, SmartPerfetto has added different Skill loading modes on top of the Claude Agent SDK based on business understanding.
Essential differences of three mainstream architectures
| Architecture | Control flow model | Developer role | Best suited for |
|---|---|---|---|
| ReAct | Linear loop: Think -> Act -> Observe -> Think… | Define tools | Simple tasks with few tools and short paths |
| LangGraph Node-based | DAG graph: nodes=steps, edges=conditional jumps | Design graph structure + define nodes + write jump conditions | Deterministic processes with clear steps and limited branches |
| Native SDK | SDK manages turn loop, developer only defines tools | Define tools + inject context | Many tools, unpredictable paths, requiring LLM autonomous orchestration |
Their core difference lies in “who decides what to do next”:
- ReAct: LLM makes complete decisions at every step (what to think, what to do, which tool to use); the framework just forwards
- LangGraph: Developers predefine all possible paths (nodes + edges); LLM only makes local decisions within nodes
- Native SDK: SDK manages the conversation loop, LLM autonomously selects tools, developers indirectly constrain through system prompts and tool design
Why SmartPerfetto chose Native SDK + custom constraint layers
SmartPerfetto’s architecture is Claude Agent SDK (Native SDK) + three constraint layers (Strategy/Planning Gate/Verifier):
```
Claude Agent SDK provides:          SmartPerfetto custom-built:
```
Why not LangGraph?
The root cause reasoning paths in performance analysis are unpredictable. The same “scrolling stuttering” could be caused by:
- Binder blocking -> needs to trace thread state on the system_server side
- Slow GPU rendering -> needs to check GPU frequency and fence wait
- GC pauses -> needs to examine Java heap and GC events
- Thermal throttling -> needs to check thermal zone and CPU frequency
- Lock contention -> needs to check monitor contention
- A combination of multiple causes above
With LangGraph, you’d need to predefine a DAG node and jump condition for every root cause path. With 21 reason_codes in performance analysis, each combinable with deep drills, the combinatorial explosion of paths makes the DAG graph unmaintainable.
The more fundamental problem is: before seeing the data, you don’t know which path to take. LangGraph’s DAG graph assumes the developer can predict all branch conditions in advance, but performance analysis branch conditions depend on runtime data.
Advantages of the Native SDK architecture:
LLM autonomously selects tool paths, but three constraint layers ensure critical steps aren’t skipped:
```
# LangGraph requires predefined DAG:
```
Key business-driven architectural designs
The following are architectural improvements SmartPerfetto made based on business understanding, each directly corresponding to a business problem:
1. Conditional tool loading – reduce the Agent’s decision space
SmartPerfetto has up to 20 MCP tools total (9 always-on + 11 conditionally injected), but only a subset is injected per analysis:
```typescript
// claudeMcpServer.ts -- switch tool sets by mode
```
Business reason: More tools means higher probability of the Agent selecting the wrong one. A query that just needs a quick answer to “what’s the frame rate,” if presented with a dozen planning/hypothesis/comparison tools, might lead the Agent to over-analyze.
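A sketch of mode-based tool injection; the tool groupings below are illustrative, not the actual 9 + 11 split:

```typescript
// Illustrative conditional tool injection by analysis mode (tool lists are assumptions).
const ALWAYS_ON = ["execute_sql", "invoke_skill", "lookup_knowledge"];
const FULL_ONLY = ["submit_plan", "submit_hypothesis", "resolve_hypothesis"];

function toolsForMode(mode: "lightweight" | "full"): string[] {
  // Lightweight Q&A gets only the always-on set; full analysis adds the
  // planning/hypothesis tools, shrinking the decision space for simple queries.
  return mode === "lightweight" ? ALWAYS_ON : [...ALWAYS_ON, ...FULL_ONLY];
}
```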
2. Sub-Agent scene gating – avoid unnecessary parallel overhead
```typescript
// claudeAgentDefinitions.ts -- only enable sub-agents in complex scenarios
```
| Scenario | Sub-Agent configuration | Reason |
|---|---|---|
| scrolling | frame-expert + system-expert | Frame analysis and system analysis are suitable for splitting, coordinated by orchestrator |
| startup | startup-expert + system-expert | Startup phase analysis and resource contention analysis are suitable for splitting |
| anr | No sub-agent | ANR is a 2-skill pipeline; extra sub-agents would only add overhead |
Note: The actual parallelism of sub-agents depends on the SDK’s internal scheduling strategy. We design prompts for parallel evidence collection, but actual execution may be sequential.
3. Lightweight vs Full dual mode – quick Q&A doesn’t go through the full pipeline
When users ask “what’s the frame rate of this trace,” there’s no need to go through the full Planning -> Skill -> Verification pipeline. SmartPerfetto’s ClaudeRuntime performs complexity classification at the entry point:
```
analyze(query)
```
Business reason: A significant proportion of user questions are factual queries (“what’s the frame rate,” “is there an ANR”), and running the full pipeline would unnecessarily increase latency.
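The entry-point routing might be sketched like this; the heuristics below are assumptions, since the real classifier's rules are not shown:

```typescript
// Illustrative entry-point complexity routing (heuristics are assumptions).
function classifyQuery(query: string): "lightweight" | "full" {
  // Short, factual-sounding questions skip the Planning -> Skill -> Verification pipeline.
  const factual = [/what('| i)s the/i, /is there/i, /how many/i];
  if (query.length < 80 && factual.some((re) => re.test(query))) return "lightweight";
  return "full";
}
```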
Answering “whether architecture needs business-driven improvement”
Yes, but the improvement direction isn’t switching the underlying framework (ReAct -> LangGraph), but adding business constraint layers on top of the existing framework. Specific recommendations:
1. **Extract Strategy files from your Skill decision tree:** Rewrite the hardcoded `if/else` decision logic in code as natural-language analysis strategies (Markdown files), injected into the system prompt by scene. This way, domain experts can directly modify analysis logic without touching code.
2. **Add a Planning Gate:** The `requirePlan()` implementation is extremely simple (under 10 lines of code), but the effect is significant – forcing the Agent to think before acting empirically reduces going off-track dramatically.
3. **Add a post-hoc Verifier:** Don't check whether the Agent's intermediate steps are "correct" (this is hard to judge); only check whether critical steps "happened" (this is easy to judge).
4. **Dynamically adjust tool sets and constraint strength by scenario/complexity:** Not all queries need the same analysis depth; give simple queries a fast path.
One-sentence summary
Architecture choice is a business problem, not a technical one. ReAct/LangGraph/SDK are just different implementations of control flow – what truly affects Agent performance is the constraint layer you build on top of the control flow. SmartPerfetto chose Native SDK not because it’s the most advanced, but because performance analysis root cause paths are unpredictable, and predefined DAGs are less effective than letting the Agent explore autonomously within a constraint framework.
Q5: How should a performance AI agent handle scene recognition?
Question context: We want to do scene recognition and route to the correct Skill. When building this, should we rely more on “user utterance” or “logs,” or is there a better approach? Several paths for scene recognition each have issues: code matching (keyword matching leads to results that are too broad or too narrow), LLM understanding (LLM understanding isn’t necessarily accurate), log reconstruction (can filter for scrolling presence, but that may not be what the user cares about).
SmartPerfetto’s approach: Three signal layers with clear division of labor
SmartPerfetto’s scene recognition doesn’t rely on a single signal source but uses three signal layers working together, each solving a different problem:
```
Layer 1: User utterance -- keyword matching -> scene type (scrolling / startup / anr / ...)
```
Layer 1: User utterance (keyword matching, <1ms)
```typescript
// sceneClassifier.ts -- 46 lines of code, handles all scene classification
```
Keywords are defined in each Strategy file’s YAML frontmatter, not hardcoded in TypeScript:
```yaml
# scrolling.strategy.md frontmatter
```
Why keyword matching instead of LLM?
- Cost: Scene classification executes at the entry of every analysis; keyword matching is <1ms + 0 tokens; LLM calls take ~500ms + ~500 tokens
- Determinism: The cost of misclassification is very high (injecting the wrong Strategy file); keyword matching behavior is fully predictable
- Sufficiently accurate: In the performance analysis domain, user queries are highly formatted – someone saying “scrolling stuttering” is asking about scrolling, someone saying “slow startup” is asking about startup. No LLM needed to “understand” this
What about cases where keyword matching falls short?
Keyword matching does have boundaries – when a user says “why is this app slow,” keywords can’t determine whether it’s slow startup or slow scrolling. SmartPerfetto handles this by: falling back to the general scene when no match is found, letting the Agent autonomously choose a direction within the general strategy’s routing decision tree.
```markdown
# general.strategy.md -- no hard constraints, only routing suggestions
```
The core idea of this design is: don’t try to achieve 100% accurate classification at the entry point; instead, let accurate cases take the fast path (keywords -> Strategy), and uncertain cases take the exploration path (general -> Agent autonomous routing).
Layer 2: Trace data – architecture detection (deterministic code)
Scene classification only resolves “what the user wants to analyze,” but the same scenario (e.g., scrolling) requires completely different analysis paths under different rendering architectures:
| Architecture | Rendering pipeline | Analysis differences |
|---|---|---|
| Standard Android | UI Thread -> RenderThread -> SurfaceFlinger | Dual-track analysis of main thread + RenderThread |
| Flutter TextureView | 1.ui -> 1.raster -> JNISurfaceTexture -> RenderThread updateTexImage | Dual-pipeline; need to analyze Flutter engine threads + texture bridging |
| Flutter SurfaceView | 1.ui -> 1.raster -> BufferQueue -> SurfaceFlinger | Single pipeline; doesn’t go through RenderThread |
| WebView | CrRendererMain -> Viz Compositor | Chromium rendering pipeline; different thread names |
| Compose | UI Thread (Composition) -> RenderThread | Similar to Standard but with Composition phase |
Architecture detection is delegated to the YAML skill rendering_pipeline_detection – it performs thread/Slice signal collection, pipeline scoring, and sub-variant determination at the SQL layer, supporting 24 fine-grained rendering architectures. The TypeScript side (architectureDetector.ts) is only responsible for calling the skill and mapping results; it doesn’t do direct if/else judgments:
```
rendering_pipeline_detection skill (SQL)
```
Detection results are injected into the system prompt (via templates like arch-flutter.template.md), and the Agent selects the corresponding analysis tools after seeing the architecture information.
Layer 3: Data completeness – capability register
Different trace capture configurations yield different available data dimensions. Some traces lack GPU frequency data; others lack thermal zone data. SmartPerfetto probes the availability of 18 data dimensions before analysis begins:
```
frame_rendering: OK (456 rows)
```
This information is likewise injected into the system prompt, telling the Agent which dimensions can be analyzed and which lack data. This prevents the Agent from invoking a Skill with no data backing, getting empty results, then switching directions – such trial-and-error wastes 1-2 tool calls’ worth of tokens.
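Such probing amounts to a cheap existence query per dimension; a sketch, where the dimension-to-SQL mapping and the query runner are illustrative:

```typescript
// Illustrative data-completeness probe: run a cheap COUNT per dimension and
// report availability (the dimension -> SQL mapping here is an assumption).
type QueryRunner = (sql: string) => number; // returns row count

const PROBES: Record<string, string> = {
  frame_rendering: "SELECT COUNT(*) FROM actual_frame_timeline_slice",
  gpu_freq: "SELECT COUNT(*) FROM counter_track WHERE name LIKE '%gpufreq%'",
};

function probeCapabilities(run: QueryRunner): Record<string, string> {
  const report: Record<string, string> = {};
  for (const [dim, sql] of Object.entries(PROBES)) {
    const n = run(sql);
    report[dim] = n > 0 ? `OK (${n} rows)` : "MISSING"; // injected into the system prompt
  }
  return report;
}
```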
Evaluating the three paths from the question
1. Code matching (keywords): Viable, but needs well-designed fallback
The questioner said “keyword matching leads to results that are too broad or too narrow.” SmartPerfetto’s experience:
- Priority ordering solves the “too broad” problem: ANR(1) > startup(2) > scrolling(3); when multiple keywords match simultaneously, take the highest priority
- Compound patterns improve precision: `/startup.*slow/` is more precise than matching "startup" alone
- The `general` fallback solves the "too narrow" problem: when nothing matches, don't guess – hand it to the Agent for autonomous exploration
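The priority-ordering, compound-pattern, and fallback scheme can be sketched as follows; the patterns and priorities are illustrative, not the actual `sceneClassifier.ts` contents:

```typescript
// Illustrative keyword-based scene classification with priority ordering and
// a `general` fallback (patterns and priorities are assumptions).
interface SceneRule { scene: string; priority: number; patterns: RegExp[] }

const RULES: SceneRule[] = [
  { scene: "anr",       priority: 1, patterns: [/\banr\b/i, /not responding/i] },
  { scene: "startup",   priority: 2, patterns: [/startup.*slow/i, /launch/i] },
  { scene: "scrolling", priority: 3, patterns: [/scroll/i, /jank/i, /stutter/i] },
];

function classifyScene(query: string): string {
  const hits = RULES.filter((r) => r.patterns.some((p) => p.test(query)));
  if (hits.length === 0) return "general"; // don't guess: let the Agent route
  return hits.sort((a, b) => a.priority - b.priority)[0].scene;
}
```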
2. LLM understanding: Not recommended for the classification entry point; can be used in the fallback path
LLM classification in SmartPerfetto isn’t Layer 1 but rather the Agent’s autonomous routing within the general scenario – at that point, the Agent already has trace data and can make more accurate judgments combined with the data.
3. Log reconstruction: Suitable as a supplementary signal for Layer 2
Logs can tell you “what’s in the trace” (whether scrolling events exist, whether ANR exists), but can’t tell you “what the user cares about.” SmartPerfetto’s data completeness probing plays exactly this role – it doesn’t participate in scene classification but provides the Agent with data availability information.
One-sentence summary
Scene recognition shouldn’t try to solve all problems with a single signal source. Use keyword matching for fast routing (accurate cases), use general fallback for Agent autonomous exploration (uncertain cases), and use trace data for architecture and completeness supplementation. Keyword matching + priority ordering + compound patterns + fallback strategy – 46 lines of code is sufficient.
Q6: How to better leverage AI autonomous exploration in “deterministic steps + AI exploration”?
Question context: We’ve found in production that when AI explores autonomously and drills into root causes, it tends to go off track and give incorrect results. For example, in the SmartPerfetto blog post example – after identifying RenderThread being blocked by Binder in the earlier steps (based on deterministic steps), are the subsequent hypothesis formations and validations pure AI, or do we give the AI some common causes as guidance for Binder blocking and let it investigate on its own?
First, answering the core question: It’s neither pure AI nor hardcoded guidance
SmartPerfetto’s approach is structured reasoning framework + on-demand knowledge injection:
1 | Deterministic steps produce data |
Three key mechanisms make AI autonomous exploration more reliable:
Mechanism 1: Hypothesis management tools – adding structure to the reasoning process
SmartPerfetto provides submit_hypothesis and resolve_hypothesis as two MCP tools. Instead of letting the Agent reason implicitly in internal monologue, it forces externalization:
1 | Agent calls: |
Why is this effective? The hypothesis management tools force the Agent to explicitly declare “what I’m verifying” and “what I expect to see” before taking action. This has two benefits:
- Prevents goal drift – the Agent won’t forget what it originally wanted to verify while collecting data
- Auditable – every hypothesis has a complete record; the Verifier can check whether all hypotheses were resolved
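A minimal sketch of the state such tools could maintain (names and fields are assumptions, not the real MCP tool schema) — each hypothesis records what is being verified and what evidence is expected, so the Verifier can later audit unresolved ones:

```typescript
// Hypothetical backing store for submit_hypothesis / resolve_hypothesis.
type Verdict = "confirmed" | "rejected";

interface Hypothesis {
  id: number;
  statement: string;        // "what I'm verifying"
  expectedEvidence: string; // "what I expect to see"
  verdict?: Verdict;
  evidence?: string;
}

class HypothesisTracker {
  private next = 1;
  private items = new Map<number, Hypothesis>();

  submit(statement: string, expectedEvidence: string): number {
    const id = this.next++;
    this.items.set(id, { id, statement, expectedEvidence });
    return id;
  }

  resolve(id: number, verdict: Verdict, evidence: string): void {
    const h = this.items.get(id);
    if (!h) throw new Error(`unknown hypothesis ${id}`);
    h.verdict = verdict;
    h.evidence = evidence;
  }

  // Verifier-side check: were all submitted hypotheses resolved?
  unresolved(): Hypothesis[] {
    return [...this.items.values()].filter(h => h.verdict === undefined);
  }
}
```

The point of externalizing state like this is that “all hypotheses resolved” becomes a mechanical check rather than a judgment call.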
Mechanism 2: Knowledge injection – not hardcoded guidance, but on-demand domain knowledge loading
“Could we give the AI some common causes as guidance for Binder blocking?”
Yes, but not hardcoded in Skills. Instead, it’s loaded on demand through the lookup_knowledge MCP tool. After discovering Binder blocking, the Agent can call:
1 | invoke lookup_knowledge("binder-ipc") |
This returns a Binder IPC knowledge template (knowledge-binder-ipc.template.md) containing:
- Classification of typical Binder transaction blocking causes (server-side busy, process frozen, CPU scheduling delay, oneway queue full)
- Investigation paths and key metrics for each cause
- Common misdiagnosis scenarios (e.g., oneway transactions don’t block the caller)
Key design: Knowledge is actively pulled by the Agent, not force-injected by the system. There are currently 8 knowledge templates (rendering-pipeline, binder-ipc, gc-dynamics, cpu-scheduler, thermal-throttling, lock-contention, startup-root-causes, data-sources). If all templates were pre-injected into the system prompt, it would consume a large number of tokens and most would be irrelevant. Through on-demand loading via MCP tools, the Agent only retrieves the relevant domain background knowledge when needed.
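The on-demand loading pattern can be sketched as follows; the handler shape and the stub file reader are assumptions, though the topic names mirror the 8 templates listed above:

```typescript
// Hypothetical lookup_knowledge handler: templates stay on disk and are
// loaded only when the Agent asks, instead of being pre-injected into the
// system prompt.
const KNOWLEDGE_TOPICS = new Set([
  "rendering-pipeline", "binder-ipc", "gc-dynamics", "cpu-scheduler",
  "thermal-throttling", "lock-contention", "startup-root-causes", "data-sources",
]);

// In the real system this would read knowledge-<topic>.template.md from disk;
// here an injected reader function stands in for the file read.
function lookupKnowledge(
  topic: string,
  readFile: (path: string) => string,
): string {
  if (!KNOWLEDGE_TOPICS.has(topic)) {
    // Unknown topic: tell the Agent what exists instead of failing silently.
    return `Unknown topic "${topic}". Available: ${[...KNOWLEDGE_TOPICS].join(", ")}`;
  }
  return readFile(`knowledge-${topic}.template.md`);
}
```

Listing the available topics in the error message matters: it turns a failed lookup into a discovery step for the Agent instead of a dead end.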
SmartPerfetto also provides conditional deep drill suggestion tables in Strategy files, which is another form of guidance:
1 | # scrolling.strategy.md Phase 1.9 |
This table isn’t a hardcoded decision tree – it’s a lookup table for the Agent. The Agent decides which row to follow based on metric values in the data. If the data doesn’t match any row, the Agent can explore autonomously.
Mechanism 3: ReAct Reasoning Nudge – triggering reflection when tools return results
During the first few successful returns from data tools (execute_sql / invoke_skill), SmartPerfetto appends a reasoning prompt at the end of the result:
1 | // claudeMcpServer.ts |
Extremely low cost (~20 tokens per call, ~80 tokens total across the first 4 calls), yet noticeably effective. It isn’t applied throughout in order to control token overhead in the latter half of the analysis – the first few nudges already establish the pattern of “receive data -> reflect first -> then act.” Without this nudge, the Agent tends to call tools back-to-back without pausing to think – collecting data 5 times without forming any intermediate conclusions, resulting in poor final summary quality.
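The nudge mechanics are simple enough to sketch in a few lines; the counter limit, tool names, and nudge wording below are assumptions modeled on the description above, not the actual claudeMcpServer.ts code:

```typescript
// Hypothetical ReAct reasoning nudge: append a short REFLECT prompt to the
// first few successful data-tool results, then stop to save tokens once the
// reflect-then-act pattern is established.
const NUDGE_LIMIT = 4; // only the first 4 successful data-tool calls get a nudge
const REFLECT_NUDGE =
  "\n\n[REFLECT] Before the next tool call: what did this data show, " +
  "and which hypothesis does it support or refute?";

class NudgeInjector {
  private count = 0;

  wrapResult(toolName: string, result: string): string {
    const isDataTool = toolName === "execute_sql" || toolName === "invoke_skill";
    if (!isDataTool || this.count >= NUDGE_LIMIT) return result;
    this.count++;
    return result + REFLECT_NUDGE;
  }
}
```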
Walking through the complete flow with the Binder example from the article
1 | 1. Start with overview -> discovers 47 jank frames, P90 = 23.5ms |
Note that Steps 4-9 are all AI autonomous exploration, but constrained by three mechanisms:
- Hypothesis tools force reasoning externalization (Steps 5, 7)
- Knowledge injection provides domain investigation paths (Step 4)
- REFLECT nudge triggers reflection after the first few tool returns (Step 7)
Four practical recommendations for making AI autonomous exploration more reliable
1. Give data, not conclusions
Skills should return structured data (frame durations, thread state distributions, blocking function lists), not pre-drawn conclusions (“RenderThread blocking is caused by Binder”). Having AI reason its own conclusions from data is more reliable than having it build further analysis on conclusions provided by others.
2. Give framework, not path
Strategy files should define “what must be done” (Phase 1.9 must deep drill), not “how to do it” (first query A, then B, then C). An Agent autonomously selecting paths within a framework constraint is far more reliable than free exploration without any constraints, yet far more flexible than hardcoded paths.
3. Give knowledge, not answers
Knowledge templates should contain “possible cause classifications and investigation methods,” not “if you see X, it’s Y.” The former helps the Agent build a reasoning framework; the latter turns the Agent back into a Workflow.
4. Verify behavior, not conclusions
Verifiers should use heuristic rules to check “whether the analysis output reflects critical actions” (does the conclusion show traces of deep-drill analysis, are all hypotheses resolved, does the causal chain have sufficient depth), rather than trying to judge “whether the conclusion is correct” (that belongs in offline LLM Judge evaluation, not at runtime). Note that these are heuristic checks at the level of text pattern matching, not precise audits of tool-call logs.
One-sentence summary
AI autonomous exploration reliability isn’t guaranteed by “hardcoded guidance,” but by three mechanisms: hypothesis management tools externalize reasoning, on-demand knowledge injection provides domain investigation frameworks, and ReAct nudge prevents blind tool calling. The key principles are “give data not conclusions, give framework not path, give knowledge not answers.”
Q7: How is the Prompt assembled for each turn?
Question context: LLM Agent output quality depends heavily on system prompt design. How does SmartPerfetto construct the prompt for each analysis? How does the prompt change across different scenarios and different turns? How is the token budget controlled?
Overall design: Four-tier layered assembly + cache optimization
SmartPerfetto’s system prompt isn’t a static string but is dynamically assembled by the buildSystemPrompt() function (claudeSystemPrompt.ts:260) before each SDK query. The assembly follows a core principle:
Sort by “stability” – content that changes less frequently goes first, more dynamic content goes last.
The reason for this design is Anthropic API’s automatic caching mechanism: when the system prompt exceeds 1024 tokens, the API automatically caches the prompt prefix. By placing unchanging content at the very front, most of the prompt can hit the cache across multi-turn conversations, significantly reducing latency and cost:
1 | Same trace + same scene: ~4000 tokens cached (~80% savings) |
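The stability-first ordering can be illustrated with a small sketch (tier names and contents are illustrative): because the tiers are concatenated from least to most frequently changing, two consecutive turns share a long byte-identical prefix, which is exactly what the API’s prefix cache keys on.

```typescript
// Hypothetical illustration of stability-first prompt assembly.
interface PromptTiers {
  static_: string;   // Tier 1: role, output format – never changes
  perTrace: string;  // Tier 2: architecture info – changes per trace
  perQuery: string;  // Tier 3: methodology + scene strategy – changes per scene
  dynamic: string;   // Tier 4: selection, notes, findings – changes per turn
}

function assemblePrompt(t: PromptTiers): string {
  return [t.static_, t.perTrace, t.perQuery, t.dynamic].join("\n\n");
}

// Length of the prefix two turns share (chars here; tokens in reality).
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}
```

If the dynamic tier were placed first instead, every turn would invalidate the cache from byte zero – ordering alone is what buys the ~80% savings.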
Four-tier assembly structure
1 | +-------------------------------------------------------+ |
Template loading and variable substitution
All prompt content is defined in Markdown files (detailed in Q1); TypeScript only handles loading and variable substitution:
1 | // strategyLoader.ts -- template system |
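A minimal sketch of what such a substitution pass might look like (the function name and fail-loudly behavior are assumptions, not the actual strategyLoader.ts code) — replace known `{{placeholder}}` variables, and surface any placeholder left unfilled so a typo in a template fails loudly:

```typescript
// Hypothetical {{placeholder}} substitution for Markdown prompt templates.
function renderTemplate(template: string, vars: Record<string, string>): string {
  const out = template.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
    name in vars ? vars[name] : match,
  );
  const leftover = out.match(/\{\{\w+\}\}/g);
  if (leftover) throw new Error(`unfilled placeholders: ${leftover.join(", ")}`);
  return out;
}
```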
Template file inventory:
| Category | File | Purpose |
|---|---|---|
| Static templates | prompt-role.template.md | Role definition |
| | prompt-output-format.template.md | Output format rules (91 lines) |
| | prompt-quick.template.md | Quick mode streamlined prompt |
| Methodology | prompt-methodology.template.md | Analysis methodology (contains {{sceneStrategy}} placeholder) |
| Architecture guides | arch-standard.template.md | Standard Android rendering guidance |
| | arch-flutter.template.md | Flutter engine guidance |
| | arch-compose.template.md | Jetpack Compose guidance |
| | arch-webview.template.md | WebView guidance |
| Selection templates | selection-area.template.md | Time range selection ({{startNs}}, {{endNs}}…) |
| | selection-slice.template.md | Slice selection ({{eventId}}, {{ts}}…) |
| Comparison mode | comparison-methodology.template.md | Dual-trace comparison methodology |
| Scene strategies | 12 *.strategy.md files | scrolling/startup/anr/memory/… |
| Knowledge templates | 8 knowledge-*.template.md files | On-demand domain knowledge (not injected into prompt) |
| Auxiliary templates | prompt-complexity-classifier.template.md | Quick/Full routing decision (not injected into the prompt, but determines which path to take) |
Token budget management
Budget ceiling: 4500 tokens (MAX_PROMPT_TOKENS). During correction retries, if SDK auto-compact is detected (conversation history automatically compressed), the budget is reduced to 3000 tokens to leave room; otherwise the original prompt is reused.
Token estimation method: Mixed Chinese-English estimation – Chinese characters at 1.5 tokens/character, ASCII at 0.3 tokens/character. This is a rough approximation but sufficiently accurate for budget management.
Progressive dropping strategy when over budget:
When the assembled prompt’s token count exceeds the budget, entire sections are dropped in order from lowest to highest priority:
1 | Drop order (dropped first -> dropped last): |
Content that is never dropped:
- Role definition, output format (Tier 1 static)
- Architecture info, focus application (Tier 2 per-trace)
- Methodology + scene strategy (Tier 3 core)
- User selection context (user’s explicit intent)
- Conversation context (previous findings and analysis notes)
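The two budget mechanics described above – the mixed-script token estimate and priority-ordered dropping – can be sketched as follows. Thresholds, field names, and section contents are illustrative assumptions:

```typescript
// Hypothetical token estimate: non-ASCII (CJK) chars ~1.5 tokens each,
// ASCII ~0.3 – rough, but sufficient for budget management.
function estimateTokens(text: string): number {
  let tokens = 0;
  for (const ch of text) {
    tokens += ch.charCodeAt(0) > 0x7f ? 1.5 : 0.3;
  }
  return Math.ceil(tokens);
}

interface Section {
  name: string;
  text: string;
  dropPriority?: number; // lower = dropped first; undefined = never dropped
}

// Drop entire sections, lowest priority first, until under budget.
function fitToBudget(sections: Section[], budget: number): Section[] {
  const kept = [...sections];
  const total = () => estimateTokens(kept.map(s => s.text).join("\n"));
  while (total() > budget) {
    const droppable = kept
      .filter(s => s.dropPriority !== undefined)
      .sort((a, b) => a.dropPriority! - b.dropPriority!);
    if (droppable.length === 0) break; // only never-drop sections remain
    kept.splice(kept.indexOf(droppable[0]), 1);
  }
  return kept;
}
```

Dropping whole sections rather than truncating mid-section keeps every surviving section coherent – a half-truncated scene strategy would be worse than no scene strategy.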
Complete context construction flow
In ClaudeRuntime.analyze(), prompt assembly is preceded by over twenty preparation phases to collect all context:
1 | Phase 0: Selection context logging |
All Phase results feed into the ClaudeAnalysisContext object, passed to buildSystemPrompt() for final assembly.
Quick vs Full dual mode
Not all queries need the full 4500-token prompt. When users ask factual questions (e.g., “what’s the frame rate”), SmartPerfetto uses a streamlined quick prompt:
1 | // buildQuickSystemPrompt() -- ~1500 tokens |
| Dimension | Quick Mode | Full Mode |
|---|---|---|
| Target tokens | ~1500 | ~4500 |
| Scene strategy | None | One of 12 |
| Methodology | None | prompt-methodology.template.md |
| Conversation context | None | findings + notes + entity + summary |
| Planning Gate | None | Yes |
| Verifier | None | Yes |
| Use case | “What’s the frame rate” “Is there an ANR” | “Analyze scrolling stuttering” “Analyze startup performance” |
How the prompt changes across multi-turn conversations
In multi-turn analysis (user follow-ups or drill-downs), prompt changes depend on whether the SDK session hits resume:
Turn 1: No conversation context, no historical plans, no analysis notes
Turn 2 onward – SDK session resume hit (within 4 hours):
- SDK internally already holds complete conversation history; `previousFindings` and `conversationSummary` are not re-injected
- But still injected: analysis notes (<=10), entity context (drill-down references), historical plans (<=3 turns)
- Tiers 1-3 remain unchanged, hitting ~80% cache
Turn 2 onward – SDK session expired or unavailable:
- Previous turn’s findings are manually injected as “previous analysis findings” (<=10)
- Conversation summary is manually injected (`sessionContext.generatePromptContext(2000)`, <=2000 tokens)
- Analysis notes, entity context, and historical plans same as above
Correction retry turn: If SDK auto-compact is detected (conversation history automatically compressed), token budget drops from 4500 to 3000, and progressive dropping more aggressively removes non-critical sections. If auto-compact hasn’t occurred, the original system prompt is reused.
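The resume-dependent injection decision above can be condensed into one function; the names and limits below follow the description, but the exact shape is an assumption:

```typescript
// Hypothetical context-injection decision for Turn 2 onward: what gets
// re-injected into the prompt depends on whether the SDK session resumed.
interface TurnContext {
  findings: string[];
  notes: string[];
  plans: string[];
  summary: string;
}

function selectInjectedContext(resumed: boolean, ctx: TurnContext) {
  return {
    // Resume hit: the SDK already holds full history, so skip these.
    previousFindings: resumed ? [] : ctx.findings.slice(0, 10),
    conversationSummary: resumed ? "" : ctx.summary,
    // Always injected regardless of resume.
    analysisNotes: ctx.notes.slice(0, 10),
    historicalPlans: ctx.plans.slice(0, 3),
  };
}
```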
A concrete example: Prompt assembly for scrolling analysis
User inputs "Analyze scrolling stuttering", Flutter TextureView architecture, Turn 1:
1 | [Tier 1] prompt-role.template.md -> "You are an Android performance analysis expert..." |
User follows up with "Deep dive into frame 3", Turn 2 (SDK session resume hit):
1 | [Tier 1-3] Same as Turn 1 (hitting ~80% cache) |
One-sentence summary
The prompt is sorted by “stability” across four tiers (Static -> Per-Trace -> Per-Query -> Dynamic), leveraging API prefix caching to achieve ~80% token savings across multi-turn conversations. The template system lets domain experts directly edit analysis strategies without touching TypeScript. When over budget, progressive dropping by priority occurs, but role definition, scene strategy, and user selection are always preserved – these three determine the analysis direction and scope.
Q8: What Skills does SmartPerfetto have?
Question context: SmartPerfetto’s analysis capabilities are carried by YAML Skills. A complete Skill inventory helps understand the system’s analysis coverage.
Overview
| Category | Count | Description |
|---|---|---|
| Atomic | 87 | Single-step detection/statistics, completed with one or a few SQL statements |
| Composite | 29 | Combines multiple atomic skills, supports iterator/conditional |
| Deep | 2 | Deep profiling (callstack, CPU profiling) |
| Pipeline | 28 | Rendering pipeline detection + teaching (24+ architectures) |
| Module | 18 | Modular configuration: app/framework/hardware/kernel |
| Total | 164 | |
Atomic Skills (87)
Single-step data extraction and detection – the building blocks for all higher-level Skills.
Frame rendering and jank:
| Skill ID | One-line description |
|---|---|
| consumer_jank_detection | Detect real frame drops from SF consumer perspective (per-layer buffer starvation) |
| frame_blocking_calls | Identify blocking calls during each jank frame (GC, Binder, locks, IO) |
| frame_production_gap | Detect frame production gaps: gaps between consecutive frames exceeding 1.5x VSync |
| frame_pipeline_variance | Detect frame duration jitter and high-variance intervals |
| render_pipeline_latency | Break down latency across all stages of the frame rendering pipeline |
| render_thread_slices | Analyze RenderThread time slice distribution |
| app_frame_production | Analyze application main thread frame production |
| sf_frame_consumption | Analyze SurfaceFlinger frame consumption |
| sf_composition_in_range | Analyze SurfaceFlinger composition latency |
| sf_layer_count_in_range | Count active SF layers within a time range |
| present_fence_timing | Analyze Present Fence timing, detecting actual display latency |
| game_fps_analysis | Game-specific frame rate analysis, supporting fixed frame rate modes |
VSync and refresh rate:
| Skill ID | One-line description |
|---|---|
| vsync_period_detection | Detect VSync period, return refresh rate and confidence |
| vsync_config | Parse actual VSync period and refresh rate settings from trace |
| vsync_alignment_in_range | Analyze frame-to-VSync signal alignment |
| vsync_phase_alignment | Analyze input event to VSync phase relationship, locating touch-to-display latency bottlenecks |
| vrr_detection | Detect whether the device uses variable refresh rate (VRR/LTPO/Adaptive Sync) |
CPU and scheduling:
| Skill ID | One-line description |
|---|---|
| cpu_topology_detection | Dynamically detect CPU big.LITTLE core topology from cpufreq |
| cpu_topology_view | Create reusable SQL VIEW _cpu_topology |
| cpu_slice_analysis | Analyze CPU time slice distribution (with dynamic topology detection) |
| cpu_load_in_range | Analyze per-CPU core load within a specified time range |
| cpu_cluster_load_in_range | Calculate overall CPU load percentage for big and little core clusters |
| cpu_freq_timeline | Analyze per-CPU core frequency change timeline |
| cpu_throttling_in_range | Detect CPU thermal throttling situations |
| sched_latency_in_range | Analyze thread scheduling wait time distribution, detecting CPU contention |
| scheduling_analysis | Analyze thread scheduling latency (Runnability) |
| task_migration_in_range | Analyze thread migration frequency between big and little cores |
| thread_affinity_violation | Detect high-frequency core migration of main thread/RenderThread |
| thermal_predictor | Predict thermal throttling risk based on CPU frequency trends |
| cache_miss_impact | Count cache-miss counters and evaluate fluctuation |
GPU:
| Skill ID | One-line description |
|---|---|
| gpu_render_in_range | Analyze GPU rendering duration and Fence wait |
| gpu_freq_in_range | Analyze GPU frequency changes |
| gpu_metrics | Analyze GPU frequency, utilization, and rendering performance |
| gpu_power_state_analysis | Analyze GPU frequency state transitions, identifying frequency reduction pressure and jitter |
Main thread analysis:
| Skill ID | One-line description |
|---|---|
| main_thread_states_in_range | Count main thread states, blocking functions, and percentages within a range |
| main_thread_slices_in_range | Count main thread slice duration distribution within a range |
| main_thread_sched_latency_in_range | Count main thread Runnable wait time distribution |
| main_thread_file_io_in_range | Count main thread file IO related slice durations within a range |
Binder IPC:
| Skill ID | One-line description |
|---|---|
| binder_in_range | Analyze Binder transactions within a specified time range |
| binder_blocking_in_range | Analyze counterpart process response delays in synchronous Binder calls |
| binder_root_cause | Perform server/client-side blocking cause attribution for slow Binder transactions |
| binder_storm_detection | Detect Binder transaction storms: too many IPC calls in a short period |
Locks and synchronization:
| Skill ID | One-line description |
|---|---|
| lock_contention_in_range | Analyze lock contention within a specified time range |
| futex_wait_distribution | Count futex/mutex lock wait distribution and duration |
Startup-specific (19):
| Skill ID | One-line description |
|---|---|
| startup_events_in_range | Query startup events and TTID/TTFD metrics |
| startup_slow_reasons | Startup slow reasons (Google official classification + self-check) v3.0 |
| startup_critical_tasks | Auto-identify all active threads during startup interval, sorted by CPU time |
| startup_thread_blocking_graph | Build thread block/wakeup relationship graph using waker_utid |
| startup_jit_analysis | Analyze JIT compilation thread impact on startup speed |
| startup_cpu_placement_timeline | Analyze main thread core type changes by time bucket, detecting stuck-on-little-core during startup |
| startup_freq_rampup | Analyze CPU frequency ramp-up speed during cold start, detecting frequency scaling delays |
| startup_binder_pool_analysis | Analyze Binder thread pool utilization and saturation during startup |
| startup_hot_slice_states | Analyze thread state distribution of Top N hot slices during startup interval |
| startup_main_thread_states_in_range | Count main thread Running/Runnable/Blocked percentages during startup |
| startup_main_thread_slices_in_range | Count main thread slice hotspots during startup |
| startup_binder_in_range | Count Binder call distribution during startup |
| startup_main_thread_file_io_in_range | Count main thread file IO during startup |
| startup_sched_latency_in_range | Count main thread Runnable wait latency during startup |
| startup_main_thread_sync_binder_in_range | Count main thread synchronous Binder duration during startup |
| startup_main_thread_binder_blocking_in_range | Analyze main thread synchronous Binder blocking details during startup |
| startup_breakdown_in_range | Count attribution reason time percentages during startup |
| startup_gc_in_range | Count GC slices and main thread percentage during startup |
| startup_class_loading_in_range | Count class loading slice durations during startup |
Memory and GC:
| Skill ID | One-line description |
|---|---|
| gc_events_in_range | Query GC events for a given process and optional time range |
| memory_pressure_in_range | Analyze memory pressure metrics within a specified time range |
| page_fault_in_range | Analyze Page Fault and memory reclaim impact on performance |
Input and touch:
| Skill ID | One-line description |
|---|---|
| input_events_in_range | Extract raw input events within a range, analyzing dispatch latency |
| input_to_frame_latency | Measure latency from each MotionEvent to corresponding frame present |
| touch_to_display_latency | Measure end-to-end latency from touch to frame rendering |
| scroll_response_latency | Measure response latency from scroll gesture input to first frame rendering |
System and device:
| Skill ID | One-line description |
|---|---|
| system_load_in_range | Analyze overall system CPU utilization and process activity |
| device_state_snapshot | Capture device environment info during trace (screen, battery, temperature, etc.) |
| device_state_timeline | Track device state changes over time |
| wakelock_tracking | Track Wake Lock holding, detecting battery drain anomalies |
Others:
| Skill ID | One-line description |
|---|---|
| blocking_chain_analysis | Analyze main thread blocking chain: what blocked the main thread? What was the waker doing? |
| anr_main_thread_blocking | Deep analysis of main thread blocking cause during ANR |
| anr_context_in_range | Extract first ANR event data as time window anchor |
| app_lifecycle_in_range | Track Activity/Fragment lifecycle events |
| compose_recomposition_hotspot | Detect Jetpack Compose recomposition hotspots |
| webview_v8_analysis | Analyze WebView V8 engine: GC, script compilation, execution time |
| rendering_pipeline_detection | Identify application rendering pipeline type (24 fine-grained detection types) |
| pipeline_key_slices_overlay | Query pipeline key Slice ts/dur for timeline overlay |
Composite Skills (29)
Combine multiple atomic skills, supporting iterator (per-frame/per-event deep drill) and conditional (data-driven branching).
| Skill ID | One-line description |
|---|---|
| scrolling_analysis | Scrolling analysis main entry: overview -> frame list -> root cause classification -> per-frame diagnosis |
| flutter_scrolling_analysis | Flutter-specific frame analysis, using Flutter thread model |
| jank_frame_detail | Analyze a specific jank frame in detail: deep drill into jank cause and root cause classification |
| startup_analysis | Startup analysis main entry: Iterator mode, big/little core analysis, four-quadrant |
| startup_detail | Analyze a single startup event: main thread duration, Binder, CPU big/little core ratio |
| anr_analysis | ANR v3.0 analysis: system issue vs. app issue, categorized handling |
| anr_detail | Single ANR event detail: four-quadrant, Binder dependencies, deadlock detection |
| cpu_analysis | CPU analysis: time distribution, big/little core analysis, scheduling chain |
| gpu_analysis | GPU analysis: frequency distribution, memory usage, frame rendering correlation |
| memory_analysis | Memory analysis: GC events, GC-to-frame correlation, thread states |
| gc_analysis | GC analysis: based on stdlib android_garbage_collection_events |
| binder_analysis | Binder deep analysis: transaction basics, thread states |
| binder_detail | Single Binder transaction detail: CPU big/little core, four-quadrant, blocking cause |
| thermal_throttling | Temperature monitoring, thermal throttling detection, CPU frequency correlation |
| lock_contention_analysis | Lock contention multi-dimensional analysis: based on android.monitor_contention |
| surfaceflinger_analysis | SF frame composition performance: GPU/HWC composition ratio, slow composition detection |
| click_response_analysis | Click response analysis: based on stdlib android_input_events |
| click_response_detail | Single slow input event detail: latency breakdown, four-quadrant, main thread blocking |
| scroll_session_analysis | Single complete scroll session: Touch phase vs Fling phase FPS |
| navigation_analysis | Activity/Fragment navigation performance: lifecycle, transition animations |
| lmk_analysis | LMK analysis: cause distribution, timeline, frequency |
| dmabuf_analysis | DMA Buffer analysis: allocation, release, leak detection |
| block_io_analysis | Block IO analysis: device-level statistics, queue depth, long-duration IO |
| io_pressure | IO blocking data detection, IO Wait time, severity assessment |
| suspend_wakeup_analysis | Suspend/wakeup analysis: time distribution, wakeup source ranking |
| network_analysis | Network analysis: traffic overview, per-app traffic, protocol distribution |
| irq_analysis | Hard interrupt and soft interrupt frequency, duration, nesting |
| scene_reconstruction | Reconstruct user operation scenarios through user input and screen state |
| state_timeline | Four-lane continuous state timeline: device/user/app/system |
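The iterator and conditional step kinds that distinguish composite skills from atomic ones can be modeled as follows. This is a hypothetical TypeScript model of how an engine might execute a parsed composite skill – the step schema and skill IDs are illustrative, not the real YAML schema:

```typescript
// Hypothetical composite-skill executor: atomic steps run once, iterator
// steps fan out per row of an earlier step's output (per-frame deep drill),
// conditional steps run only when the data warrants it.
type Row = Record<string, number | string>;
type RunAtomic = (skillId: string, params: Row) => Row[];

type Step =
  | { kind: "atomic"; skillId: string }
  | { kind: "iterator"; over: string; skillId: string; itemParam: string }
  | { kind: "conditional"; when: (results: Record<string, Row[]>) => boolean; skillId: string };

function runComposite(steps: Step[], run: RunAtomic): Record<string, Row[]> {
  const results: Record<string, Row[]> = {};
  for (const step of steps) {
    if (step.kind === "atomic") {
      results[step.skillId] = run(step.skillId, {});
    } else if (step.kind === "iterator") {
      // Fan out: one call per row of an earlier step's output.
      results[step.skillId] = (results[step.over] ?? []).flatMap(row =>
        run(step.skillId, { [step.itemParam]: row.id }),
      );
    } else if (step.when(results)) {
      results[step.skillId] = run(step.skillId, {});
    }
  }
  return results;
}
```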
Deep Skills (2)
Deep profiling, typically requiring longer execution time.
| Skill ID | One-line description |
|---|---|
| cpu_profiling | CPU performance profiling: usage hotspots and scheduling efficiency deep analysis |
| callstack_analysis | Call stack hotspot analysis in Running state |
Pipeline Skills (28)
Rendering pipeline detection + teaching. Each pipeline skill corresponds to a rendering architecture, including pipeline description, key threads, performance metrics, and optimization recommendations.
| Skill ID | Rendering architecture |
|---|---|
| pipeline_android_view_standard_blast | Android 12+ standard HWUI + BLASTBufferQueue |
| pipeline_android_view_standard_legacy | Pre-Android 12 standard HWUI + Legacy BufferQueue |
| pipeline_android_view_software | CPU Skia software rendering, no RenderThread |
| pipeline_android_view_mixed | View + SurfaceView mixed rendering |
| pipeline_android_view_multi_window | Same-process multi-window (Dialog/PopupWindow) |
| pipeline_android_pip_freeform | Picture-in-Picture and freeform window mode |
| pipeline_compose_standard | Jetpack Compose + HWUI RenderThread |
| pipeline_flutter_textureview | Flutter PlatformView fallback mode |
| pipeline_flutter_surfaceview_skia | Flutter + Skia engine (JIT Shader) |
| pipeline_flutter_surfaceview_impeller | Flutter + Impeller engine (pre-compiled Shader) |
| pipeline_webview_gl_functor | Traditional WebView, App RenderThread synchronous wait |
| pipeline_webview_surface_control | Modern WebView + Viz/OOP-R independent composition |
| pipeline_webview_textureview_custom | X5/UC and other custom WebView engines |
| pipeline_webview_surfaceview_wrapper | WebView fullscreen video wrapper mode |
| pipeline_chrome_browser_viz | Chrome Viz compositor, multi-process architecture |
| pipeline_opengl_es | Direct OpenGL ES / EGL rendering |
| pipeline_vulkan_native | Native Vulkan rendering |
| pipeline_angle_gles_vulkan | ANGLE: OpenGL ES -> Vulkan translation layer |
| pipeline_game_engine | Unity/Unreal/Godot and other game engines |
| pipeline_surfaceview_blast | Standalone SurfaceView + BLAST sync |
| pipeline_textureview_standard | SurfaceTexture texture sampling/composition mode |
| pipeline_camera_pipeline | Camera2/HAL3 multi-stream camera rendering |
| pipeline_video_overlay_hwc | HWC video layer hardware-accelerated overlay |
| pipeline_hardware_buffer_renderer | Android 14+ HBR API direct Buffer rendering |
| pipeline_surface_control_api | NDK SurfaceControl direct transaction submission |
| pipeline_variable_refresh_rate | VRR/ARR + FrameTimeline dynamic refresh rate |
| pipeline_imagereader_pipeline | ImageReader API: ML inference, screen recording, custom camera |
| pipeline_software_compositing | SF CPU software composition fallback (when GPU unavailable) |
Note: `_base.skill.yaml` is the base template file for Pipeline Skills, not registered as an available Skill, and not counted in the total.
Module Skills (18)
Modular analysis configuration, organized by layer. The Agent discovers them via list_skills and invokes on demand.
Hardware layer (5):
| Skill ID | One-line description |
|---|---|
| cpu_module | CPU frequency, thermal throttling, and power states |
| gpu_module | GPU rendering, frequency, and VRAM usage |
| memory_module | Memory bandwidth, LMK, dmabuf, PSI, page faults |
| thermal_module | Temperature sensors, thermal throttling detection, cooling policy |
| power_module | Wake Lock, CPU idle, power mode, suspend/wakeup |
Framework layer (6):
| Skill ID | One-line description |
|---|---|
| surfaceflinger_module | Frame rendering timing, jank causes, GPU composition |
| choreographer_module | VSync signal, doFrame callbacks, frame production pipeline |
| ams_module | Application lifecycle, process management, startup timing |
| wms_module | Window animations, Activity transitions, multi-window |
| art_module | GC, JIT compilation, and memory allocation |
| input_module | Touch latency, input dispatch, and click response |
Kernel layer (4):
| Skill ID | One-line description |
|---|---|
| scheduler_module | Thread scheduling latency, CPU utilization, big/little core assignment |
| binder_module | Cross-process calls, blocking transactions, call latency |
| lock_contention_module | Mutex/Futex, Java monitor, deadlock detection |
| filesystem_module | Block IO, file operations, database, SharedPreferences |
Application layer (3):
| Skill ID | One-line description |
|---|---|
| launcher_module | Home screen performance, app launch, widget updates |
| systemui_module | Status bar, notification shade, quick settings, navigation bar |
| third_party_module | Third-party app performance, stuttering, and resource usage |
Relationships between Skills
1 | Module Skills (configuration layer) |
Agent’s typical invocation path (scrolling analysis example):
1 | invoke_skill("scrolling_analysis") <- Composite, internally calls multiple Atomic |
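The discovery side of this path can be sketched as a minimal registry behind `list_skills` / `invoke_skill`; the interface and category filter are assumptions about the real implementation:

```typescript
// Hypothetical minimal skill registry: the Agent discovers all 164 skills
// through one listing call, optionally filtered by category.
type Category = "atomic" | "composite" | "deep" | "pipeline" | "module";

interface SkillMeta {
  id: string;
  category: Category;
  description: string;
}

class SkillRegistry {
  private skills = new Map<string, SkillMeta>();

  register(meta: SkillMeta): void {
    this.skills.set(meta.id, meta);
  }

  // What the list_skills MCP tool would return.
  list(category?: Category): SkillMeta[] {
    const all = [...this.skills.values()];
    return category ? all.filter(s => s.category === category) : all;
  }
}
```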
(Continuously updated; new questions will be added as received)