Android 17 and higher supports the Neural Processing
Unit (NPU) Manager (com.android.npumanager), which coordinates the allocation
and scheduling of NPU resources across system services and application
workloads. By moving resource arbitration from custom vendor daemons to the
Android platform, the NPU Manager increases predictability, prevents resource
starvation, manages thermal boundaries, and enhances overall device performance.
Background and motivation
Before the NPU Manager, apps and system modules submitted workloads directly to vendor drivers or proprietary services. This approach had several drawbacks:
- Inefficient resource competition: Heavy machine learning workloads (such as Large Language Model (LLM) inference engines or on-device vision systems) competed directly with other high-priority systems for finite NPU resources (such as SRAM, weights memory, and execution channels).
- System instability: Uncoordinated workloads could trigger thermal throttling, memory page faults, or low memory killer daemon (LMKD) kills if demand exceeded hardware capacity.
- Inefficient prioritization: The system server couldn't adjust NPU priority in response to context shifts, such as a background task loading a massive model while a latency-sensitive camera pipeline or user assistant is active in the foreground.
The NPU Manager addresses these challenges by acting as a system-level arbiter that gates model loading and dynamically adjusts execution priorities based on current device health and app states.
System architecture
The NPU Manager is implemented as a system service named npu running within
the Android framework. The NPU Manager isolates the high-level coordination of
scheduling policies from the low-level vendor driver implementation.
The following diagram illustrates the NPU Manager environment layers:

Figure 1. NPU Manager environment layers.
Key components
- Framework API Client (android.npumanager.NpuManager): The entry point used by clients to request model load reservations
- System Service (npu): A system service that gates model load approvals and manages preemption commands based on scheduling priority rules
- NPU Scheduling HAL (android.hardware.npu): An AIDL-based interface that relays Android app priorities and callbacks between the framework and the driver
- Vendor driver: A low-level driver that controls the hardware execution blocks and implements low-level prioritization mechanisms
SDK and framework API
Before calling low-level neural network libraries or loading model files,
framework clients must interact with the NpuManager service. To do this,
clients first define a model load request and then execute the request and
approval flow.
Model load request
A model load request is represented by ModelLoadRequest. This object contains:
- Unique request ID
- Estimated model size class, such as NPU_MODEL_SIZE_LESS_THAN_1GB or NPU_MODEL_SIZE_GREATER_THAN_2G
- Intended priority, such as NPU_MODEL_PRIORITY_BACKGROUND, NPU_MODEL_PRIORITY_NORMAL, or NPU_MODEL_PRIORITY_OPPORTUNISTIC
The following code example builds a ModelLoadRequest with a size class of
greater than 2 GB and normal execution priority:
ModelLoadRequest request = new ModelLoadRequest.Builder(requestId)
        .setSize(NPU_MODEL_SIZE_GREATER_THAN_2G)
        .setPriority(NPU_MODEL_PRIORITY_NORMAL)
        .build();
Request and approval flow
Clients invoke requestCanLoadModel asynchronously:
npuManager.requestCanLoadModel(request, callback, executor);
When NPU resources are available, the framework responds using
ModelLoadRequestCallback with the following events:
- onCanLoadModel(request, status, listener): Fired when the request is approved. The client receives an NpuManager.ModelLoadStatusListener token. After the client fully loads the model in the driver memory, it must call listener.notifyModelLoaded(request).
- onRequestUnloadModel(request) or onRequestUnloadModel(request, reason): Fired when the system experiences resource pressure (such as an incoming foreground request or thermal spike) and requires the client to release its model. After reclaiming the NPU resources, the client calls listener.notifyModelUnloaded(request).
- onModelLoadRequestComplete(request, status): Informs the client of final request lifecycle changes, such as cancellation.
Clients can cancel pending requests using cancelModelLoad(request).
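The callback sequence above implies a small state machine on the client side. The following is a minimal sketch of one way to track it. ModelLoadTracker and its State names are hypothetical and aren't part of the platform API. The sketch also assumes that an unload request can arrive either while the model is still loading or after it has loaded.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical client-side tracker for one model load request; not a platform class. */
final class ModelLoadTracker {
    /** Illustrative states; the names are assumptions, not platform constants. */
    enum State { PENDING, APPROVED, LOADED, UNLOAD_REQUESTED, UNLOADED, COMPLETE }

    private State state = State.PENDING;

    State state() {
        return state;
    }

    /** Mirrors onCanLoadModel: the request was approved and loading may begin. */
    void onCanLoadModel() {
        move(EnumSet.of(State.PENDING), State.APPROVED);
    }

    /** Record this when the client calls listener.notifyModelLoaded(request). */
    void onModelLoaded() {
        move(EnumSet.of(State.APPROVED), State.LOADED);
    }

    /** Mirrors onRequestUnloadModel; accepted while loading or loaded (assumption). */
    void onRequestUnloadModel() {
        move(EnumSet.of(State.APPROVED, State.LOADED), State.UNLOAD_REQUESTED);
    }

    /** Record this when the client calls listener.notifyModelUnloaded(request). */
    void onModelUnloaded() {
        move(EnumSet.of(State.UNLOAD_REQUESTED), State.UNLOADED);
    }

    /** Mirrors onModelLoadRequestComplete: treated as terminal from any state. */
    void onModelLoadRequestComplete() {
        state = State.COMPLETE;
    }

    private void move(Set<State> allowedFrom, State to) {
        if (!allowedFrom.contains(state)) {
            throw new IllegalStateException(state + " -> " + to + " is not allowed");
        }
        state = to;
    }
}
```

Rejecting out-of-order transitions makes mistakes such as reporting a model as loaded before approval fail immediately instead of leaving the client and the service with different views of the request.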
HAL and vendor integration
To support the NPU Manager, device-specific vendor implementations must conform
to the android.hardware.npu AIDL service interfaces.
Scheduling configuration
The system relays app priority using the SchedulingConfig AIDL structure
defined in IScheduling.aidl:
package android.hardware.npu;

@VintfStability
parcelable SchedulingConfig {
    int minPriority;
    int maxPriority;
    int uid;
    int appPriority;
    boolean hasDirectAccess;
    boolean canAttributeOtherUid;
}
Using this structure, the NPU Manager coordinates priority alignments. For example, if a background app submits a high-priority job, the priority is adjusted downwards to prevent interference with foreground graphics.
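The exact adjustment rule isn't specified here. As an illustration only, the following sketch assumes that a job never runs above its app's priority and that the result is clamped to the range from minPriority to maxPriority. PriorityPolicy is a hypothetical helper, and the convention that larger values mean higher priority is also an assumption.

```java
/** Hypothetical helper; reuses SchedulingConfig field names but is not platform code. */
final class PriorityPolicy {
    private PriorityPolicy() {}

    /**
     * Assumed rule: cap the job priority at the app priority, then clamp the
     * result to [minPriority, maxPriority]. Larger values mean higher priority.
     */
    static int effectivePriority(
            int jobPriority, int appPriority, int minPriority, int maxPriority) {
        if (minPriority > maxPriority) {
            throw new IllegalArgumentException("minPriority must not exceed maxPriority");
        }
        int capped = Math.min(jobPriority, appPriority);
        return Math.max(minPriority, Math.min(maxPriority, capped));
    }
}
```

Under these assumptions, a background app with appPriority 2 that submits a job at priority 9 runs at an effective priority of 2.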
Task status and profiling
Vendor drivers must report the lifecycle status of NPU execution groups to the
manager. The WorkInfo structure (defined in WorkInfo.aidl) describes each
tracked task:
package android.hardware.npu;

import android.hardware.npu.NpuUuid;

@VintfStability
parcelable WorkInfo {
    int id;
    @nullable NpuUuid groupId;
    int uid;
    int debugPid;
    int originalUid;
    @nullable String debugFeatureId;
    int jobPriority;
    int effectivePriority;
    long timestampMs;
    int deviceNumber;
}
Event debouncing
The scheduling framework supports event debouncing using the
debounce_duration_ms parameter within the scheduling callback registration.
This avoids log flooding and suppresses rapid notifications, for example,
consecutive start and end events for repeating models.
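To make the effect of debounce_duration_ms concrete, the following sketch shows a leading-edge debounce: the first event for a key is delivered, and later events for the same key are suppressed until the window has elapsed. EventDebouncer and its string key are hypothetical, and the actual suppression policy used by the scheduling framework might differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical debouncer keyed by an event string; not the HAL implementation. */
final class EventDebouncer {
    private final long debounceDurationMs;
    // Timestamp of the last event that was actually delivered, per key.
    private final Map<String, Long> lastDeliveredMs = new HashMap<>();

    EventDebouncer(long debounceDurationMs) {
        if (debounceDurationMs < 0) {
            throw new IllegalArgumentException("debounceDurationMs must be non-negative");
        }
        this.debounceDurationMs = debounceDurationMs;
    }

    /** Returns true if the event should be delivered, false if it is suppressed. */
    boolean shouldDeliver(String key, long nowMs) {
        Long last = lastDeliveredMs.get(key);
        if (last != null && nowMs - last < debounceDurationMs) {
            // Still inside the window opened by the last delivered event.
            return false;
        }
        lastDeliveredMs.put(key, nowMs);
        return true;
    }
}
```

In this sketch the window is measured from the last delivered event, so suppressed events don't extend it.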
The callback lifecycle states are reported as follows:
- onWorkRequested: Workload is enqueued by the vendor service.
- onWorkStarted: Workload execution begins.
  - NPU_START_REASON_INITIAL: First execution run.
  - NPU_START_REASON_RESUMED: Execution resumed after preemption.
- onWorkEnded: Workload execution ended.
  - NPU_END_REASON_COMPLETED: Successful run completion.
  - NPU_END_REASON_CANCELLED_USER: Cancelled by client.
  - NPU_END_REASON_CANCELLED_SYSTEM: Preempted by system policy.
  - NPU_END_REASON_FAILED: Execution error or driver failure.
  - NPU_END_REASON_PAUSED: Temporarily suspended for higher-priority tasks.
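A consumer of these callbacks can use the start and end pairs to total the time a workload actually ran, including runs split by a pause. The following is a minimal sketch. WorkRunTimeTracker and its EndReason enum are hypothetical stand-ins for the constants above, and the sketch assumes that a paused end is the only end reason that can be followed by another start.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer of work lifecycle callbacks; not platform code. */
final class WorkRunTimeTracker {
    /** Stand-ins for the NPU_END_REASON_* constants. */
    enum EndReason { COMPLETED, CANCELLED_USER, CANCELLED_SYSTEM, FAILED, PAUSED }

    private final Map<Integer, Long> startedAtMs = new HashMap<>();
    private final Map<Integer, Long> runMsByWork = new HashMap<>();

    /** Mirrors onWorkStarted for both the initial and the resumed start reasons. */
    void onWorkStarted(int workId, long timestampMs) {
        startedAtMs.put(workId, timestampMs);
    }

    /**
     * Mirrors onWorkEnded. Adds the elapsed run segment to the workload's total
     * and returns true when the workload is finished (any reason except PAUSED).
     */
    boolean onWorkEnded(int workId, EndReason reason, long timestampMs) {
        Long start = startedAtMs.remove(workId);
        if (start != null) {
            runMsByWork.merge(workId, timestampMs - start, Long::sum);
        }
        return reason != EndReason.PAUSED;
    }

    /** Total time the workload spent executing across all run segments. */
    long totalRunMs(int workId) {
        return runMsByWork.getOrDefault(workId, 0L);
    }
}
```

Time spent paused between an end and the following start is excluded from the total.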
Device readiness and testing
Ensure these configurations are in place before verifying device health.
Application declarations
Clients seeking NPU scheduling prioritization must declare the NPU hardware
feature in their AndroidManifest.xml:
<uses-feature android:name="android.hardware.npu" android:required="false" />
For models deployed on newer generations of partner hardware, this declaration might be required for optimal engine creation.
VTS integration testing
NPU HAL implementations can be validated with VTS functional tests, for
example, VtsHalNpuSchedulingTargetTest.