We are seeking a highly experienced GPU architect to lead the definition and execution of next-generation mobile GPU architecture, while driving architectural convergence between GPU and NPU toward a coherent xPU sub-system design.
This role requires deep expertise in GPU microarchitecture, strong system-level architectural capability across both hardware and software, and a thorough understanding of common graphics and AI workloads. A proven track record of delivering related sub-system IP or complex SoC silicon is highly desirable.
The successful candidate will lead the effort to shape a converged xPU architecture built natively for future AI compute and optimised for performance, power efficiency, and silicon area in next-generation mobile compute platforms.
Key Responsibilities:
xPU Converged Architecture Design
* Working from first principles, analyse and characterise future mobile graphics and AI workloads, and redefine from the ground up a converged xPU (GPU and NPU) architecture, spanning hardware and software, that is optimal for future applications.
* Ensure compatibility with, or an easy transition from, the existing architecture.
* Define unified or partially unified execution resources (vector, scalar, tensor units)
* Develop shared scheduling and workload dispatch mechanisms for graphics and AI
* Design resource sharing and isolation strategies under mixed workloads
* Evaluate architectural trade-offs between dedicated and converged compute blocks
Mobile GPU Architecture Leadership
* Ensure timely delivery of the next-generation mobile GPU architecture and its long-term roadmap
* Lead evolution of shader cores, execution pipelines, and cache hierarchy
* Drive performance, power efficiency (Perf/W), and area efficiency (Perf/mm²)
* Provide architectural leadership from concept phase through tape-out
Memory & Interconnect Architecture
* Define a memory hierarchy strategy for converged GPU/NPU workloads
* Architect shared cache structures and bandwidth arbitration policies
* Optimise on-chip interconnect for heterogeneous compute traffic
* Reduce data movement overhead across compute domains
System-Level Architecture Collaboration
* Collaborate with CPU, AI software, runtime, and system architecture teams
* Participate in SoC-level power, thermal, and floorplanning trade-offs
* Align hardware architecture with graphics APIs and AI frameworks
* Support performance modelling, workload characterisation, and silicon bring-up
Required:
* 15+ years of experience in GPU, AI accelerator, or heterogeneous compute architecture
* Deep understanding of GPU microarchitecture (SIMD/SIMT, scheduling, memory systems)
* Strong knowledge of tensor/matrix computation and AI acceleration techniques
* Proven experience delivering high-volume silicon
* Expertise in performance modelling and power analysis
* Strong cross-functional communication and leadership capability