Zain Taufique "Efficient run-time systems for AI inference on heterogeneous computing platforms"
This talk can be viewed via Zoom. (Note: this talk will be recorded.)
Title: "Efficient run-time systems for AI inference on heterogeneous computing platforms"
Abstract:
Efficient AI orchestration across heterogeneous edge platforms is crucial for satisfying workload Quality-of-Service (QoS) requirements, such as latency, accuracy, and priority, while maintaining platform energy efficiency. Edge platforms integrate heterogeneous compute clusters, including CPUs, GPUs, and Neural Processing Units (NPUs), each exhibiting asymmetric energy-performance characteristics. However, these platforms remain resource-constrained, with limited computational capacity. At the same time, modern AI workloads are compound in nature, comprising diverse models such as Deep Neural Networks, transformers, diffusion models, and large language models. This diversity manifests at runtime as varied inference requests that arrive both continuously and in response to user prompts, each with distinct latency, accuracy, and priority requirements. Consequently, executing multiple AI models simultaneously while satisfying diverse QoS constraints introduces a complex design-space exploration problem that requires intelligent run-time resource scheduling. Existing scheduling mechanisms remain largely conservative, failing to fully exploit resource heterogeneity while often violating workload and system constraints. This talk presents my current research on efficient AI orchestration on resource-constrained edge platforms and outlines future directions toward integrated edge-cloud computing paradigms.
Bio: Zain Taufique is a doctoral researcher in technology at the University of Turku, Finland, working on efficient runtime systems for edge AI inference. His research focuses on optimizing the performance of modern AI workloads, such as multi-DNN pipelines and large language models, on heterogeneous computing platforms including CPUs, GPUs, and NPUs. He has over seven years of experience in embedded systems, AI inference optimization, and performance analysis across edge computing platforms. Zain has published multiple papers at leading conferences and in journals on embedded and edge AI systems, and has also worked in industry roles ranging from embedded systems engineering to AI product development.