A research project within TUM CREATE focussing on the modelling and optimisation of architecture and infrastructure. The main research interests are simulations of urban systems such as traffic and power. In addition, a cognitive systems group deals with human-computer interaction.

Traffic Simulation on Heterogeneous Hardware

David Eckhoff

Challenges of Simulation on Heterogeneous Hardware

Agent-based Simulation (ABS) has become a mainstream method of modelling and simulation. However, performance issues arise for the following reasons:



We explore a range of hardware platforms to accelerate agent-based simulation, including multi-core CPUs, many-core CPUs, Graphics Processing Units (GPUs), Accelerated Processing Units (APUs), Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), and Systems on Chip (SoCs).

Adapting a CPU-based sequential simulation for execution on a heterogeneous platform comes with five challenges:

  • Hardware assignment: the distribution of workload among the hardware platforms.

  • Data transfer overhead: the cost to transfer agent data from one device to another.

  • Scattered memory accesses: accesses to a device’s memory must follow hardware-specific patterns to avoid cache misses and costly accesses to high-latency memory.

  • Maximisation of parallelism: exposing enough parallel work to fully exploit the hardware resources.

  • Abstraction from hardware specifics: frameworks ease simulation development by removing the need for detailed hardware knowledge.
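
The scattered-memory-access challenge above can be made concrete with a small layout example. A common remedy on GPUs, which is not necessarily the one used in this project, is to store agents as a structure of arrays (SoA) instead of an array of structures (AoS), so that parallel work items reading the same attribute touch neighbouring memory addresses. The following is a minimal sketch; the agent attributes and values are hypothetical.

```python
from array import array

# Array of structures (AoS): one record per agent. Reading only the
# velocities has to visit every record, so the accesses are scattered.
agents_aos = [
    {"position": 0.0, "velocity": 10.0, "lane": 0},
    {"position": 25.0, "velocity": 12.5, "lane": 1},
    {"position": 60.0, "velocity": 8.0, "lane": 0},
]


def to_soa(agents):
    """Convert agent records into one contiguous typed array per attribute."""
    return {
        "position": array("d", (a["position"] for a in agents)),
        "velocity": array("d", (a["velocity"] for a in agents)),
        "lane": array("i", (a["lane"] for a in agents)),
    }


# Structure of arrays (SoA): all velocities are stored back to back.
agents_soa = to_soa(agents_aos)
mean_velocity = sum(agents_soa["velocity"]) / len(agents_soa["velocity"])
```

In an OpenCL port, each of these arrays would typically become its own device buffer, and work item i would read element i of each buffer.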


Our goal is to reduce the manual work involved in developing a simulation that targets heterogeneous hardware composed of CPUs, GPUs, and FPGAs. A middleware detects which parts of the simulation are suited to run on which type of hardware. It takes the simulation code as input and automatically decomposes the simulation according to the available hardware. It then generates the corresponding code and orchestrates the assignment of code segments to the hardware devices.
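
To illustrate the assignment step only, the sketch below maps each code segment to the device with the lowest estimated run time, where the estimate includes a fixed launch cost and a per-agent data transfer cost. This is not the project's middleware: the segment names, throughput figures, and cost constants are all hypothetical, and the model ignores where the preceding segment left its data.

```python
# Hypothetical throughput per segment and device, in agents per millisecond.
THROUGHPUT = {
    "route_planning": {"cpu": 50.0, "gpu": 400.0},
    "car_following": {"cpu": 20.0, "gpu": 800.0},
    "statistics": {"cpu": 200.0, "gpu": 500.0},
}
# Hypothetical cost of moving one agent's data to the device, in milliseconds.
TRANSFER_MS_PER_AGENT = {"cpu": 0.0, "gpu": 0.004}
# Hypothetical fixed cost of launching work on the device, in milliseconds.
LAUNCH_MS = {"cpu": 0.0, "gpu": 0.5}


def estimated_ms(segment, device, n_agents):
    """Estimate run time as launch cost + compute time + transfer time."""
    compute = n_agents / THROUGHPUT[segment][device]
    transfer = n_agents * TRANSFER_MS_PER_AGENT[device]
    return LAUNCH_MS[device] + compute + transfer


def assign(n_agents):
    """Map every segment to the device with the lowest estimated time."""
    return {
        segment: min(rates, key=lambda d: estimated_ms(segment, d, n_agents))
        for segment, rates in THROUGHPUT.items()
    }


small = assign(10)      # launch and transfer costs dominate
large = assign(10_000)  # compute time dominates for the heavy segments
```

Under these assumed numbers, a tiny scenario stays entirely on the CPU, while a large one offloads the two compute-heavy segments and keeps the cheap one on the CPU because the transfer cost outweighs the gain.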


Current Progress

To explore the performance potential, we developed an agent-based traffic simulation in OpenCL that can run on a CPU, a GPU, or an APU.

We also conducted a comprehensive performance study of execution schemes for this traffic simulation on a heterogeneous CPU/GPU platform. The execution schemes are illustrated in the figure below.

The simulation is decomposed into three stages: SENSE, THINK, and ACT. Moderate speedup over the sequential execution on a CPU can be achieved when only the THINK stage is parallelised on the GPU. The performance can be improved by using the fused GPU of an APU. Due to the zero-copy technique, the data transfer overhead between the fused CPU and the fused GPU is eliminated.
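
The three-stage decomposition can be sketched as a single simulation step. The example below is a sequential toy version on one lane with a hypothetical car-following rule; the function names, parameters, and the rule itself are assumptions for illustration and do not reproduce the project's OpenCL model. It shows why THINK is a natural first candidate for the GPU: each agent's decision depends only on its own observation.

```python
def sense(positions):
    """SENSE: each agent observes the gap to the vehicle directly ahead."""
    order = sorted(range(len(positions)), key=positions.__getitem__)
    gaps = [float("inf")] * len(positions)  # the leading vehicle sees no one
    for rank, i in enumerate(order[:-1]):
        gaps[i] = positions[order[rank + 1]] - positions[i]
    return gaps


def think(velocities, gaps, dt, v_max=15.0, accel=1.0, min_gap=5.0):
    """THINK: each agent independently chooses its next velocity."""
    new_velocities = []
    for v, gap in zip(velocities, gaps):
        desired = min(v + accel * dt, v_max)    # accelerate up to the limit
        safe = max((gap - min_gap) / dt, 0.0)   # keep at least min_gap ahead
        new_velocities.append(min(desired, safe))
    return new_velocities


def act(positions, velocities, dt):
    """ACT: apply the decisions to the shared simulation state."""
    return [x + v * dt for x, v in zip(positions, velocities)]


def step(positions, velocities, dt=1.0):
    """Advance the simulation by one time step of length dt."""
    gaps = sense(positions)
    velocities = think(velocities, gaps, dt)
    positions = act(positions, velocities, dt)
    return positions, velocities


positions, velocities = step([0.0, 8.0], [10.0, 10.0])
```

In this run the follower brakes to keep its distance while the leader accelerates freely. In a parallel version, the loop body of think would become the kernel executed once per agent.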

A substantial speedup can be achieved when all stages are parallelised. We achieved a speedup of up to 28.7x over the CPU-based execution.
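
One way to reason about the difference between parallelising a single stage and parallelising all stages is Amdahl's law, which bounds the overall speedup by the share of run time that remains sequential. The sketch below is a generic calculator, not an analysis of the study: the fractions and acceleration factors are hypothetical, and the formula ignores data transfer overhead.

```python
def amdahl_speedup(parallel_fraction, accelerated_speedup):
    """Overall speedup when a fraction of the run time is accelerated.

    parallel_fraction: share of the sequential run time that is accelerated.
    accelerated_speedup: speedup of that share on the accelerator.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if accelerated_speedup <= 0.0:
        raise ValueError("accelerated_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accelerated_speedup)


# Hypothetical case: one stage taking 60% of the run time is accelerated.
one_stage = amdahl_speedup(0.6, 50.0)
# Hypothetical case: stages covering 99% of the run time are accelerated.
all_stages = amdahl_speedup(0.99, 50.0)
```

With 40% of the work left sequential, the overall speedup cannot exceed 2.5x no matter how fast the accelerated stage becomes, which is why moving the remaining stages to the GPU matters.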