Image Processing on tiny-gpu

Published: April 27, 2026

This assignment is intended for prospective undergraduate researchers. If you are a prospective graduate researcher, please refer to the AI Hardware Task or the NTT Task. This task introduces you to GPU-style parallel programming using a simplified system. You will implement a basic image processing application, execute it on a minimal GPU platform, and evaluate its behavior and performance.

The goal is to assess your ability to:

Understand a new system quickly
Map a real-world problem to parallel execution
Reason about performance and system limitations

Tiny-GPU Setup

tiny-gpu is a minimal, educational GPU-like system designed to demonstrate the fundamentals of parallel execution. Unlike modern GPUs, it is intentionally simplified and lacks many performance optimizations and features. This makes it easier to understand how threads are scheduled, how memory is accessed, and how parallel workloads are structured. In this assignment, you will use tiny-gpu as a platform to map a simple real-world computation onto a parallel execution model.

Clone the tiny-gpu repository:

git clone https://github.com/adam-maj/tiny-gpu.git
cd tiny-gpu

Follow the repository instructions to compile and run the provided examples. As you do this, familiarize yourself with the cocotb Python-based testbench environment and briefly examine how GPU kernels are written and executed. Focus on understanding how threads are launched and how work is divided across them. You are not expected to fully understand the system internals, just enough to modify and run your own kernel.

Task: Image Brightness Adjustment

You will implement a simple image processing operation: increasing the brightness of a grayscale image.

For each pixel:

\[output(x, y) = \min(255, input(x, y) + k)\]

where:

\(input(x, y)\) is the original pixel value (0–255)
\(k\) is a constant brightness offset added to every pixel (e.g., 30–80)
The result must be clamped to 255
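
As a point of reference, the formula above can be written as a few lines of host-side Python. This is only a minimal sketch of the expected arithmetic, not the tiny-gpu kernel itself; the function name `brighten` and the flat-list image representation are assumptions made for illustration.

```python
def brighten(pixels, k):
    """Return a new list with k added to every pixel, saturating at 255.

    `pixels` is assumed to be a flat list of integers in 0..255
    (a grayscale image stored row by row); `k` is the brightness offset.
    """
    if not 0 <= k <= 255:
        raise ValueError("k must be in the range 0..255")
    return [min(255, p + k) for p in pixels]


# 200 + 60 and 250 + 60 both exceed 255, so they are clamped.
print(brighten([0, 100, 200, 250], 60))  # [60, 160, 255, 255]
```

A model like this can later serve as the expected-value source when checking the kernel's output.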

You may consider the following for your implementation:

Represent the image as a 1D or 2D array
Assign one thread per pixel
Each thread:
- Reads one pixel
- Applies the brightness operation
- Writes the result back

Your implementation should:
- Work for multiple image sizes
- Produce correct output (proper clamping)
- Be clearly structured and documented in a repository
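
The one-thread-per-pixel idea can be prototyped in plain Python before writing any kernel code. The sketch below assumes the common GPU indexing scheme in which each thread derives a global index from its block index, the block size, and its thread index; the names `simulate_kernel`, `flatten`, and `threads_per_block` are hypothetical, and the loops run sequentially, so this models only the mapping, not the timing or memory limits of tiny-gpu.

```python
def flatten(image):
    """Convert a 2D image (list of rows) into a 1D row-major list."""
    return [p for row in image for p in row]


def simulate_kernel(pixels, k, threads_per_block=4):
    """Emulate one-thread-per-pixel execution over a flat pixel list."""
    n = len(pixels)
    # Ceiling division: enough blocks to cover every pixel.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    out = [0] * n
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            # Global pixel index handled by this (block, thread) pair.
            i = block_idx * threads_per_block + thread_idx
            if i >= n:
                # The last block may contain threads with no pixel to process.
                continue
            out[i] = min(255, pixels[i] + k)  # read, brighten, write back
    return out


image = [[10, 250, 128],
         [0, 255, 90]]
print(simulate_kernel(flatten(image), 50))  # [60, 255, 178, 50, 255, 140]
```

Whether the real kernel needs the bounds guard depends on how the thread count is chosen relative to the image size, which is one of the design decisions worth documenting.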

Performance Evaluation

You may use the following two images for your evaluations.

Green Parrot (sample image for the task)

Red Flower (sample image for the task)

Run your brightness-adjustment kernel using multiple image sizes, for example:

Image Size	Number of Pixels	tiny-gpu Execution Time
64×64	4,096
128×128	16,384

Report the following:

Whether the output is correct (you may use Python unit tests to verify this)
How execution time changes with image size
Any limitations encountered when scaling
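
One possible way to check correctness is a small `unittest` module that compares the pixels read back from the simulation against a software reference. This is a sketch under assumptions: `reference_brighten` and `check_output` are hypothetical helpers, and the output lists here are hard-coded stand-ins for values you would read from the simulated data memory.

```python
import unittest


def reference_brighten(pixels, k):
    """Software reference: add k to each pixel and clamp to 255."""
    return [min(255, p + k) for p in pixels]


def check_output(input_pixels, output_pixels, k):
    """Return a list of (index, expected, actual) for every mismatching pixel."""
    expected = reference_brighten(input_pixels, k)
    if len(expected) != len(output_pixels):
        raise ValueError("input and output sizes differ")
    return [(i, e, a)
            for i, (e, a) in enumerate(zip(expected, output_pixels))
            if e != a]


class BrightnessTest(unittest.TestCase):
    def test_correct_output_has_no_mismatches(self):
        self.assertEqual(check_output([0, 200, 255], [50, 250, 255], 50), [])

    def test_wraparound_is_detected(self):
        # 250 + 50 = 300; an unclamped 8-bit add would yield 300 % 256 = 44.
        self.assertEqual(check_output([250], [44], 50), [(0, 255, 44)])
```

Saved as, for example, `test_brightness.py` (a hypothetical file name), the tests can be run with `python -m unittest test_brightness`. The second test shows the kind of failure a missing clamp would produce.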

Public Artifact Repository

You must create a public repository containing your solution. The repository must include at least the following components:

Source Code
Testbench
Input Files
README.md

Interview Expectations

During the interview, you will be expected to:

Walk through your implementation
Explain how pixels are mapped to tiny-gpu threads
Explain your design decisions
Discuss trade-offs (e.g., image size, latency, memory access, system limitations)
Demonstrate your simulation setup

This task is intended as a concise illustration of the core skills expected for this position. It is not designed to have a single “correct” solution. Instead, you are encouraged to explore different design approaches, experiment with trade-offs, and justify the decisions you make.

Focus on demonstrating clarity of thought, sound engineering judgment, and the ability to connect a simple real-world computation to practical hardware execution.

Good luck and happy coding/designing!