Things I’m building
PCCX repositories now live under the organization; personal baselines stay here.
Start here for the active NPU architecture, FPGA implementation, and verification tooling.
- pccx · active
A parallel compute core executor for edge FPGAs: custom ISA, INT8 systolic array, runtime queues, and a Python-facing driver stack.
why it matters · It lets me study edge LLM inference behavior: memory movement, kernel shape, and driver overhead rather than MAC count alone.
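To make the "MAC count alone" point concrete, here is a sketch of the arithmetic an INT8 systolic array performs per output: int8 × int8 products summed in a wide accumulator, then requantized back to int8. This is an illustration, not the pccx datapath; the function name and the `scale`/`zero_point` parameters are assumptions for the example.

```python
def int8_matmul_requant(a, b, scale, zero_point=0):
    """INT8 matrix multiply with wide accumulation, then requantize to int8.

    a: M x K list of ints in [-128, 127]; b: K x N of the same range.
    scale: float multiplier applied to the accumulator (hypothetical
    requantization scheme, not pccx's actual one).
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # stands in for the int32 accumulator in hardware
            for t in range(k):
                acc += a[i][t] * b[t][j]  # int8 x int8 partial products
            q = round(acc * scale) + zero_point
            out[i][j] = max(-128, min(127, q))  # saturate to int8 range
    return out
```

The interesting costs in practice sit around this loop, not inside it: moving `a` and `b` through the memory hierarchy and shaping the kernel so the array stays fed.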
- pccx-lab · active
Visual performance profiler and pre-RTL simulator for the pccx NPU.
why it matters · Hardware needs good software tooling to be debuggable. This bridges the gap between Verilog waveforms and high-level execution graphs.
- llm-bottleneck-lab · active
A compact LLM serving/reference stack with Python runtime pieces, C++ kernels, and KV-cache experiments.
why it matters · It gives me a software baseline before moving an optimization down into FPGA kernels.
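A minimal sketch of the KV-cache idea those experiments revolve around: during decode, each new token appends one key/value pair per layer, and attention runs over the cached history instead of recomputing it. Names and shapes here are illustrative, not the repo's API.

```python
import math

class KVCache:
    """Append-only per-layer key/value store for autoregressive decoding."""

    def __init__(self):
        self.keys = []    # one key vector per cached token
        self.values = []  # one value vector per cached token

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def attend(query, cache):
    """Softmax dot-product attention of a single query over the cache."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    m = max(scores)                      # subtract max for stability
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(cache.values[0])
    return [sum(w * v[d] for w, v in zip(weights, cache.values))
            for d in range(dim)]
```

Even in this toy form, the cache makes the bottleneck visible: decode cost is dominated by streaming the growing `keys`/`values` history, which is exactly the behavior worth measuring before committing an optimization to FPGA kernels.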
- pccx-FPGA-NPU-LLM-kv260 · wip
Bare-metal FPGA implementation of the pccx NPU for LLM inference on AMD Kria KV260.
why it matters · It pushed me from "model acceleration" into memory hierarchy, scheduling, and runtime design.
- driver-drowsiness-detection · archived
An undergraduate computer vision project focused on end-to-end latency, built on facial landmarks and a small model.
why it matters · It was the first project that made me care more about end-to-end latency than benchmark accuracy.
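The repo's exact pipeline isn't shown here, but a common landmark-based signal for this kind of project is the eye aspect ratio (EAR): a cheap per-frame scalar that avoids running a heavier model every frame, which is what makes the latency trade-off interesting. The sketch below assumes the standard 6-point eye-landmark convention; the threshold and frame-count values are illustrative, not the project's tuned numbers.

```python
import math

def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks, p1..p6.

    p1/p4 are the horizontal corners; p2/p6 and p3/p5 are the upper/lower
    lid pairs. The ratio drops toward zero as the eye closes.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)
    horizontal = 2.0 * dist(p1, p4)
    return vertical / horizontal

def is_drowsy(ears, threshold=0.21, min_consecutive=3):
    """Flag drowsiness when EAR stays below threshold for consecutive frames."""
    run = 0
    for ear in ears:
        run = run + 1 if ear < threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

Requiring several consecutive low-EAR frames rather than a single one is the kind of choice that trades a few frames of detection latency for far fewer false alarms from blinks.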