$ whoami

Hi, I'm Diwakar Ravichandran

Robotics Perception & GPU Inference Engineer

M.S. in Robotics from UC Riverside, working at the intersection of SLAM, 3D reconstruction, and GPU systems. I take research ideas all the way to shipped production code, from drone-based 3D reconstruction to custom CUDA kernels for real-time perception and LLM inference.

02 — about

About Me

I graduated with a Master's degree in Robotics from the University of California, Riverside, where I specialized in computer vision, SLAM (Simultaneous Localization and Mapping), and autonomous systems. My thesis, Celesta, is a fully differentiable optimization framework that integrates distributed bundle adjustment with Leiden-based graph partitioning for scalable, GPU-accelerated visual SLAM using NVIDIA Thrust.

Previously, I worked as a Data Scientist at Jio Platforms Ltd., where I developed and shipped computer vision solutions for drone-based tower reconstruction using SLAM. I also contributed to video analytics for surveillance and to visual document understanding, taking pipelines from prototype to production deployment.

Lately I've gone deep on the GPU layer underneath perception and ML systems — writing custom CUDA kernels for LLM inference (fused attention, KV-cache compression, quantized GEMM on Llama 3.1 8B) and profiling every optimization with Nsight Compute. My toolkit spans C++, Python, CUDA, and ROS across NVIDIA platforms from datacenter GPUs to Jetson edge.

I care about writing maintainable, performant code and bringing research ideas into deployed systems — and I'm always chasing the next hard problem in robotics, perception, and GPU computing.

M.S.

Robotics Degree

5+

Years Experience

10+

Research Projects

03 — projects

Featured Projects

Fused attention kernel: N×N score matrix, online softmax, KV-cache compression, W4A16 matmul

LLM Inference Kernels

Custom CUDA kernels for LLM inference — fused FlashAttention-style attention, INT4 KV-cache compression, and W4A16 quantized matmul on Llama 3.1 8B. 1.91× over PyTorch SDPA, 6.97× over fp16 cuBLAS, −51% peak VRAM end-to-end, with every step attributed to a specific Nsight Compute metric.

CUDA LLM Nsight
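As a rough illustration of the online-softmax idea behind FlashAttention-style kernels (not the project's CUDA code), the sketch below computes attention for a single query in one pass over the keys and values, keeping only a running max, a running normalizer, and a running weighted sum instead of the full score row. The function names and the plain-list data layout are hypothetical, chosen only for this example.

```python
import math

def naive_attention(q, keys, values):
    """Reference: materialize every score, then softmax, then weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def online_softmax_attention(q, keys, values):
    """Single pass over keys/values; never stores the full score row."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")    # largest score seen so far
    running_sum = 0.0              # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])   # unnormalized output, relative to running_max
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        # Rescale earlier partial results to the new max (exp(-inf) == 0.0).
        correction = math.exp(running_max - new_max)
        p = math.exp(s - new_max)
        running_sum = running_sum * correction + p
        acc = [a * correction + p * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should agree to floating-point tolerance on the same inputs; a fused GPU kernel applies the same recurrence per tile of keys rather than per key.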

Dense 3D Reconstruction Engine

A from-scratch engine for dense 3D reconstruction from image collections, and a direct continuation of the production reconstruction work I shipped at Jio, with the same co-author. The engine is private; the public Open3D visualizer (linked) walks through reconstructed scenes, including a ~4.2M-point capture of Yatra Garden.

Python 3D

Celesta

Dockerized demo of Celesta — distributed, GPU-accelerated bundle adjustment built on DABA with Leiden graph partitioning for better load balancing across GPUs. Validated on BAL "Ladybug" (1,723 cameras, 678K measurements); custom CUDA kernels with NCCL + MPI sync. My M.S. thesis.

CUDA Optimization Thrust Docker

Bundle Adjustment

GPU-accelerated bundle adjustment for structure-from-motion and SLAM: a CUDA-parallelized nonlinear least-squares solver reaching a 10× speedup over a CPU baseline on the University of Washington BAL (Bundle Adjustment in the Large) dataset, measured on an RTX 4090. The direct predecessor to Celesta.

CUDA C++ SLAM
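For context, bundle adjustment minimizes the sum of squared reprojection residuals over all camera and point parameters. The sketch below evaluates one such residual using the camera model documented for the BAL dataset (angle-axis rotation, translation, focal length, two radial distortion terms, and a camera looking down the negative z axis). It is a plain-Python illustration only; the function names are hypothetical and this is not the project's CUDA solver.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotate_angle_axis(w, x):
    """Rotate point x by angle-axis vector w (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        # Near-zero rotation: first-order approximation x + w × x.
        wx = cross(w, x)
        return [x[i] + wx[i] for i in range(3)]
    k = [c / theta for c in w]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kxx = cross(k, x)
    k_dot_x = sum(k[i] * x[i] for i in range(3))
    return [x[i] * cos_t + kxx[i] * sin_t + k[i] * k_dot_x * (1.0 - cos_t)
            for i in range(3)]

def project_bal(camera, point):
    """camera = [wx, wy, wz, tx, ty, tz, f, k1, k2]; point = [X, Y, Z]."""
    w, t = camera[0:3], camera[3:6]
    f, k1, k2 = camera[6], camera[7], camera[8]
    p = rotate_angle_axis(w, point)
    p = [p[i] + t[i] for i in range(3)]
    # BAL convention: the camera looks down -z, hence the sign flip.
    u, v = -p[0] / p[2], -p[1] / p[2]
    r2 = u * u + v * v
    distortion = 1.0 + k1 * r2 + k2 * r2 * r2
    return [f * distortion * u, f * distortion * v]

def reprojection_residual(camera, point, observed):
    """2D residual (predicted minus observed) for one measurement."""
    pred = project_bal(camera, point)
    return [pred[0] - observed[0], pred[1] - observed[1]]
```

Each residual depends on one camera and one point, which is what makes the problem sparse and well suited to parallel evaluation.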

GNSS/INS EKF estimated trajectory vs ground truth on KITTI

GNSS / INS Extended Kalman Filter

A loosely coupled 15-state Extended Kalman Filter fusing GNSS position fixes with inertial measurements on the KITTI raw dataset. Position RMS error is 2.26 m with fusion versus 1693 m for IMU-only dead reckoning, with a GPS-dropout demo showing graceful degradation and recovery. Implemented in NumPy/SciPy.

Python EKF Sensor Fusion KITTI
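The predict/update structure of loosely coupled fusion can be sketched with a much smaller filter than the 15-state one described above. The example below is a deliberately simplified, linear, one-dimensional stand-in: the state is just position and velocity, an accelerometer reading drives the prediction, and a GNSS position fix drives the update. All names and noise parameters are hypothetical, and none of this is the project's NumPy/SciPy code.

```python
def predict(x, P, accel, dt, accel_var):
    """Propagate state [pos, vel] and 2x2 covariance with an IMU acceleration."""
    pos, vel = x
    new_x = [pos + vel * dt + 0.5 * accel * dt * dt, vel + accel * dt]
    # F = [[1, dt], [0, 1]]; compute F P F^T by hand.
    p00 = P[0][0] + dt * P[1][0] + dt * (P[0][1] + dt * P[1][1])
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1]
    # Process noise from accelerometer variance: Q = G G^T * var, G = [dt^2/2, dt].
    q00 = 0.25 * dt ** 4 * accel_var
    q01 = 0.5 * dt ** 3 * accel_var
    q11 = dt * dt * accel_var
    return new_x, [[p00 + q00, p01 + q01], [p10 + q01, p11 + q11]]

def update(x, P, gnss_pos, gnss_var):
    """Correct the state with a GNSS position fix (H = [1, 0])."""
    innovation = gnss_pos - x[0]
    s = P[0][0] + gnss_var                 # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s      # Kalman gain
    new_x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    # P <- (I - K H) P
    new_P = [[(1.0 - k0) * P[0][0], (1.0 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return new_x, new_P
```

Calling `predict` repeatedly without `update` mimics a GPS dropout: the position variance grows until the next fix pulls the estimate back.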

Photo evaluation agent pipeline: facial analysis, aesthetic scoring, CLIP/BLIP, DINOv2+FAISS, DETR/ViT

Photo Evaluation Vision Agent

A computer-vision agent that analyzes photos end-to-end — facial analysis, aesthetic scoring, CLIP/BLIP semantic tagging, DINOv2 + FAISS similarity search, and DETR/ViT scene understanding.

Python CLIP DINOv2 FAISS

Collaborative V2V Communication System

Vehicle-to-vehicle (V2V) communication for intelligent collaborative driving, with autonomous vehicles coordinating as multiple agents in the CARLA simulator.

Python CARLA Autonomous Driving

Vision Transformer: image patches to token sequence with CLS and positional encoding into an N-layer encoder

MNIST-Former

A compact Vision Transformer trained on Fashion-MNIST, with training and inference kept separate from visualization, plus optional Nsight profiling hooks for GPU timeline analysis.

PyTorch Transformer ViT

Semicon Bayesian Drift

Bayesian drift modeling and analysis for semiconductor applications. Jupyter-based workflows for inference and visualization.

Python Jupyter Bayesian

MNIST with C++ and CUDA

A fun project: a simple neural network for MNIST digit classification, written in C++ and accelerated with CUDA.

C++ CUDA ML

Two-link robot arm with joint angles theta-1 and theta-2 reaching a target along a planned path

Foundations of Robotics

A ROS playground for forward/inverse kinematics, open- and closed-loop control, and 2D path planning, visualized in Gazebo. From UC Riverside's EE283A (Foundations of Robotics).

ROS Gazebo Control
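The forward and inverse kinematics of a planar two-link arm have a closed form, which makes them a convenient worked example. The sketch below assumes link lengths `l1` and `l2` and joint angles `theta1` and `theta2` measured as in the standard textbook setup; the function names are hypothetical and this is not the course or repository code.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) for a planar two-link arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=1.0, l2=1.0, elbow_up=False):
    """Joint angles reaching (x, y); raises ValueError if out of reach."""
    # Law of cosines gives the elbow angle.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0 + 1e-9:
        raise ValueError("target is outside the arm's workspace")
    c2 = max(-1.0, min(1.0, c2))   # clamp small floating-point overshoot
    theta2 = math.acos(c2)
    if elbow_up:
        theta2 = -theta2           # the mirrored solution
    # Shoulder angle: direction to target minus the offset caused by link 2.
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```

Feeding the result of `inverse_kinematics` back into `forward_kinematics` should recover the target point, which is a handy sanity check before closing the loop with a controller.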

Research

Celesta

Writing

Foundations — A First-Principles Field Guide to GPU Kernel Optimization

Six Iterations to a Faster Attention Kernel — Two Were Reverts

Skills & Technologies

Programming & Robotics

Computer Vision & ML

Tools & Frameworks

Shipping & Deployment

Get In Touch

Let's build something together.