AI ENGINEER · ISLAMABAD · UTC+5

Abdul Rehman Baber — AI Engineer

AI and full-stack engineer with about four years building and operating production LLM systems. My lane is the reliability side — agents and MCP, including proactive, tool-using assistants on OpenClaw and Hermes — plus RAG, and the evals, observability and cost work that keep those systems honest, with a sharper edge in AI-search visibility (GEO).

Ask my AI twin · View work

Open to remote roles & contracts · No visa sponsorship required · Overlaps EU hours + US-Eastern mornings

01 —

What I do

three things, in plain terms

01 · reliability

I make AI systems reliable

The unglamorous layer most AI demos skip: evaluation, monitoring, and retrieval that is allowed to answer “I don’t know” instead of guessing. My open-source checker grades its own AI extraction against human labels and refuses to ship if agreement slips below the bar (right now it agrees about 98.5% of the time).

02 · agents

I build assistants that take action

Not chatbots — assistants that use real tools, remember context, run on a schedule, and message you on WhatsApp or Telegram. One is a fully offline voice assistant; another ran unattended overnight and collected 6,277 AI answers at under a 0.1% error rate.

03 · ai-visibility

I measure how brands show up inside AI answers

When someone asks ChatGPT, Perplexity, or Google’s AI “what are the best tools for X?”, a handful of brands get named and cited, and most companies have no idea whether they are one of them. I measure that (it’s called GEO, generative engine optimization) and move the numbers. It is a specialty most engineers don’t have yet.

02 —

Selected work

one standout per domain — the rest is in the map

live

AI-search visibility (GEO)

geocheck

An AI-search visibility checker that is both a CLI and an MCP server, shipped with its own eval harness — the extractor is graded against human labels behind a CI gate. Public and runnable offline with no API key.

GEO · MCP · EVALS · PYTHON

View repo →
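A minimal sketch of the kind of eval gate described above, not geocheck's actual code: compare the extractor's labels with human labels, compute the agreement rate, and return a non-zero exit code when it falls below a bar so CI can block the release. The function names, the toy labels, and the 0.95 threshold are all assumptions for illustration; the real bar is not stated here.

```python
# Hypothetical bar for illustration; the real threshold is not stated on this page.
AGREEMENT_THRESHOLD = 0.95


def agreement_rate(predicted, human):
    """Fraction of items where the extractor's label matches the human label."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human label lists must be the same length")
    if not human:
        raise ValueError("need at least one labeled example")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


def gate(predicted, human, threshold=AGREEMENT_THRESHOLD):
    """Return a process exit code: 0 if agreement meets the bar, 1 otherwise."""
    rate = agreement_rate(predicted, human)
    print(f"agreement: {rate:.1%} (threshold {threshold:.1%})")
    return 0 if rate >= threshold else 1


# Toy labels: did the answer mention the brand? (made-up data)
human_labels = [True, False, True, True]
extractor_labels = [True, False, True, False]
exit_code = gate(extractor_labels, human_labels)
```

In a CI job, passing `exit_code` to `sys.exit` would fail the build whenever agreement drops below the bar, which is what makes the check a gate rather than a report.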

Agentic assistants

OpenClaw proactive assistant

A self-hosted assistant that takes action: a fully offline WhatsApp voice loop (own STT + TTS, zero cloud speech) plus scheduled jobs that run unattended — one collected 6,277 AI answers overnight at under a 0.1% error rate.

AGENTS · OPENCLAW · VOICE · CRON

live

Full-stack product

MapleScholar

A live, nonprofit AI research-discovery PWA — resolve a paper across 7 scholarly APIs, chat with it in 70+ languages, read it as clean HTML. Sole primary engineer, end to end.

NEXT.JS · RAG · SUPABASE · PWA

Visit →

Reliability / LLM-ops

The bug that made our alerts lie

On a production AI-visibility platform I operate, error alerting had been silently broken for months — a log-prefix bug broke level extraction, so “no errors” just meant we had stopped seeing them. Found by querying the log store directly; fixed with one pipeline stage.

OBSERVABILITY · SRE · GRAFANA · LOKI
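A minimal sketch of how this class of bug can hide errors, under an assumed log shape; it is not the platform's real pipeline or the actual fix. If the level parser expects the level at the start of the line and something starts prepending a prefix, no line gets a level, so any alert that counts `level=error` sees zero. The log format, the regexes, and the function names below are hypothetical.

```python
import re

LEVELS = r"DEBUG|INFO|WARN|ERROR"

# Assumed original parser: expects the level token at the very start of the line.
ANCHORED = re.compile(rf"^(?P<level>{LEVELS})\b")
# Tolerant variant: allows an arbitrary prefix before the first level token.
TOLERANT = re.compile(rf"\b(?P<level>{LEVELS})\b")


def extract_level(line, pattern):
    """Return the lowercased level found by the pattern, or None if none matched."""
    match = pattern.search(line)
    return match.group("level").lower() if match else None


def count_errors(lines, pattern):
    """Count lines whose extracted level is 'error' (what an alert rule would see)."""
    return sum(extract_level(line, pattern) == "error" for line in lines)


# Made-up lines carrying a prefix the anchored parser never expected.
prefixed_lines = [
    "[worker-1] ERROR upstream returned 502",
    "[worker-1] INFO request served",
]
seen_by_anchored = count_errors(prefixed_lines, ANCHORED)
seen_by_tolerant = count_errors(prefixed_lines, TOLERANT)
```

The anchored parser reports no errors for these lines even though one is present, which is the "no errors just meant we had stopped seeing them" failure mode. The tolerant pattern takes the first level token it finds, so a message that itself contains a level word could be misread; a production parser would want a stricter format.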

See all work

03 —

Stack

what I run in production

Languages

Python · TypeScript · JavaScript · SQL

AI & LLM

OpenAI · Claude · Gemini · Perplexity · LangGraph · LangChain · MCP · RAG · OpenClaw · Hermes

Web & full-stack

Next.js · React · NestJS · FastAPI · Node.js · Tailwind

Data & infra

PostgreSQL · pgvector · Redis · Elasticsearch · Neo4j · Temporal · Docker

Cloud & deploy

AWS · Google Cloud · Vercel · Railway

Observability & ops

Grafana · Loki · OpenTelemetry · Stripe · Playwright

04 —

Writing

notes from the lab

Read all writing

05 —

The map

the full body of work, connected

The projects above are highlights across different domains. The rest (the GEO scraper engine, remote-desktop infra, an ATS résumé builder, a clinical decision-support PWA, multi-provider research and document-AI pipelines, and more) lives in an interactive knowledge map that wires every project to the tech, capabilities, and writing behind it.

An interactive knowledge map

Every project wired to the tech, capabilities, frameworks, and writing behind it.

97 nodes · 289 connections · 26 projects

Explore the full map

06 —

Ask my AI twin

the fastest way to vet me

A live agent grounded in the full corpus of my projects, incidents, and decisions. Pick a question below or type your own — it answers only from my record, and says so when it doesn't know.
