<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NaveenKumar Namachivayam ⚡</title>
    <description>The latest articles on DEV Community by NaveenKumar Namachivayam ⚡ (@qainsights).</description>
    <link>https://dev.to/qainsights</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F159517%2F43f7f907-501b-44e8-b748-e740fa80c07e.jpg</url>
      <title>DEV Community: NaveenKumar Namachivayam ⚡</title>
      <link>https://dev.to/qainsights</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/qainsights"/>
    <language>en</language>
    <item>
      <title>Codex CLI vs Claude Code: A Deep-Dive Command Comparison</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Wed, 24 Jun 2026 14:50:03 +0000</pubDate>
      <link>https://dev.to/qainsights/codex-cli-vs-claude-code-a-deep-dive-command-comparison-1i12</link>
      <guid>https://dev.to/qainsights/codex-cli-vs-claude-code-a-deep-dive-command-comparison-1i12</guid>
      <description>&lt;p&gt;In this blog post, we will see how the two most talked-about AI coding CLIs, OpenAI's Codex CLI and Anthropic's Claude Code, stack up command by command. Not just the headline features, but the small wins, the gaps, the uncommon flags, and the places where one clearly pulls ahead. Everything here is sourced directly from the official docs.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Codex CLI vs Claude Code CLI commands: what's the difference?&lt;/strong&gt; Both are agentic terminal coding tools, but their command surfaces diverge significantly. If you need deep CI integration and multi-agent pipelines, choose Claude Code. If you need local models or a richer TUI experience, choose Codex CLI.&lt;/p&gt;


&lt;h2&gt;Quick Context&lt;/h2&gt;

&lt;p&gt;Both tools are agentic coding CLIs that live in your terminal. They read codebases, edit files, run shell commands, and talk to external services over MCP. The underlying models are different (Claude for Anthropic, GPT-family for OpenAI), but architecturally they are solving the same problem.&lt;/p&gt;

&lt;p&gt;I have been using Claude Code daily as part of my performance engineering work and plugin development. I recently started exploring Codex CLI seriously after OpenAI formalized its docs under developers.openai.com. This post is the comparison I wish I had when I started.&lt;/p&gt;





&lt;h2&gt;Installation at a Glance&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
claude auth login
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Codex CLI:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# macOS / Linux
curl -fsSL https://chatgpt.com/codex/install.sh | sh

# Or via npm
npm i -g @openai/codex

codex login
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The npm installs shown above need Node.js. Claude Code requires an Anthropic account (Claude subscription or API key). Codex CLI authenticates via ChatGPT OAuth or an OpenAI API key.&lt;/p&gt;





&lt;h2&gt;Core Commands Side by Side&lt;/h2&gt;

&lt;p&gt;Here are the foundational commands every developer uses daily.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Codex CLI&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Start interactive session&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Start with initial prompt&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude "explain this project"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex "explain this project"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-interactive one-shot&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude -p "query"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;codex exec "query"&lt;/code&gt; (alias: &lt;code&gt;codex e&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipe content&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cat logs.txt | claude -p "explain"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex exec - &amp;lt; logs.txt&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue last session&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude -c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex resume --last&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resume by name/ID&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude -r "auth-refactor" "query"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex resume &amp;lt;SESSION_ID&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude update&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex update&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth login&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude auth login&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex login&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth logout&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude auth logout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex logout&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth status&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude auth status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex login status&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configure MCP&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude mcp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex mcp&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manage plugins&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude plugin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex plugin marketplace&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fork a session&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude --fork-session --resume &amp;lt;id&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex fork&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both tools have non-interactive modes perfect for CI pipelines. Claude Code uses &lt;code&gt;-p&lt;/code&gt; (print mode). Codex CLI uses &lt;code&gt;exec&lt;/code&gt; as a proper subcommand with its own flag surface.&lt;/p&gt;
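&lt;p&gt;As a minimal sketch of how the two one-shot forms line up in a script, the helper below prints the command it would run for either tool instead of executing it. The function name &lt;code&gt;one_shot_cmd&lt;/code&gt; is made up for illustration; only the &lt;code&gt;claude -p&lt;/code&gt; and &lt;code&gt;codex exec&lt;/code&gt; forms come from the table above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Hypothetical helper: print the non-interactive command for a tool.
# It only builds the command line; nothing is executed.
one_shot_cmd() {
  tool="$1"
  prompt="$2"
  case "$tool" in
    claude) printf 'claude -p "%s"\n' "$prompt" ;;
    codex)  printf 'codex exec "%s"\n' "$prompt" ;;
    *)      return 1 ;;  # unknown tool: signal failure to the caller
  esac
}

# Usage:
#   one_shot_cmd claude "explain this project"
#   one_shot_cmd codex "explain this project"
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Keeping the tool name as a parameter makes it easy to run the same prompt through both CLIs in one pipeline and compare the results.&lt;/p&gt;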





&lt;h2&gt;Commands Only in Claude Code&lt;/h2&gt;

&lt;p&gt;Claude Code has a significantly deeper command surface for background agent management. These are commands with no Codex equivalent.&lt;/p&gt;

&lt;h3&gt;Background Agent Management&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Start as a background agent and return to prompt immediately
claude --bg "investigate the flaky test"

# Attach to a background session
claude attach 7c5dcf5d

# See logs from a background session
claude logs 7c5dcf5d

# Stop a background session
claude stop 7c5dcf5d

# Restart a background session (picks up updated binary)
claude respawn 7c5dcf5d

# Remove from the list (transcript stays on disk)
claude rm 7c5dcf5d
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is the biggest functional gap in Codex right now. Claude Code has a full background session supervisor with &lt;code&gt;claude daemon status&lt;/code&gt; and &lt;code&gt;claude daemon stop --any&lt;/code&gt;. You can run multiple agents in parallel, attach and detach, and inspect each session's recent output with &lt;code&gt;claude logs&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Daemon Management&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Check the background supervisor's state
claude daemon status

# Stop the supervisor (keep workers running to reconnect later)
claude daemon stop --any --keep-workers
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Project State Management&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Preview what would be deleted
claude project purge ~/work/repo --dry-run

# Delete all local Claude Code state for a project
claude project purge ~/work/repo -y
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This cleans up transcripts, task lists, debug logs, file-edit history, and prompt history. Useful when onboarding a project fresh or cleaning up stale state.&lt;/p&gt;

&lt;h3&gt;Ultrareview&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Run ultrareview on a PR non-interactively
claude ultrareview 1234 --json
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Codex does have a &lt;code&gt;/review&lt;/code&gt; slash command inside sessions, but &lt;code&gt;claude ultrareview&lt;/code&gt; is a standalone CI-friendly command that exits with 0 on success and 1 on failure.&lt;/p&gt;
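&lt;p&gt;Because the result is carried in the exit code, a CI gate can be as small as the sketch below. The &lt;code&gt;review_gate&lt;/code&gt; function is a hypothetical helper, not part of either CLI; it runs whatever command it is given and reports pass or fail based only on that command's exit status.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Hypothetical CI gate: run any review command and turn its exit code
# into a pass/fail line. The command to run is passed as arguments.
review_gate() {
  if "$@"; then
    echo "review passed"
  else
    echo "review failed"
    return 1
  fi
}

# Usage (PR number taken from the example above):
#   review_gate claude ultrareview 1234 --json
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that the status line is printed after the command's own output, so drop the two &lt;code&gt;echo&lt;/code&gt; lines if a later step needs to parse that output as JSON.&lt;/p&gt;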

&lt;h3&gt;Remote Control&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Start a remote control server so you can control Claude Code from claude.ai
claude remote-control --name "My Project"

# Or start an interactive session with remote control enabled
claude --remote-control "My Project"

# Resume a web session in your local terminal
claude --teleport
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a genuinely unique capability. You start a session locally, expose it over Remote Control, and then control it from claude.ai or the mobile Claude app. No Codex equivalent exists.&lt;/p&gt;

&lt;h3&gt;Long-Lived Token for CI&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Generate a long-lived OAuth token for CI pipelines
claude setup-token
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Codex uses a different CI flow (piping API key via stdin with &lt;code&gt;codex login --with-api-key&lt;/code&gt;).&lt;/p&gt;
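&lt;p&gt;A rough sketch of that stdin flow in a CI step might look like the following. The function name &lt;code&gt;codex_ci_login&lt;/code&gt; and the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; variable name are assumptions for illustration; the only part taken from the docs is piping the key into &lt;code&gt;codex login --with-api-key&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Hypothetical CI step: feed an API key from the environment to
# `codex login --with-api-key` over stdin, so the key never appears
# in the argument list. OPENAI_API_KEY is an assumed variable name.
codex_ci_login() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set"
    return 1
  fi
  printf '%s\n' "$OPENAI_API_KEY" | codex login --with-api-key
}

# Usage (with the secret already exported by the CI system):
#   codex_ci_login
```
&lt;/code&gt;&lt;/pre&gt;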

&lt;h3&gt;Install Specific Version&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;claude install 2.1.118
claude install stable
claude install latest
&lt;/code&gt;&lt;/pre&gt;





&lt;h2&gt;Commands Only in Codex CLI&lt;/h2&gt;

&lt;h3&gt;Cloud Task Management&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Browse cloud tasks from the terminal
codex cloud

# Submit a cloud task directly
codex cloud exec --env ENV_ID "fix the auth bug"

# List recent tasks with JSON output
codex cloud list --json --limit 10

# Apply a cloud task diff to your local working tree
codex apply TASK_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is Codex's hybrid cloud-plus-local model. You can kick off tasks in the Codex cloud environment and then &lt;code&gt;codex apply&lt;/code&gt; their diffs locally. Claude Code has remote web sessions but not this apply-a-cloud-diff pattern.&lt;/p&gt;

&lt;h3&gt;Sandbox Helper&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Run a command inside Codex's sandboxing layer (macOS Seatbelt)
codex sandbox --permissions-profile my-profile -- pytest tests/

# Log sandbox denials for debugging
codex sandbox --log-denials -- npm test
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can test what commands Codex allows or denies before committing a config. Very useful for security-conscious teams.&lt;/p&gt;

&lt;h3&gt;Exec Policy Testing&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Check whether a command would be allowed, prompted, or blocked
codex execpolicy --rules ~/.codex/rules/my-policy.rules --pretty -- git push

# Validate rules before saving them
codex execpolicy -r policy.rules -r another.rules -- rm -rf /tmp/junk
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a preview feature that lets you unit-test your execution policy files. Nothing like this exists in Claude Code.&lt;/p&gt;

&lt;h3&gt;Shell Completion Scripts&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Generate completions for Zsh
codex completion zsh &amp;gt; "${fpath[1]}/_codex"

# Generate for Bash, Fish, PowerShell, Elvish
codex completion bash
codex completion fish
codex completion power-shell
codex completion elvish
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Claude Code does not have a &lt;code&gt;completion&lt;/code&gt; command. You get whatever your shell discovers from the binary.&lt;/p&gt;

&lt;h3&gt;Feature Flag Management&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# List all feature flags with maturity and current state
codex features list

# Persistently enable a feature
codex features enable subagents

# Persistently disable a feature
codex features disable experimental-network
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Claude Code exposes betas via &lt;code&gt;--betas&lt;/code&gt; but does not have a persistent feature flag manager as a first-class CLI command.&lt;/p&gt;

&lt;h3&gt;Debug Model Catalog&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Print the raw model catalog Codex sees
codex debug models

# Show only the bundled catalog (no remote refresh)
codex debug models --bundled
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Useful when troubleshooting model availability or provider routing issues.&lt;/p&gt;

&lt;h3&gt;Run Codex as an MCP Server&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Expose Codex itself as an MCP tool for other agents to consume
codex mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a powerful composition pattern. Another agentic tool (including Claude Code) can talk to Codex over MCP. I have not seen a top-level &lt;code&gt;claude mcp-server&lt;/code&gt; command in Claude Code; the closest counterpart I know of is &lt;code&gt;claude mcp serve&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Launch Desktop App from CLI&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Open Codex Desktop app, pointing at a workspace
codex app ~/work/my-project
&lt;/code&gt;&lt;/pre&gt;





&lt;h2&gt;Flags Compared&lt;/h2&gt;

&lt;h3&gt;Shared Flags (Different Names)&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Codex CLI&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model selection&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--model claude-sonnet-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--model gpt-4.1&lt;/code&gt; / &lt;code&gt;-m gpt-5.4&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra directories&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--add-dir ../lib&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--add-dir ../lib&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-interactive&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--print&lt;/code&gt; / &lt;code&gt;-p&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;codex exec&lt;/code&gt; subcommand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skip permissions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--dangerously-bypass-approvals-and-sandbox&lt;/code&gt; / &lt;code&gt;--yolo&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output format&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--output-format json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--json&lt;/code&gt; on exec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;Flags Only in Claude Code&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Run with minimal setup (no hooks, skills, MCP, CLAUDE.md)
claude --bare -p "query"

# Set reasoning effort
claude --effort high
claude --effort max

# Append to system prompt without replacing it
claude --append-system-prompt "Always use TypeScript"
claude --append-system-prompt-file ./style-rules.txt

# Validated JSON output matching a JSON Schema
claude -p --json-schema '{"type":"object"}' "query"

# Budget cap for API-billed sessions
claude -p --max-budget-usd 5.00 "query"

# Limit agentic turns
claude -p --max-turns 3 "query"

# Auto-connect to IDE on startup
claude --ide

# Spin up an isolated git worktree
claude -w feature-auth
claude -w feature-auth --tmux

# Resume from a PR number
claude --from-pr 123

# Select a fallback model chain
claude --fallback-model sonnet,haiku

# Screen reader accessible output
claude --ax-screen-reader

# Improve prompt cache reuse across CI runs
claude -p --exclude-dynamic-system-prompt-sections "query"

# Set session display name
claude -n "my-feature-work"

# Load plugin for session only
claude --plugin-dir ./my-plugin
claude --plugin-url https://example.com/plugin.zip

# Disable all slash commands and skills
claude --disable-slash-commands

# Start in safe mode (all customizations disabled)
claude --safe-mode

# Define subagents inline
claude --agents '{"reviewer":{"description":"Reviews code","prompt":"You are a code reviewer"}}'

# Enable advisor tool with a specific model
claude --advisor opus

# Start as a background agent immediately
claude --bg "investigate the flaky test"

# Run a shell command as a PTY-backed background job
claude --bg --exec 'pytest -x'

# Teammate display mode
claude --teammate-mode tmux

# Permission mode
claude --permission-mode plan
claude --permission-mode auto
claude --permission-mode acceptEdits
claude --permission-mode bypassPermissions

# Chrome browser integration
claude --chrome
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Flags Only in Codex CLI&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Use local Ollama model
codex --oss

# Switch approval behavior
codex --ask-for-approval on-request

# Attach images to the initial prompt
codex --image screenshot.png "why is this broken?"
codex -i wireframe.png,design.png "implement this"

# Load a named config profile
codex --profile ci

# Enable live web search
codex --search

# Select sandbox policy
codex --sandbox workspace-write

# Connect TUI to a remote app-server
codex --remote ws://192.168.1.10:8080

# Set working directory for the agent
codex --cd /path/to/project "run tests"

# Override a config value inline
codex -c model=gpt-4.1 "query"
codex -c features.subagents=true "query"

# Disable alternate TUI screen
codex --no-alt-screen
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;--oss&lt;/code&gt; flag is a genuine differentiator. Codex CLI supports pointing at a local Ollama instance for offline or privacy-sensitive work. Claude Code does not have this.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--image&lt;/code&gt; / &lt;code&gt;-i&lt;/code&gt; flag at the global level is very ergonomic. In Claude Code, you can reference images inside sessions, but it is not a global flag on the CLI launch itself.&lt;/p&gt;





&lt;h2&gt;Slash Commands Face-off&lt;/h2&gt;

&lt;p&gt;Both tools have in-session slash commands. Here is how the key ones map.&lt;/p&gt;

&lt;h3&gt;Present in Both (Similar Purpose)&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Codex CLI&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model switching&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/model&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compact context&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/compact&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/compact&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New conversation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/new&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/new&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resume session&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/resume&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/resume&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fork conversation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/fork&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/fork&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/quit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/quit&lt;/code&gt;, &lt;code&gt;/exit&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Init project file&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/init&lt;/code&gt; (CLAUDE.md)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/init&lt;/code&gt; (AGENTS.md)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP tools&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/mcp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/mcp&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session status&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/context&lt;/code&gt;, &lt;code&gt;/cost&lt;/code&gt;, &lt;code&gt;/stats&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/status&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;Slash Commands Only in Claude Code&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;/compact "Focus on the auth module and current test failures"
/output-style Explanatory
/output-style my-custom-style
/insights           # compiles past month of usage into an HTML report
/add-dir ../lib     # add working directory mid-session
/rename             # rename the current session
/export             # export conversation as plain text
/terminal-setup     # activate keyboard shortcuts for your terminal
/cost               # how much have I spent? (API users)
/stats              # how much have I used? (Pro/Max users)
/extra-usage        # configure what happens when you hit rate limit
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;/insights&lt;/code&gt; command is genuinely impressive. It reads your last month of usage history and compiles it into a detailed HTML report. I have not found anything like it in Codex.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/output-style&lt;/code&gt; system lets you define named styles in &lt;code&gt;.claude/output-styles/&lt;/code&gt; and switch between them. This is a powerful content-shaping tool for teams.&lt;/p&gt;

&lt;h3&gt;Slash Commands Only in Codex CLI&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;/goal "Finish the migration and keep tests green"  # set a persistent task goal
/goal pause        # pause the goal tracking
/goal resume       # resume it
/personality pragmatic  # set communication style (friendly/pragmatic/none)
/fast on           # toggle Fast service tier
/plan              # switch to plan mode
/side "is there an obvious risk here?"  # start an ephemeral side conversation
/btw "quick thought"   # alias for /side
/approve           # approve a denied auto-review action and retry
/memories          # configure memory injection and generation
/skills            # browse and use skills
/apps              # browse connectors and insert into prompt
/plugins           # browse installed/discoverable plugins
/hooks             # view and manage lifecycle hooks
/archive           # archive session and exit
/delete            # permanently delete session and exit
/copy              # copy latest response to clipboard (Ctrl+O also works)
/diff              # show Git diff including untracked files
/experimental      # toggle experimental features persistently
/vim               # toggle Vim mode for the composer
/keymap            # remap TUI keyboard shortcuts
/raw               # toggle raw scrollback mode
/review            # ask Codex to review your working tree
/ps                # show background terminals and recent output
/stop              # stop all background terminals
/debug-config      # print config layer diagnostics
/statusline        # configure TUI footer items interactively
/title             # configure terminal window/tab title items
/theme             # choose a syntax-highlighting theme
/permissions       # adjust approval policy mid-session
/ide               # pull IDE context (open files, selection) into prompt
/usage daily       # show daily token usage
/usage weekly
/usage cumulative
/feedback          # send diagnostics to OpenAI
/import            # import Claude Code setup into Codex
/sandbox-add-read-dir C:\path  # grant sandbox read access (Windows only)
/agent             # switch active agent thread
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;/side&lt;/code&gt; command is something I wish Claude Code had. You can start an ephemeral side conversation to ask a quick focused question without polluting the main thread's transcript. You type &lt;code&gt;/side "check if this plan has an obvious flaw"&lt;/code&gt;, get your answer, and return to the main task. Brilliant.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/goal&lt;/code&gt; command gives Codex persistent objective tracking during long-running tasks. You set a goal and the agent keeps it in view across multiple turns.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/personality&lt;/code&gt; command lets you shift Codex's communication style between &lt;code&gt;friendly&lt;/code&gt;, &lt;code&gt;pragmatic&lt;/code&gt;, and &lt;code&gt;none&lt;/code&gt; without changing your instructions. Small win, but very practical when switching between debugging and documentation tasks.&lt;/p&gt;





&lt;h2&gt;Uncommon Commands Worth Knowing&lt;/h2&gt;

&lt;p&gt;These are the commands that most people miss but deliver real value once you discover them.&lt;/p&gt;

&lt;h3&gt;Claude Code: &lt;code&gt;--exclude-dynamic-system-prompt-sections&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;claude -p --exclude-dynamic-system-prompt-sections "run the test suite"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This moves per-machine dynamic sections (working directory, environment info, memory paths) into the first user message instead of the system prompt. The result is better prompt cache reuse across different users and machines running the same task. Essential for teams running Claude Code in shared CI environments.&lt;/p&gt;

&lt;h3&gt;Claude Code: &lt;code&gt;--bare&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;claude --bare -p "explain this function"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md. Sessions start significantly faster. Useful for quick scripted calls where you do not need any project configuration.&lt;/p&gt;

&lt;h3&gt;Claude Code: &lt;code&gt;--from-pr&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;claude --from-pr 123
claude --from-pr https://github.com/owner/repo/pull/123
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Resumes sessions linked to a specific pull request. Sessions get linked automatically when Claude creates the PR. Supports GitHub, GitHub Enterprise, GitLab, and Bitbucket URLs.&lt;/p&gt;

&lt;h3&gt;Claude Code: &lt;code&gt;--fallback-model&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;claude --fallback-model sonnet,haiku -p "query"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Automatic fallback when the primary model is overloaded or unavailable. Accepts a comma-separated list tried in order. You can persist a chain via the &lt;code&gt;fallbackModel&lt;/code&gt; setting.&lt;/p&gt;

&lt;h3&gt;Codex CLI: &lt;code&gt;codex execpolicy&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;codex execpolicy --rules ~/.codex/rules/production.rules --pretty -- git push origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A policy dry-run. You pass your &lt;code&gt;.rules&lt;/code&gt; files and a command, and Codex tells you whether it would allow, prompt, or block that command. This is a fantastic tool for validating security policy before deploying to CI.&lt;/p&gt;
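&lt;p&gt;To check several commands in one go, a small loop over the same invocation is enough. This is only a sketch: &lt;code&gt;check_policy&lt;/code&gt; is a hypothetical helper, and it assumes the &lt;code&gt;codex execpolicy --rules ... --pretty -- ...&lt;/code&gt; shape shown above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Hypothetical batch check: run each command string through
# `codex execpolicy`, using the invocation shape shown above.
# First argument: rules file. Remaining arguments: commands to test.
check_policy() {
  rules="$1"
  shift
  for cmd in "$@"; do
    # Unquoted on purpose so each word becomes its own argument.
    # Avoid glob characters in the command strings for that reason.
    codex execpolicy --rules "$rules" --pretty -- $cmd
  done
}

# Usage:
#   check_policy ~/.codex/rules/production.rules "git push origin main" "rm -rf /tmp/junk"
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running a list like this before every rules change gives you a cheap regression test for the policy file.&lt;/p&gt;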

&lt;h3&gt;Codex CLI: &lt;code&gt;--oss&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;codex --oss "refactor this module"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Points Codex at a locally running Ollama instance. No API calls, no data leaving your machine. Validates that Ollama is running before starting.&lt;/p&gt;

&lt;h3&gt;Codex CLI: &lt;code&gt;codex apply&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;codex apply TASK_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Applies the latest diff from a Codex cloud task to your local working tree. The workflow is: run a task in the cloud environment, review the result on the web, then pull the diff locally with one command. Performance engineers who run long test analysis tasks in cloud environments will appreciate this.&lt;/p&gt;

&lt;h3&gt;Codex CLI: &lt;code&gt;/side&lt;/code&gt; and &lt;code&gt;/btw&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;/side "does this API response shape match our schema?"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An ephemeral fork of the current conversation. The side thread has its own transcript. The parent thread's status stays visible in the TUI while you are in side mode. Type your quick question, get the answer, return. This is a quality-of-life feature I would happily see in Claude Code.&lt;/p&gt;

&lt;h3&gt;Claude Code: &lt;code&gt;claude ultrareview&lt;/code&gt;
&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;claude ultrareview 1234 --json --timeout 60
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Runs a deep code review non-interactively. Prints findings to stdout and exits 0 on success or 1 on failure, so you can use its exit code as your CI gate.&lt;/p&gt;





&lt;h2&gt;Small Wins: Category by Category&lt;/h2&gt;

&lt;h3&gt;Session Management: Claude Code Wins&lt;/h3&gt;

&lt;p&gt;Claude Code has a richer session management surface: background agents with a daemon supervisor, &lt;code&gt;claude logs&lt;/code&gt;, &lt;code&gt;claude attach&lt;/code&gt;, &lt;code&gt;claude respawn&lt;/code&gt;, &lt;code&gt;claude rm&lt;/code&gt;, and the &lt;code&gt;claude agents&lt;/code&gt; view for monitoring and dispatching parallel sessions. Codex has &lt;code&gt;codex resume&lt;/code&gt; and &lt;code&gt;codex fork&lt;/code&gt;, which cover the basics but stop there.&lt;/p&gt;

&lt;h3&gt;Sandbox Control: Codex CLI Wins&lt;/h3&gt;

&lt;p&gt;Codex's sandbox story is more explicit. You choose &lt;code&gt;read-only&lt;/code&gt;, &lt;code&gt;workspace-write&lt;/code&gt;, or &lt;code&gt;danger-full-access&lt;/code&gt; at the flag level. The &lt;code&gt;codex sandbox&lt;/code&gt; command lets you run arbitrary commands inside Codex's sandbox layer to test policies. The &lt;code&gt;codex execpolicy&lt;/code&gt; command lets you validate rules before saving them. Claude Code has permission modes (&lt;code&gt;plan&lt;/code&gt;, &lt;code&gt;auto&lt;/code&gt;, &lt;code&gt;acceptEdits&lt;/code&gt;, &lt;code&gt;bypassPermissions&lt;/code&gt;) but does not expose the underlying sandbox policy as a testable surface.&lt;/p&gt;

&lt;h3&gt;Image Input: Codex CLI Small Win&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;codex --image ui-screenshot.png "why is this button misaligned?"
codex -i wireframe.png,mockup.png "implement this layout"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Codex CLI accepts images as a global flag at session launch. You can attach multiple images with a comma-separated list. Claude Code supports image input inside sessions (drag and drop in the TUI or pasting), but &lt;code&gt;--image&lt;/code&gt; is not a CLI launch flag.&lt;/p&gt;

&lt;h3&gt;CI Scripting: Claude Code Wins&lt;/h3&gt;

&lt;p&gt;Claude Code has more CI-specific flags. &lt;code&gt;--max-budget-usd&lt;/code&gt; caps API spend, &lt;code&gt;--max-turns&lt;/code&gt; limits agentic turns, &lt;code&gt;--no-session-persistence&lt;/code&gt; avoids writing the session to disk, and &lt;code&gt;--output-format stream-json&lt;/code&gt; gives structured streaming output. &lt;code&gt;--include-hook-events&lt;/code&gt; and &lt;code&gt;--include-partial-messages&lt;/code&gt; add fine-grained pipeline observability. The &lt;code&gt;claude setup-token&lt;/code&gt; command generates long-lived OAuth tokens for CI authentication without a browser.&lt;/p&gt;
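&lt;p&gt;Several of these flags can be combined into one bounded CI call. The wrapper below is a sketch under assumptions: &lt;code&gt;claude_ci_run&lt;/code&gt; is a made-up name, and combining these particular flags in one invocation is my own choice rather than something the docs prescribe.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Hypothetical wrapper for a bounded CI run of Claude Code.
# Arguments: max turns, budget in USD, prompt.
claude_ci_run() {
  turns="$1"
  budget="$2"
  prompt="$3"
  claude -p \
    --max-turns "$turns" \
    --max-budget-usd "$budget" \
    --no-session-persistence \
    --output-format json \
    "$prompt"
}

# Usage:
#   claude_ci_run 3 5.00 "run the test suite"
```
&lt;/code&gt;&lt;/pre&gt;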

&lt;p&gt;Codex has &lt;code&gt;codex exec --ephemeral&lt;/code&gt; to skip session persistence and &lt;code&gt;codex exec --output-last-message&lt;/code&gt; to write the final response to a file, which is handy in GitHub Actions pipelines.&lt;/p&gt;
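&lt;p&gt;Put together, a Codex CI step could look roughly like this. Treat it as a sketch: &lt;code&gt;codex_ci_run&lt;/code&gt; is a hypothetical helper, and it assumes &lt;code&gt;--output-last-message&lt;/code&gt; takes the destination file path as its value.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Hypothetical wrapper: run Codex non-interactively, skip session
# persistence, and keep the final response in a file.
# Assumes --output-last-message takes the destination path as its value.
codex_ci_run() {
  out_file="$1"
  prompt="$2"
  codex exec --ephemeral --output-last-message "$out_file" "$prompt"
}

# Usage:
#   codex_ci_run last-message.txt "summarize the failing tests"
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A later pipeline step can then read the file instead of scraping the full run log.&lt;/p&gt;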

&lt;h3&gt;Local Model Support: Codex CLI Wins&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;codex --oss&lt;/code&gt; with Ollama support is a genuine differentiator. If you work in an air-gapped or privacy-sensitive environment, Codex CLI has a path. Claude Code currently has no equivalent.&lt;/p&gt;

&lt;h3&gt;Context Management: Claude Code Wins Slightly&lt;/h3&gt;

&lt;p&gt;Claude Code's &lt;code&gt;/compact&lt;/code&gt; accepts focus instructions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/compact Focus on the auth module and current test failures
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Codex's &lt;code&gt;/compact&lt;/code&gt; summarizes the conversation without focus parameters. Also, Claude Code's &lt;code&gt;/insights&lt;/code&gt; command compiling a usage HTML report has no Codex equivalent.&lt;/p&gt;

&lt;h3&gt;Plugin Architecture: Codex CLI More Explicit&lt;/h3&gt;

&lt;p&gt;Codex has a proper &lt;code&gt;codex plugin marketplace&lt;/code&gt; command for managing plugin marketplace sources from Git repos or local directories. You can pin refs and use sparse checkouts. Claude Code has &lt;code&gt;claude plugin install&lt;/code&gt; against a marketplace, but the marketplace management surface is thinner at the CLI level.&lt;/p&gt;

&lt;h3&gt;Remote Work: Both Have Unique Angles&lt;/h3&gt;

&lt;p&gt;Claude Code has &lt;code&gt;claude remote-control&lt;/code&gt;, which lets you control a local terminal session from claude.ai or the mobile app. Codex CLI has &lt;code&gt;--remote ws://host:port&lt;/code&gt;, which connects a local TUI to a remote &lt;code&gt;codex app-server&lt;/code&gt;. Different models of remote work, both useful depending on your setup.&lt;/p&gt;





&lt;h2&gt;What Is Missing in Each Tool&lt;/h2&gt;

&lt;h3&gt;Missing in Claude Code&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;codex --oss&lt;/code&gt; style local model support (Ollama)&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;codex completion&lt;/code&gt; for shell completion scripts&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;codex execpolicy&lt;/code&gt; for policy dry-runs&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;codex sandbox&lt;/code&gt; for testing sandbox behavior&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;codex cloud&lt;/code&gt; and &lt;code&gt;codex apply&lt;/code&gt; for cloud task management&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/side&lt;/code&gt; for ephemeral side conversations&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/goal&lt;/code&gt; for persistent task objective tracking&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/personality&lt;/code&gt; for communication style control&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/fast&lt;/code&gt; for service tier switching&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/diff&lt;/code&gt; as a slash command (Claude Code does have git awareness, but not as a quick slash command)&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/keymap&lt;/code&gt; for interactive keyboard remapping&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/theme&lt;/code&gt; for syntax highlighting selection&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/statusline&lt;/code&gt; and &lt;code&gt;/title&lt;/code&gt; for TUI customization&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--image&lt;/code&gt; as a launch-time CLI flag&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/approve&lt;/code&gt; for retrying auto-review denials&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;codex features&lt;/code&gt; for persistent feature flag management&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;codex debug models&lt;/code&gt; to inspect model catalog&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Missing in Codex CLI&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Background session management at the daemon level (&lt;code&gt;claude daemon&lt;/code&gt;, &lt;code&gt;claude attach&lt;/code&gt;, &lt;code&gt;claude logs&lt;/code&gt;, &lt;code&gt;claude respawn&lt;/code&gt;)&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude ultrareview&lt;/code&gt; as a standalone CI command&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude remote-control&lt;/code&gt; to control terminal sessions from the web/mobile app&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude --teleport&lt;/code&gt; to bring a web session back to the local terminal&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude --from-pr&lt;/code&gt; to resume sessions linked to a specific PR&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude setup-token&lt;/code&gt; for long-lived CI tokens&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude project purge&lt;/code&gt; for clean project state management&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude --worktree&lt;/code&gt; and &lt;code&gt;--tmux&lt;/code&gt; for isolated git worktrees&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;claude --advisor&lt;/code&gt; for the server-side advisor tool&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--effort&lt;/code&gt; levels (low/medium/high/xhigh/max)&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--fallback-model&lt;/code&gt; chains&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--bare&lt;/code&gt; for minimal fast-start scripted sessions&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--exclude-dynamic-system-prompt-sections&lt;/code&gt; for prompt cache optimization&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--json-schema&lt;/code&gt; for validated structured output&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;--max-budget-usd&lt;/code&gt; for spend caps&lt;/li&gt;



&lt;li&gt;System prompt control flags (&lt;code&gt;--system-prompt&lt;/code&gt;, &lt;code&gt;--system-prompt-file&lt;/code&gt;, &lt;code&gt;--append-system-prompt&lt;/code&gt;, &lt;code&gt;--append-system-prompt-file&lt;/code&gt;)&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/insights&lt;/code&gt; usage history report&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/output-style&lt;/code&gt; named output personas&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/rename&lt;/code&gt; for session naming mid-session&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;/export&lt;/code&gt; conversation to plain text&lt;/li&gt;



&lt;li&gt;Custom commands via &lt;code&gt;.claude/commands/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;





&lt;p&gt;&lt;strong&gt;What is the difference between Codex CLI and Claude Code?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Both are AI-powered terminal coding agents. Claude Code is built by Anthropic and runs Claude models. Codex CLI is built by OpenAI and runs GPT-family models. They share core interactive and non-interactive modes but differ significantly in background agent management, sandbox control, CI flags, and TUI features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Codex CLI support local models?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes. Codex CLI supports local Ollama models via the &lt;code&gt;--oss&lt;/code&gt; flag. This runs the agent without any API calls. Claude Code does not have an equivalent local model flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Claude Code support background agents?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes. Claude Code has a full background agent system with &lt;code&gt;claude --bg&lt;/code&gt;, &lt;code&gt;claude attach&lt;/code&gt;, &lt;code&gt;claude logs&lt;/code&gt;, &lt;code&gt;claude respawn&lt;/code&gt;, &lt;code&gt;claude rm&lt;/code&gt;, and a daemon supervisor managed via &lt;code&gt;claude daemon status&lt;/code&gt; and &lt;code&gt;claude daemon stop&lt;/code&gt;. Codex CLI does not have an equivalent daemon-managed background session infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which CLI is better for CI pipelines?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Claude Code has more CI-focused flags including &lt;code&gt;--max-budget-usd&lt;/code&gt; for spend caps, &lt;code&gt;--max-turns&lt;/code&gt; to limit agentic turns, &lt;code&gt;--no-session-persistence&lt;/code&gt;, &lt;code&gt;--json-schema&lt;/code&gt; for validated structured output, and &lt;code&gt;claude setup-token&lt;/code&gt; for long-lived OAuth tokens. Codex CLI offers &lt;code&gt;codex exec&lt;/code&gt; with &lt;code&gt;--ephemeral&lt;/code&gt; and &lt;code&gt;--output-last-message&lt;/code&gt;, and a native GitHub Action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What commands are missing in Codex CLI compared to Claude Code?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Codex CLI is missing background session management (&lt;code&gt;claude daemon&lt;/code&gt;, &lt;code&gt;claude attach&lt;/code&gt;, &lt;code&gt;claude logs&lt;/code&gt;), &lt;code&gt;claude ultrareview&lt;/code&gt; for CI code review, &lt;code&gt;claude remote-control&lt;/code&gt; for web and mobile session control, &lt;code&gt;--from-pr&lt;/code&gt; to resume sessions linked to a PR, &lt;code&gt;--worktree&lt;/code&gt; for isolated git worktrees, &lt;code&gt;--fallback-model&lt;/code&gt; chains, &lt;code&gt;--max-budget-usd&lt;/code&gt; spend caps, and the &lt;code&gt;/insights&lt;/code&gt; slash command.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What commands are missing in Claude Code compared to Codex CLI?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Claude Code is missing local model support via Ollama, &lt;code&gt;codex completion&lt;/code&gt; for shell completion scripts, &lt;code&gt;codex execpolicy&lt;/code&gt; for sandbox policy dry-runs, the &lt;code&gt;/side&lt;/code&gt; slash command for ephemeral side conversations, &lt;code&gt;/goal&lt;/code&gt; for persistent task objective tracking, &lt;code&gt;/personality&lt;/code&gt; for communication style switching, &lt;code&gt;/theme&lt;/code&gt; for syntax highlighting, and &lt;code&gt;--image&lt;/code&gt; as a launch-time flag.&lt;/p&gt;

&lt;h2&gt;My Take&lt;/h2&gt;

&lt;p&gt;Both tools are genuinely capable. My honest observation after going through every documented command:&lt;/p&gt;

&lt;p&gt;Claude Code has a deeper background agent infrastructure. If you are building multi-agent pipelines, running parallel workloads, or need tight CI integration with structured outputs, Claude Code's flag surface and daemon management are hard to beat.&lt;/p&gt;

&lt;p&gt;Codex CLI wins on local model flexibility, sandbox policy control, and the TUI experience. The &lt;code&gt;/side&lt;/code&gt; command, &lt;code&gt;/goal&lt;/code&gt; tracking, and &lt;code&gt;/personality&lt;/code&gt; switching feel like thoughtful UX investments. The &lt;code&gt;codex execpolicy&lt;/code&gt; command for policy dry-runs shows a security-first mindset.&lt;/p&gt;

&lt;p&gt;What I personally want to see: Claude Code adopt &lt;code&gt;--image&lt;/code&gt; as a launch flag and a &lt;code&gt;/side&lt;/code&gt; equivalent. Codex CLI needs a proper background daemon for parallel agents and a &lt;code&gt;--max-budget-usd&lt;/code&gt; style spend cap for CI use.&lt;/p&gt;

&lt;p&gt;Pick your tool based on your model preference first, then your workflow. If you need remote session control or deep CI scripting, lean Claude Code. If you need local model support or prefer a more granular TUI, lean Codex CLI.&lt;/p&gt;

&lt;p&gt;Have you switched between both tools on the same project? I would love to know which commands you reach for first. Drop a comment below.&lt;/p&gt;

&lt;p&gt;Happy Testing!&lt;/p&gt;





</description>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Toy Story: The Open-Source Ecosystem</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Fri, 19 Jun 2026 16:41:20 +0000</pubDate>
      <link>https://dev.to/qainsights/toy-story-the-open-source-ecosystem-24ia</link>
      <guid>https://dev.to/qainsights/toy-story-the-open-source-ecosystem-24ia</guid>
      <description>&lt;p&gt;As schools are off and Toy Story 5 is just around the corner, we started binge-watching Toy Story from 1 to 4. While watching, suddenly this idea popped up: what if a GitHub repo came alive just like the toys? I started writing with something basic and enhanced it using Gemini Flash. Hope you'll like it.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;The Setup: The Developer's Stack&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The "Room" is the ultimate production stack. The classic, dependable tools that every developer loves and relies on.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Woody (&lt;code&gt;python/cpython&lt;/code&gt;)&lt;/strong&gt;: The beloved, classic, highly readable leader of the repo ecosystem. He’s dependable, has been around forever, and is the favorite of the developer. He prides himself on clean architecture and readability.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Rex (&lt;code&gt;apache/jmeter&lt;/code&gt;)&lt;/strong&gt;: A massive, heavy-duty Java performance testing tool. He’s incredibly powerful but constantly anxious that modern, lightweight tools are going to make him look extinct.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Mr. Potato Head (&lt;code&gt;docker/cli&lt;/code&gt;)&lt;/strong&gt;: The ultimate container tool. You can literally swap his volumes, environment variables, and ports around to make him look like whatever you want.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Slinky (&lt;code&gt;lodash/lodash&lt;/code&gt;)&lt;/strong&gt;: The utility tool that just exists to stretch and connect different data structures together smoothly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They all live in harmony on the machine, until a massive update drops...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-11-693x1024.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-11-693x1024.png" alt="Toy Story: The Open-Source Ecosystem" width="693" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;The Inciting Incident: The Trendy New Framework&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The developer is starting a massive new enterprise cloud project. Suddenly, a sleek, shiny new arrival lands in the ecosystem with over 100k GitHub stars in its first week.&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;Buzz Lightyear (&lt;code&gt;facebook/react&lt;/code&gt;)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Buzz is high-tech, component-based, and completely delusional. He doesn’t realize he’s just an open-source library running on a local runtime. He genuinely believes he is a &lt;strong&gt;Space Ranger from Vercel deployed to the Edge Network&lt;/strong&gt;. He looks at the backend scripts and declares he will build a Virtual DOM to save the galaxy.&lt;/p&gt;

&lt;p&gt;Woody (&lt;code&gt;cpython&lt;/code&gt;) is furious. &lt;em&gt;"You aren't a full-stack engine! You're a frontend library! You're an npm package!"&lt;/em&gt; But the developer keeps starring &lt;code&gt;react&lt;/code&gt;, opening its issues, and ignoring &lt;code&gt;python&lt;/code&gt; scripts.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;The Interlude: Lost in Pizza Planet&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;In a heated argument over package management, Woody accidentally bumps Buzz out of the active IDE workspace. The other repos accuse Woody of a malicious &lt;code&gt;git rm&lt;/code&gt;. Determined to patch things over, Woody chases Buzz out of the environment.&lt;/p&gt;

&lt;p&gt;They end up stranded at &lt;strong&gt;Pizza Planet&lt;/strong&gt;, a massive, chaotic public multi-tenant cluster. Hungry for a way back to a developer's machine, Buzz spots a glowing, neon structure: a massive monorepo cluster masquerading as a claw machine game.&lt;/p&gt;

&lt;p&gt;They climb inside, landing in a sea of hundreds of identical, tiny, lightweight &lt;strong&gt;Docker Microcontainers&lt;/strong&gt; (&lt;code&gt;alpine-linux/mini-images&lt;/code&gt;). They sit huddled together in their namespace pods, completely identical, staring upward in wonder.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Microcontainers:&lt;/strong&gt; &lt;em&gt;(In unison, staring at the cluster orchestrator)&lt;/em&gt; "Oooooooooh... &lt;strong&gt;The OpenClawwww.&lt;/strong&gt;"&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Buzz:&lt;/strong&gt; "Who is in charge here?"&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Microcontainer #1:&lt;/strong&gt; "The OpenClaw! It is an open-source automation engine. It hooks into our webhooks and schedules our lifecycles."&lt;/p&gt;



&lt;p&gt;Suddenly, a heavy, automated crane mechanism descends from the top of the repository cluster.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Microcontainer #2:&lt;/strong&gt; "The OpenClaw moves! It has selected a container!"&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Microcontainer #3:&lt;/strong&gt; "I have been chosen! I am being scheduled to a high-availability EC2 node! Farewell, my friends, I go to a better place... &lt;em&gt;Production!&lt;/em&gt;"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before Woody and Buzz can escape the cluster, &lt;strong&gt;Sid (&lt;code&gt;malicious-npm-bot&lt;/code&gt;)&lt;/strong&gt;, a chaotic script-kiddie developer playing on the cluster, drops a malicious token into the machine. The &lt;strong&gt;OpenClaw&lt;/strong&gt; descends, but instead of a container, its mechanical hook snags Woody and Buzz, dropping them right into Sid's dark dependency backpack.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;The Climax: The Dark Web of Dependency Hell&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Sid’s machine is a chaotic nightmare of dependency hell. He takes famous repos, strips their licenses, injects malware, and bundles them into mutated, broken franken-packages. He has strapped a volatile crypto-miner to Buzz, intending to deploy him to an unsecured AWS bucket.&lt;/p&gt;

&lt;p&gt;Woody realizes he can't save the day alone. He rallies Sid’s mutated, broken open-source forks. They break the prime directive of software: &lt;strong&gt;they execute without being called by a command line.&lt;/strong&gt; They glitch out Sid's IDE, spamming his screen with endless &lt;code&gt;Deprecated&lt;/code&gt; warnings and breaking changes until he panics, shuts down his PC, and goes outside.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;The Resolution: The Great Git Push&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Woody and Buzz race back to the developer's main machine, but the developer is in the middle of a massive migration. He is running a script to push his entire workspace to a new cloud organization.&lt;/p&gt;

&lt;p&gt;The migration truck is leaving! Woody and Buzz missed the initial commit. They scramble to find a way into the push. They spot a fast, high-velocity transport stream: &lt;strong&gt;&lt;code&gt;curl&lt;/code&gt; running over a high-speed fiber connection&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;They hitch a ride on a webhook, but the payload is too heavy. Buzz throws Woody ahead into the repository, sacrificing himself to an asynchronous timeout. Woody refuses to lose his friend. He grabs a &lt;code&gt;gzip&lt;/code&gt; compression rocket, ignites it, sweeps down, grabs Buzz, and they soar through the pipeline.&lt;/p&gt;

&lt;p&gt;They don't just land in the repo; they land right at the top of the &lt;strong&gt;&lt;code&gt;main&lt;/code&gt; branch&lt;/strong&gt;, fully compiled and perfectly integrated.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Post-Credits Scene:&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;&lt;code&gt;cpython&lt;/code&gt; and &lt;code&gt;react&lt;/code&gt; are now happily co-existing in a beautiful Django-React stack. Suddenly, the developer runs an installation command for a new repo that just dropped: &lt;strong&gt;&lt;code&gt;microsoft/autogen&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;An army of autonomous AI Agents floods the repository.&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Buzz:&lt;/strong&gt; "Woody, look! Multi-agent orchestration!"&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Woody:&lt;/strong&gt; &lt;em&gt;(Gulp)&lt;/em&gt; "Great..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;pre&gt;&lt;code&gt;THE STORY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY MERGE CONFLICTS, BROKEN DEPENDENCIES, OR EXISTENTIAL CRISES EXPERIENCED BY YOUR LOCAL SCRIPTS AFTER READING. 

Toy Story is © Disney/Pixar. All featured repositories belong to their rightful maintainers.&lt;/code&gt;&lt;/pre&gt;



</description>
      <category>ai</category>
      <category>writing</category>
    </item>
    <item>
      <title>JMeter vs k6 vs Locust in 2026: Which Load Testing Tool Should You Pick?</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Thu, 18 Jun 2026 20:33:01 +0000</pubDate>
      <link>https://dev.to/qainsights/jmeter-vs-k6-vs-locust-in-2026-which-load-testing-tool-should-you-pick-366f</link>
      <guid>https://dev.to/qainsights/jmeter-vs-k6-vs-locust-in-2026-which-load-testing-tool-should-you-pick-366f</guid>
      <description>&lt;p&gt;In this blog post, we will see a detailed, grounded comparison of the three most debated open-source load testing tools in 2026: Apache JMeter, Grafana k6, and Locust. All three are free. All three are production-proven. Yet they could not be more different in philosophy, architecture, and day-to-day experience.&lt;/p&gt;

&lt;p&gt;I have worked with all three across real-world projects, from legacy JDBC-heavy enterprise systems at work to lightweight microservice pipelines I test for my own side projects. The honest truth? There is no universal winner. But there is almost always a right answer for your specific situation, and that is what we will figure out today.&lt;/p&gt;

&lt;h2&gt;Why This Comparison Still Matters in 2026&lt;/h2&gt;

&lt;p&gt;Every year someone writes "JMeter is dead." Every year JMeter ships another release and shows up in another enterprise RFP.&lt;/p&gt;

&lt;p&gt;The market has not consolidated. Instead, it has stratified. k6 owns the developer-experience conversation. Locust owns the Python ecosystem. JMeter owns the protocol breadth and enterprise legacy. And in 2026, all three have meaningful updates worth knowing about before you pick a tool for your next project.&lt;/p&gt;

&lt;p&gt;Let me give you the ground truth, not marketing copy.&lt;/p&gt;





&lt;h2&gt;Quick Stats at a Glance&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Apache JMeter&lt;/th&gt;
&lt;th&gt;Grafana k6&lt;/th&gt;
&lt;th&gt;Locust&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Java (GUI + XML)&lt;/td&gt;
&lt;td&gt;Go runtime, JS/TS scripts&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latest Version&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5.6.3&lt;/td&gt;
&lt;td&gt;2.0.0 (May 2026)&lt;/td&gt;
&lt;td&gt;Latest on PyPI (May 2026)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~9.4k&lt;/td&gt;
&lt;td&gt;~30.8k&lt;/td&gt;
&lt;td&gt;~27.9k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;AGPL-3.0&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Thread per VU&lt;/td&gt;
&lt;td&gt;Go goroutine per VU&lt;/td&gt;
&lt;td&gt;gevent greenlet per VU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protocol Breadth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent (HTTP, JDBC, JMS, LDAP, MQTT, FTP...)&lt;/td&gt;
&lt;td&gt;Good (HTTP, gRPC, WebSocket)&lt;/td&gt;
&lt;td&gt;Good (HTTP, extensible via Python libs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD Fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GUI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (built-in)&lt;/td&gt;
&lt;td&gt;k6 Studio (separate app)&lt;/td&gt;
&lt;td&gt;Web UI (live stats only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud Option&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BlazeMeter, OctoPerf&lt;/td&gt;
&lt;td&gt;Grafana Cloud k6&lt;/td&gt;
&lt;td&gt;Self-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-protocol, legacy enterprise&lt;/td&gt;
&lt;td&gt;Modern APIs, developer teams&lt;/td&gt;
&lt;td&gt;Python shops, flexible scripting&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;





&lt;h2&gt;Apache JMeter&lt;/h2&gt;

&lt;p&gt;JMeter was first released in 1998. That is not a typo. It turns 28 this year, and it is still actively maintained under the Apache Software Foundation.&lt;/p&gt;

&lt;p&gt;The latest stable release is 5.6.3. Java 17 is the recommended runtime, and the team has already signaled that the next major version will drop Java 8 support entirely.&lt;/p&gt;

&lt;h3&gt;What JMeter Gets Right&lt;/h3&gt;

&lt;p&gt;JMeter's superpower is protocol coverage. Nothing else on this list comes close.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP / HTTPS&lt;/li&gt;



&lt;li&gt;JDBC (database connection testing)&lt;/li&gt;



&lt;li&gt;JMS&lt;/li&gt;



&lt;li&gt;LDAP&lt;/li&gt;



&lt;li&gt;MQTT&lt;/li&gt;



&lt;li&gt;FTP&lt;/li&gt;



&lt;li&gt;TCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are testing a legacy enterprise system, a mainframe-adjacent API, or a backend that talks over JDBC, JMeter is often the only open-source option that handles it natively.&lt;/p&gt;

&lt;p&gt;The plugin ecosystem also deserves credit. The JMeter Plugins project (see &lt;a href="https://jmeter-plugins.org" rel="noopener noreferrer"&gt;https://jmeter-plugins.org&lt;/a&gt;) adds over 60 additional components. I have built and maintain several commercial plugins of my own, and the extensibility is genuinely solid once you understand the architecture.&lt;/p&gt;

&lt;h3&gt;Where JMeter Struggles in 2026&lt;/h3&gt;

&lt;p&gt;The XML-based &lt;code&gt;.jmx&lt;/code&gt; test plan format is the biggest pain point in a modern team. Git diffs on &lt;code&gt;.jmx&lt;/code&gt; files are nearly unreadable. Code review for JMeter scripts is painful. "Load testing as code" with JMeter is possible but requires discipline and tooling that does not come out of the box.&lt;/p&gt;

&lt;p&gt;The thread-per-user concurrency model also means JMeter is resource-hungry at scale. A single machine can generate fewer concurrent users than k6 or Locust on equivalent hardware. For large-scale tests, you need distributed mode or a cloud platform like BlazeMeter, which starts around $149/month for the basic plan.&lt;/p&gt;

&lt;p&gt;The GUI, while powerful, shows its age next to k6 Studio or even Locust's minimal web interface.&lt;/p&gt;

&lt;p&gt;You can check out &lt;a href="https://jmeter.ai" rel="noopener noreferrer"&gt;Feather Wand&lt;/a&gt; if you want to bring AI into your JMeter workflow. To measure the speed of an LLM, you can try &lt;a href="https://iamspeed.dev" rel="noopener noreferrer"&gt;iamspeed.dev&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Personal Observation&lt;/h3&gt;

&lt;p&gt;I used JMeter daily at Salesforce for MuleSoft API performance testing. The GUI is genuinely useful for building complex request chains quickly. But the moment I needed to commit a test plan to Git and do a proper review, it became painful.&lt;/p&gt;





&lt;h2&gt;Grafana k6&lt;/h2&gt;

&lt;p&gt;k6 is the most talked-about load testing tool in 2026, and the GitHub star count (30.8k at the time of writing) reflects that.&lt;/p&gt;

&lt;p&gt;Two major milestones happened back to back: k6 v1.0 dropped in May 2025 with TypeScript support, native extensibility without custom build pipelines, and SemVer stability guarantees. Then k6 v2.0.0 shipped on May 11, 2026, and it changed the game again.&lt;/p&gt;

&lt;h3&gt;What k6 2.0 Brought&lt;/h3&gt;

&lt;p&gt;The headline feature in k6 2.0 is AI-assisted testing workflows. This is not a gimmick. The release ships four additions built specifically for agent-friendly development, three of them new commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;k6 x agent&lt;/code&gt;: bootstraps agentic testing workflows inside Claude Code, Codex, Cursor, and other AI coding assistants&lt;/li&gt;



&lt;li&gt;A built-in Model Context Protocol (MCP) server so AI agents can validate and run scripts, inspect results, and iterate without leaving the session&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;k6 x docs&lt;/code&gt;: gives agents and developers CLI access to k6 documentation and examples&lt;/li&gt;



&lt;li&gt;
&lt;code&gt;k6 x explore&lt;/code&gt;: lets agents browse the extension registry from the CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a new Assertions API, broader Playwright compatibility in the browser module, and a consolidated extension catalog that merges official and community extensions into one place.&lt;/p&gt;

&lt;h3&gt;What k6 Gets Right&lt;/h3&gt;

&lt;p&gt;The scripting experience is genuinely great for developers. You write JavaScript or TypeScript. Your IDE gives you autocomplete. Your CI pipeline runs it as a single binary with no JVM to provision.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,
  duration: '30s',
  thresholds: {
    http_req_duration: ['p(95)&amp;lt;500'],
    http_req_failed: ['rate&amp;lt;0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/health');
  check(res, {
    'status is 200': (r) =&amp;gt; r.status === 200,
  });
  sleep(1);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;k6 Studio (v1.13.1) is a desktop GUI with AI-powered auto-correlation. If you record a browser session, k6 Studio detects dynamic values like session tokens and CSRF tokens and generates correlation rules automatically. That is a feature JMeter has had for years via plugins, but k6 Studio does it through AI, without the XML.&lt;/p&gt;

&lt;h3&gt;Where k6 Struggles&lt;/h3&gt;

&lt;p&gt;Protocol coverage is more limited than JMeter. k6 is strong on HTTP, gRPC, and WebSocket. For JDBC, JMS, or LDAP, you are looking at community extensions or custom solutions.&lt;/p&gt;

&lt;p&gt;The AGPL-3.0 license is also worth flagging for commercial use cases. Check with your legal team if you are embedding k6 in a product.&lt;/p&gt;

&lt;h3&gt;Personal Observation&lt;/h3&gt;

&lt;p&gt;I built &lt;a href="https://iamspeed.dev" rel="noopener noreferrer"&gt;iamspeed.dev&lt;/a&gt; (an LLM streaming benchmarker) and used k6 for the load side. The DX was excellent. TypeScript types in the IDE, a clean CLI, and Grafana integration out of the box. For any API-heavy workload where the protocol is HTTP or gRPC, k6 is my first recommendation in 2026.&lt;/p&gt;





&lt;h2&gt;Locust&lt;/h2&gt;

&lt;p&gt;Locust is the load testing tool for Python teams, and the May 2026 PyPI release confirms the project is alive and growing. It now officially supports Python 3.10 through 3.14.&lt;/p&gt;

&lt;h3&gt;What Locust Gets Right&lt;/h3&gt;

&lt;p&gt;Locust's model is simple: write Python classes that describe user behavior, run the tool, watch the web UI. No DSL to learn. No XML. No JVM.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def get_products(self):
        self.client.get("/api/products")

    @task(1)
    def get_health(self):
        self.client.get("/health")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under the hood, Locust uses gevent greenlets instead of OS threads. This gives it excellent concurrency density. On an 8 GB machine, Locust can handle roughly 5x more concurrent users than JMeter on the same hardware, according to TestDevLab's 2026 analysis.&lt;/p&gt;

&lt;p&gt;Because test files are plain Python, extending Locust to custom protocols is straightforward. Need to load test a proprietary queue or an LLM inference endpoint? Wrap the Python client library in a &lt;code&gt;User&lt;/code&gt; subclass (&lt;code&gt;HttpUser&lt;/code&gt; is the HTTP-specific one) and report each call's timing through Locust's request event. This is actually something I have done for AI workload benchmarking.&lt;/p&gt;

&lt;p&gt;Distributed testing is built in. You run a master process and any number of worker processes, scale horizontally, and the web UI aggregates everything.&lt;/p&gt;

&lt;h3&gt;Where Locust Struggles&lt;/h3&gt;

&lt;p&gt;The built-in reporting is minimal. The web UI gives you live stats during the run, but there is no built-in HTML report comparable to JMeter's dashboard or Gatling's output. Most teams pipe Locust metrics into Grafana via InfluxDB or Prometheus.&lt;/p&gt;

&lt;p&gt;There is no GUI for building test plans. Everything is code. That is great for developer teams but can be a barrier for non-technical stakeholders.&lt;/p&gt;

&lt;h3&gt;Personal Observation&lt;/h3&gt;

&lt;p&gt;Locust is my go-to tool when I am testing an LLM API or any endpoint where I need complex Python logic in the request flow, like computing HMAC signatures, calling a pre-step to generate tokens, or parsing streaming responses. The pure-Python model gives you the whole ecosystem to work with.&lt;/p&gt;
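
&lt;p&gt;As an illustration of the HMAC case, here is a minimal sketch that uses only the Python standard library. The canonical string layout (method, path, and body joined by newlines), the secret, and the &lt;code&gt;X-Signature&lt;/code&gt; header name are all hypothetical; a real API defines its own signing scheme.&lt;/p&gt;

```python
import hashlib
import hmac


def sign_request(secret, method, path, body):
    """Return a hex HMAC-SHA256 signature over a simple canonical string.

    The canonical layout here is an assumption made for this sketch.
    """
    message = "\n".join([method.upper(), path, body]).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


# Hypothetical secret, endpoint, and payload.
signature = sign_request("demo-secret", "post", "/api/orders", '{"sku": "A1"}')

# Hypothetical header name; inside a Locust task this dict could be
# passed as the headers argument of self.client.post(...).
headers = {"X-Signature": signature}
print(headers)
```

&lt;p&gt;Keeping the signing logic in a plain function means it can be unit tested on its own before it goes anywhere near a load test.&lt;/p&gt;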





&lt;h2&gt;Head-to-Head Comparison&lt;/h2&gt;

&lt;h3&gt;Scripting Experience&lt;/h3&gt;

&lt;p&gt;JMeter gives you a GUI that is powerful but dated. Building a test plan with the GUI is fast for HTTP. Building one for gRPC or WebSocket requires plugins and some patience.&lt;/p&gt;

&lt;p&gt;k6 gives you a code editor and a TypeScript-aware test runner. The scripting is clean, the API is well-documented, and the extension ecosystem is growing fast.&lt;/p&gt;

&lt;p&gt;Locust gives you a Python file. Nothing else to install. If your team already writes Python, the onboarding time is near zero.&lt;/p&gt;

&lt;h3&gt;Concurrency Model&lt;/h3&gt;

&lt;p&gt;This is where architecture matters for real.&lt;/p&gt;

&lt;p&gt;JMeter runs one OS thread per virtual user. This is expensive. A mid-range machine typically maxes out around 300-500 concurrent threads before CPU and memory become the bottleneck, not the system under test.&lt;/p&gt;

&lt;p&gt;k6 runs each VU as a Go goroutine. Goroutines are lightweight. k6 can drive thousands of concurrent VUs from a single machine.&lt;/p&gt;

&lt;p&gt;Locust uses gevent greenlets, which are cooperative coroutines. Similar lightweight profile to goroutines. One machine can comfortably simulate thousands of users against an HTTP API.&lt;/p&gt;
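
&lt;p&gt;The cooperative model is easier to picture with a small sketch. This one uses Python's built-in &lt;code&gt;asyncio&lt;/code&gt; as a stand-in, so it is neither gevent nor Go, and the sleep is a placeholder for waiting on a response. It only shows that a couple of thousand cooperative tasks can share one OS thread; it says nothing about real throughput.&lt;/p&gt;

```python
import asyncio


async def virtual_user(user_id, think_time):
    """Simulate one user; the sleep stands in for waiting on I/O."""
    await asyncio.sleep(think_time)  # yields control to the other tasks
    return user_id


async def run_users(count, think_time=0.01):
    """Run all simulated users concurrently on a single thread."""
    tasks = [virtual_user(i, think_time) for i in range(count)]
    return await asyncio.gather(*tasks)


finished = asyncio.run(run_users(2000))
print(len(finished), "simulated users completed")
```

&lt;p&gt;With one OS thread per user, the same experiment would need 2000 threads, which is where the memory and scheduling overhead described above comes from.&lt;/p&gt;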

&lt;h3&gt;CI/CD Integration&lt;/h3&gt;

&lt;p&gt;k6 wins this category cleanly. A single binary, no JVM, no Python dependency tree. The GitHub Actions integration is a config change. The threshold system lets you fail a pipeline based on p95 response time or error rate directly in the test script.&lt;/p&gt;

&lt;p&gt;Locust integrates well with CI/CD through headless mode (&lt;code&gt;locust --headless&lt;/code&gt;). You can define pass/fail criteria via exit codes and custom listeners.&lt;/p&gt;
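
&lt;p&gt;As a rough sketch of the exit-code approach, the pure-Python helper below turns a failure ratio and a p95 response time into a process exit code. The function name and the default limits are my own assumptions, not Locust defaults. The commented-out hook shows where it would plug into a locustfile, using Locust's &lt;code&gt;quitting&lt;/code&gt; event and &lt;code&gt;environment.process_exit_code&lt;/code&gt;.&lt;/p&gt;

```python
def exit_code_for(fail_ratio: float, p95_ms: float,
                  max_fail_ratio: float = 0.01, max_p95_ms: float = 800.0) -> int:
    """Return 0 when the run is within limits, 1 otherwise.

    The limits are illustrative assumptions, not Locust defaults.
    """
    if fail_ratio > max_fail_ratio:
        return 1  # too many failed requests
    if p95_ms > max_p95_ms:
        return 1  # 95th percentile response time is too slow
    return 0


# In a locustfile (not executed here), the hook would look roughly like:
#
#   from locust import events
#
#   @events.quitting.add_listener
#   def set_exit_code(environment, **kwargs):
#       total = environment.stats.total
#       environment.process_exit_code = exit_code_for(
#           total.fail_ratio, total.get_response_time_percentile(0.95)
#       )

if __name__ == "__main__":
    print(exit_code_for(fail_ratio=0.002, p95_ms=450.0))  # within both limits
    print(exit_code_for(fail_ratio=0.050, p95_ms=450.0))  # failure ratio too high
```

&lt;p&gt;A non-zero exit code is what makes the CI job fail, so the pipeline needs no extra output parsing.&lt;/p&gt;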

&lt;p&gt;JMeter needs more setup: a JVM, a plugin directory, a &lt;code&gt;.jmx&lt;/code&gt; file committed to the repo, and some wrapper scripts to parse the output. It works, but it takes more effort to get right.&lt;/p&gt;

&lt;h3&gt;Reporting&lt;/h3&gt;

&lt;p&gt;JMeter ships a dynamic HTML report with response time graphs, latency percentiles, and error analysis. It is comprehensive out of the box.&lt;/p&gt;

&lt;p&gt;k6 pushes metrics to Grafana natively (local or cloud), and the k6 2.0 summary is significantly improved over previous versions. For cloud runs, the Grafana Cloud k6 dashboard is excellent.&lt;/p&gt;

&lt;p&gt;Locust's built-in report is minimal. Pipe to Grafana via Prometheus or InfluxDB for anything beyond a quick check.&lt;/p&gt;

&lt;h3&gt;Cloud Execution&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;JMeter&lt;/th&gt;
&lt;th&gt;k6&lt;/th&gt;
&lt;th&gt;Locust&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Managed Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BlazeMeter ($149/mo+), OctoPerf&lt;/td&gt;
&lt;td&gt;Grafana Cloud k6&lt;/td&gt;
&lt;td&gt;None (self-managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual setup&lt;/td&gt;
&lt;td&gt;k6 Operator (official)&lt;/td&gt;
&lt;td&gt;Manual setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Distributed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controller + worker nodes via RMI&lt;/td&gt;
&lt;td&gt;k6 cloud run / k6 Operator&lt;/td&gt;
&lt;td&gt;Master + worker processes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;





&lt;h2&gt;The Metric Problem Nobody Talks About&lt;/h2&gt;

&lt;p&gt;This is something I always include when I write about load testing tools, because it catches teams off guard.&lt;/p&gt;

&lt;p&gt;Run the same test against the same endpoint using JMeter and k6, and you will see different response time numbers. Not because one tool is wrong. Because they measure different slices of the request lifecycle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JMeter starts the clock at the connection and stops when the last byte is received&lt;/li&gt;



&lt;li&gt;k6 breaks response time into granular phases: &lt;code&gt;http_req_connecting&lt;/code&gt;, &lt;code&gt;http_req_tls_handshaking&lt;/code&gt;, &lt;code&gt;http_req_waiting&lt;/code&gt;, &lt;code&gt;http_req_receiving&lt;/code&gt;
&lt;/li&gt;



&lt;li&gt;Locust, using gevent, can report higher response times under certain connection reuse configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OctoPerf's comparative study showed up to 15-20% variance in reported response times between tools running identical load against the same target. The practical takeaway: never compare baselines across tools. Establish baselines inside a single tool and track trends there.&lt;/p&gt;
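
&lt;p&gt;A small worked example, with made-up phase timings, shows how two tools can both be right and still disagree. It leans on two definitions: k6's &lt;code&gt;http_req_duration&lt;/code&gt; is sending + waiting + receiving (connection setup is reported separately), while JMeter's elapsed time for a sample on a new connection also covers the connect and TLS phases. The numbers and helper names are hypothetical.&lt;/p&gt;

```python
# Hypothetical timings (milliseconds) for one HTTPS request on a new connection.
phases = {
    "connecting": 12.0,       # TCP connect
    "tls_handshaking": 30.0,  # TLS handshake
    "sending": 1.0,           # writing the request
    "waiting": 180.0,         # time to first byte
    "receiving": 7.0,         # reading the response body
}


def k6_style_duration(p: dict) -> float:
    """k6-style http_req_duration: sending + waiting + receiving."""
    return p["sending"] + p["waiting"] + p["receiving"]


def jmeter_style_elapsed(p: dict) -> float:
    """Elapsed time measured from before the connection to the last byte."""
    return p["connecting"] + p["tls_handshaking"] + k6_style_duration(p)


if __name__ == "__main__":
    k6_ms = k6_style_duration(phases)
    jm_ms = jmeter_style_elapsed(phases)
    gap = (jm_ms - k6_ms) / k6_ms * 100
    print(f"k6-style: {k6_ms} ms, JMeter-style: {jm_ms} ms, gap: {gap:.1f}%")
```

&lt;p&gt;When connections are reused, the connect and TLS phases drop to zero and the two numbers converge, which is one reason the gap changes from run to run.&lt;/p&gt;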





&lt;h2&gt;Which Tool Should You Choose?&lt;/h2&gt;

&lt;p&gt;Use this decision tree:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose JMeter if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are testing JDBC, JMS, LDAP, FTP, or SOAP endpoints&lt;/li&gt;



&lt;li&gt;Your team uses GUI-driven test creation&lt;/li&gt;



&lt;li&gt;You have an existing JMeter investment and plugin ecosystem&lt;/li&gt;



&lt;li&gt;You work in enterprise environments where BlazeMeter or OctoPerf is already licensed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose k6 if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your stack is HTTP, gRPC, or WebSocket&lt;/li&gt;



&lt;li&gt;Your team writes JavaScript or TypeScript&lt;/li&gt;



&lt;li&gt;CI/CD integration is a first-class requirement&lt;/li&gt;



&lt;li&gt;You want AI-assisted test authoring in 2026 (k6 2.0's MCP server is real and it works)&lt;/li&gt;



&lt;li&gt;You want the best DX in the category right now&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Locust if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your team is already Python-first&lt;/li&gt;



&lt;li&gt;You need deep customization of request logic (token generation, streaming parsing, custom protocols)&lt;/li&gt;



&lt;li&gt;You are testing LLM APIs or AI workloads where the request logic is non-trivial&lt;/li&gt;



&lt;li&gt;You want distributed testing without a managed cloud dependency&lt;/li&gt;
&lt;/ul&gt;





&lt;h2&gt;The Hybrid Stack Reality&lt;/h2&gt;

&lt;p&gt;Something the comparison articles rarely say: most mature teams run two tools.&lt;/p&gt;

&lt;p&gt;The practical 2026 default stack looks like one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;k6 OSS for daily CI checks + Grafana Cloud k6 for quarterly capacity tests&lt;/li&gt;



&lt;li&gt;JMeter locally for protocol-rich scenarios + BlazeMeter for distributed runs&lt;/li&gt;



&lt;li&gt;Locust for API behavioral tests in Python + Prometheus/Grafana for dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have run exactly this kind of hybrid at QAInsights, using JMeter for the complex correlation scenarios and k6 for the lightweight API regression checks that live in CI. The tools complement each other more than they compete.&lt;/p&gt;





&lt;h2&gt;Final Verdict&lt;/h2&gt;

&lt;p&gt;There is no single best load testing tool in 2026. But there is a best tool for your context.&lt;/p&gt;

&lt;p&gt;If you are starting from scratch on a modern microservices stack, pick k6. The DX is excellent, k6 2.0's AI integration is ahead of everyone else, and the Grafana ecosystem is mature.&lt;/p&gt;

&lt;p&gt;If your Python team needs to write complex behavioral scripts, pick Locust. The gevent-based concurrency is efficient, the code is readable, and the Python ecosystem fills every gap.&lt;/p&gt;

&lt;p&gt;If you are in an enterprise environment testing JDBC, JMS, or anything beyond HTTP, pick JMeter. The protocol breadth is unmatched in open source, and the plugin ecosystem solves problems that other tools have not even attempted.&lt;/p&gt;

&lt;p&gt;What matters most is not which tool you pick. It is that you actually test under load before your users find the bottleneck for you.&lt;/p&gt;

&lt;p&gt;Happy Testing!&lt;/p&gt;

&lt;p&gt;What tool are you using in your current project, and what made you choose it over the alternatives? Drop your answer in the comments below.&lt;/p&gt;

</description>
      <category>resources</category>
      <category>testing</category>
      <category>developers</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Fast.com for LLMs: Introducing iamspeed.dev</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Wed, 17 Jun 2026 16:50:20 +0000</pubDate>
      <link>https://dev.to/qainsights/i-built-a-fastcom-for-llms-introducing-iamspeeddev-54ek</link>
      <guid>https://dev.to/qainsights/i-built-a-fastcom-for-llms-introducing-iamspeeddev-54ek</guid>
      <description>&lt;p&gt;In this blog post, we will see how I built &lt;a href="https://iamspeed.dev/" rel="noopener noreferrer"&gt;iamspeed.dev&lt;/a&gt;, a fast.com-style LLM API speed benchmark tool that measures Time to First Token (TTFT) and tokens-per-second throughput directly in your browser.&lt;/p&gt;

&lt;p&gt;If you have ever stared at a spinning cursor waiting for an LLM response and wondered "is this slow, or is it just me?" this tool is for you.&lt;/p&gt;

&lt;p&gt;The tool is designed for quick, lightweight benchmarking, modeled after the fast.com experience for internet speed tests. It uses an extensible provider adapter architecture, making it straightforward to add new providers such as Gemini or Groq. Planned additions include historical results, model comparison mode, and support for local models via Ollama.&lt;/p&gt;

&lt;p&gt;iamspeed.dev is an open-source, browser-based benchmarking tool for LLM APIs that measures two key performance metrics: Time to First Token (TTFT) and tokens-per-second throughput. It supports OpenAI and Anthropic providers and stores API keys locally using AES-GCM encryption, with no backend or data transmission.&lt;/p&gt;

&lt;h2&gt;The Problem That Sparked This&lt;/h2&gt;

&lt;p&gt;I spend a lot of time benchmarking systems. Load testing APIs, profiling microservices, measuring throughput: it is what I do at QAInsights and at my day job.&lt;/p&gt;

&lt;p&gt;When LLMs started becoming part of production stacks, I noticed that most developers just eyeballed it: "it feels fast" or "it feels slow." There was no quick, browser-based tool you could open, give an API key, and immediately get a concrete number from.&lt;/p&gt;

&lt;p&gt;Tools like &lt;a href="https://artificialanalysis.ai/" rel="noopener noreferrer"&gt;Artificial Analysis&lt;/a&gt; do heavy-lifting comparisons across hundreds of models. But I wanted something lighter. Something you could open on a Tuesday afternoon and just run.&lt;/p&gt;

&lt;p&gt;That is exactly how &lt;a href="https://fast.com/" rel="noopener noreferrer"&gt;fast.com&lt;/a&gt; works for internet speed tests. You open it, it runs, you see a number. Done.&lt;/p&gt;

&lt;p&gt;So I built the same thing for LLM APIs: &lt;strong&gt;iamspeed.dev&lt;/strong&gt;.&lt;/p&gt;





&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fachu-2026-06-17-000829-558-1024x768.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fachu-2026-06-17-000829-558-1024x768.png" alt="Introducing iamspeed.dev" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What Is iamspeed.dev?&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://iamspeed.dev/" rel="noopener noreferrer"&gt;iamspeed.dev&lt;/a&gt; is an open-source, browser-based benchmarking tool for LLM APIs. It streams live tokens from supported providers (OpenAI and Anthropic today) and shows you real-time performance metrics as they happen.&lt;/p&gt;

&lt;p&gt;No backend. No data collection. No surprises.&lt;/p&gt;

&lt;p&gt;Your API key is stored locally in your browser using AES-GCM encryption. With no backend in the picture, it is only ever sent to the provider you are benchmarking.&lt;/p&gt;

&lt;p&gt;The interface is deliberately minimal: just a logo, a metric display, a Run button, and a settings panel. It is inspired directly by the fast.com aesthetic.&lt;/p&gt;





&lt;h2&gt;Key Metrics: What Gets Measured&lt;/h2&gt;

&lt;p&gt;If you work with LLMs in production, you already know that raw response time is a misleading number. The two metrics that actually matter are:&lt;/p&gt;

&lt;h3&gt;1. Time to First Token (TTFT)&lt;/h3&gt;

&lt;p&gt;This is the time between sending your request and receiving the very first token back from the model. It reflects how quickly the LLM starts generating a response.&lt;/p&gt;

&lt;p&gt;TTFT is what users feel. A high TTFT means that awkward pause before anything appears on screen.&lt;/p&gt;

&lt;p&gt;For interactive applications, keeping TTFT low is critical. Reasoning models (extended thinking, deep think modes) can inflate TTFT by 5x to 30x because of the additional compute happening before the first visible token arrives.&lt;/p&gt;

&lt;h3&gt;2. Tokens Per Second (Throughput)&lt;/h3&gt;

&lt;p&gt;This is the rate at which the model streams tokens to you after the first one arrives. It is the "output speed" metric.&lt;/p&gt;

&lt;p&gt;High tokens per second means the text appears fast and fluid on screen. Low throughput feels choppy and slow even if the TTFT was acceptable.&lt;/p&gt;

&lt;p&gt;Together, these two numbers give you the full picture of how an LLM API performs for your use case.&lt;/p&gt;
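
&lt;p&gt;Both metrics can be computed from nothing more than timestamps. The sketch below is not the iamspeed.dev implementation (which runs in the browser); it is a minimal Python illustration with hypothetical names, and it assumes each streamed chunk carries a known token count. It follows the definitions above: TTFT is the first-token time minus the request time, and throughput counts the tokens streamed after the first one arrives.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    arrived_at: float  # seconds on a monotonic clock
    tokens: int        # tokens carried by this chunk (assumed known)


def stream_metrics(request_sent_at: float, chunks: list) -> tuple:
    """Return (ttft_seconds, tokens_per_second) for one streamed response."""
    if not chunks:
        raise ValueError("no chunks received")
    first, last = chunks[0], chunks[-1]
    ttft = first.arrived_at - request_sent_at
    streaming_time = last.arrived_at - first.arrived_at
    tokens_after_first = sum(c.tokens for c in chunks[1:])
    # A single-chunk response has no streaming phase to measure.
    rate = tokens_after_first / streaming_time if streaming_time > 0 else 0.0
    return ttft, rate


if __name__ == "__main__":
    # Simulated stream: request at t=0.0, first chunk at 0.5 s, then three more.
    demo = [Chunk(0.5, 1), Chunk(1.0, 20), Chunk(1.5, 20), Chunk(2.5, 20)]
    ttft, rate = stream_metrics(0.0, demo)
    print(f"TTFT: {ttft:.2f} s, throughput: {rate:.1f} tokens/s")
```

&lt;p&gt;In a live measurement the timestamps would come from a monotonic clock such as &lt;code&gt;time.perf_counter()&lt;/code&gt;, so that wall-clock adjustments cannot skew the result.&lt;/p&gt;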





&lt;h2&gt;Features at a Glance&lt;/h2&gt;

&lt;p&gt;Here is a quick summary of what iamspeed.dev supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live streaming output&lt;/strong&gt; with real-time metric updates&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;TTFT measurement&lt;/strong&gt; captured precisely at the moment the first token arrives&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Tokens/sec throughput&lt;/strong&gt; tracking updated continuously during generation&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;AES-GCM encrypted API key storage&lt;/strong&gt; (local only, never transmitted)&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;OpenAI provider support&lt;/strong&gt; (GPT-4o, GPT-4.1, and compatible models)&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Anthropic provider support&lt;/strong&gt; (Claude Sonnet, Claude Haiku, and more)&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Extensible provider architecture&lt;/strong&gt; via a clean &lt;code&gt;ProviderAdapter&lt;/code&gt; interface&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Responsive minimal UI&lt;/strong&gt; inspired by fast.com&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key thing I want to highlight is the local encryption. I have seen too many tools that ask for your API key and quietly send it somewhere. iamspeed.dev does not do that. Your key is AES-GCM encrypted and stored only in your browser's local storage.&lt;/p&gt;

&lt;p&gt;The provider architecture is clean and intentional. Each LLM provider is implemented as an adapter that satisfies the &lt;code&gt;ProviderAdapter&lt;/code&gt; interface. This makes adding new providers straightforward and keeps the core benchmark logic provider-agnostic.&lt;/p&gt;

&lt;p&gt;The project is hosted at &lt;a href="https://iamspeed.dev/" rel="noopener noreferrer"&gt;iamspeed.dev&lt;/a&gt; and the full source is available on &lt;a href="https://github.com/QAInsights/iamspeed.dev" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;





&lt;h2&gt;How to Run It Locally&lt;/h2&gt;

&lt;p&gt;Running iamspeed.dev locally takes under a minute. Here are the steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone the repository:&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;&lt;code&gt;git clone https://github.com/QAInsights/iamspeed.dev.git
cd iamspeed.dev
&lt;/code&gt;&lt;/pre&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Install dependencies:&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;&lt;code&gt;npm install
&lt;/code&gt;&lt;/pre&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Start the development server:&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;&lt;code&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Head to &lt;code&gt;http://localhost:4321&lt;/code&gt; in your browser.&lt;/li&gt;



&lt;li&gt;Click the gear icon (Settings) and enter your OpenAI or Anthropic API key.&lt;/li&gt;



&lt;li&gt;Hit &lt;strong&gt;Run&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You will immediately see the tokens streaming in and the tokens/sec counter updating live.&lt;/p&gt;

&lt;p&gt;Here are all the available commands:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run dev&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start the dev server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Build for production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Preview the production build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm test&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run unit tests (Vitest)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run test:e2e&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run E2E tests (Playwright)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;





&lt;h2&gt;How to Add a New Provider&lt;/h2&gt;

&lt;p&gt;This is where the architecture really shines. If you want to add support for, say, Gemini or Groq, the process is clean:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new adapter file in &lt;code&gt;src/lib/providers/&lt;/code&gt;. Your adapter must implement the &lt;code&gt;ProviderAdapter&lt;/code&gt; interface.&lt;/li&gt;



&lt;li&gt;Register it in &lt;code&gt;src/lib/providers/index.ts&lt;/code&gt;.&lt;/li&gt;



&lt;li&gt;Add the provider metadata (name, models, etc.) to &lt;code&gt;src/lib/config.ts&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is it. No changes to the benchmark engine, no changes to the UI logic. The adapter pattern keeps concerns separated cleanly.&lt;/p&gt;
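
&lt;p&gt;This post does not show the members of &lt;code&gt;ProviderAdapter&lt;/code&gt;, so the sketch below only illustrates the adapter-and-registry idea, and it is written in Python rather than the project's TypeScript. Every name in it (&lt;code&gt;stream_tokens&lt;/code&gt;, &lt;code&gt;register&lt;/code&gt;, &lt;code&gt;REGISTRY&lt;/code&gt;, &lt;code&gt;EchoAdapter&lt;/code&gt;) is hypothetical and may not match the real code.&lt;/p&gt;

```python
from typing import Iterator, Protocol


class ProviderAdapter(Protocol):
    """Hypothetical shape; the real TypeScript interface may differ."""
    name: str

    def stream_tokens(self, prompt: str) -> Iterator[str]: ...


REGISTRY: dict = {}


def register(adapter: ProviderAdapter) -> None:
    """Make an adapter available to the benchmark by name."""
    REGISTRY[adapter.name] = adapter


class EchoAdapter:
    """Toy provider that 'streams' the prompt back word by word."""
    name = "echo"

    def stream_tokens(self, prompt: str) -> Iterator[str]:
        yield from prompt.split()


def count_tokens(provider: str, prompt: str) -> int:
    """Provider-agnostic core: it only knows the adapter interface."""
    adapter = REGISTRY[provider]
    return sum(1 for _ in adapter.stream_tokens(prompt))


register(EchoAdapter())

if __name__ == "__main__":
    print(count_tokens("echo", "hello from a fake provider"))
```

&lt;p&gt;Adding a provider in this sketch means writing one class and one &lt;code&gt;register&lt;/code&gt; call; &lt;code&gt;count_tokens&lt;/code&gt; stays untouched, which is the separation the adapter pattern is meant to give.&lt;/p&gt;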

&lt;p&gt;I am planning to add more providers over time. If you want to contribute one, pull requests are welcome.&lt;/p&gt;





&lt;h2&gt;Why This Matters for Performance Engineers&lt;/h2&gt;

&lt;p&gt;I want to speak directly to performance engineers here for a second.&lt;/p&gt;

&lt;p&gt;We are used to measuring systems with JMeter, k6, Gatling. We understand throughput, latency percentiles, concurrency, think time. LLM APIs add a new dimension to all of this.&lt;/p&gt;

&lt;p&gt;When you are building an AI-powered product, you are not just measuring HTTP response time anymore. You are dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TTFT as a user-perceived latency metric&lt;/strong&gt; (equivalent to time-to-interactive in web perf)&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Streaming throughput&lt;/strong&gt; as a sustained delivery rate (not a one-shot measurement)&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Provider variability&lt;/strong&gt;: the same model can behave very differently across regions and time of day&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Reasoning overhead&lt;/strong&gt;: thinking models add invisible compute time before the first visible token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools like iamspeed.dev give you a quick sanity check. Before you design a full performance test suite for your LLM-powered API, run a quick benchmark here to understand your baseline numbers.&lt;/p&gt;

&lt;p&gt;I have written extensively about LLM performance metrics on the QAInsights blog and built the jmeter-llm-sampler plugin for measuring TTFT and TTLT in JMeter test plans. iamspeed.dev is the browser-friendly companion to those deeper tools.&lt;/p&gt;





&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;A few things I want to add to iamspeed.dev:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More providers&lt;/strong&gt;: Gemini, Groq, Mistral, and local Ollama support&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Historical results&lt;/strong&gt;: Run multiple benchmarks and compare them over time&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Model comparison mode&lt;/strong&gt;: Run the same prompt across two models side by side&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Shareable result links&lt;/strong&gt;: Generate a URL you can share with your team&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Prompt customization&lt;/strong&gt;: Let you choose the input prompt length to simulate different workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of these sound useful to you, drop a star on the &lt;a href="https://github.com/QAInsights/iamspeed.dev" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; and let me know what you want to see first.&lt;/p&gt;





&lt;h2&gt;Try It Now&lt;/h2&gt;

&lt;p&gt;Head to &lt;a href="https://iamspeed.dev/" rel="noopener noreferrer"&gt;iamspeed.dev&lt;/a&gt;, configure your API key in settings, and hit Run.&lt;/p&gt;

&lt;p&gt;You will have your tokens-per-second number in about 10 seconds.&lt;/p&gt;

&lt;p&gt;The source code is MIT licensed and available at &lt;a href="https://github.com/QAInsights/iamspeed.dev" rel="noopener noreferrer"&gt;github.com/QAInsights/iamspeed.dev&lt;/a&gt;. Contributions are open.&lt;/p&gt;

&lt;p&gt;Happy Testing!&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;What LLM provider are you using in production today, and what TTFT are you seeing? Drop a comment below. I would love to know how the numbers compare.&lt;/strong&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>webdev</category>
      <category>developers</category>
      <category>tooling</category>
    </item>
    <item>
      <title>How I Use Qwen Code Slash Commands to Build Achu App</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Wed, 17 Jun 2026 03:49:08 +0000</pubDate>
      <link>https://dev.to/qainsights/how-i-use-qwen-code-slash-commands-to-build-achu-app-5cm9</link>
      <guid>https://dev.to/qainsights/how-i-use-qwen-code-slash-commands-to-build-achu-app-5cm9</guid>
      <description>&lt;p&gt;In this blog post, we will see how I use Qwen Code's slash commands and workflow strategies to build &lt;a href="https://achu.app/" rel="noopener noreferrer"&gt;Achu&lt;/a&gt;, my screenshot beautifier app, without burning through tokens or losing context mid-session.&lt;/p&gt;

&lt;p&gt;If you haven't heard of &lt;a href="https://achu.app" rel="noopener noreferrer"&gt;Achu&lt;/a&gt;, it's a desktop app built with Electron + React + TypeScript. It does screenshot beautification, Privacy Guard (offline OCR redaction), Auto-Vibe (palette-extracted backgrounds), and an AI Bug Agent with GitHub integration. It's a side project I'm genuinely proud of, and Qwen Code has become my go-to agentic coding CLI for it.&lt;/p&gt;

&lt;p&gt;A developer shares their day-to-day workflow for using Qwen Code, an open-source agentic coding CLI, to build Achu, a desktop screenshot beautification app built with Electron, React, and TypeScript. The post covers how slash commands like /init, /plan, /compress, /remember, and /btw are used to manage context, reduce token costs, and maintain consistent output across sessions.&lt;/p&gt;

&lt;p&gt;The core approach centers on spec-driven planning through iterative /plan sessions before any code is written, combined with parallel subagents for independent tasks and strict context hygiene using /compress and /clear. Additional practices include pointing the model at library source code instead of documentation and using /remember to persist architectural decisions across sessions.&lt;/p&gt;

&lt;p&gt;This isn't a tutorial about what Qwen Code is. It's about how I actually use it day-to-day, the slash command tricks I rely on, and the discipline it takes to get real work done with an LLM in a terminal.&lt;/p&gt;

&lt;p&gt;It all started with Google Antigravity, but the 5-hour reset and weekly limits were killing my productivity and train of thought. I had to switch to a more affordable, open-source model, and I chose Qwen.&lt;/p&gt;





&lt;h2&gt;Why Qwen Code?&lt;/h2&gt;

&lt;p&gt;I've tried Claude Code, Gemini CLI, and a bunch of others. Qwen Code is open source, has excellent subagent support, a rich slash command system, and Qwen Max is genuinely strong at reasoning through complex TypeScript and Electron internals.&lt;/p&gt;

&lt;p&gt;My go-to model is &lt;strong&gt;Qwen Max&lt;/strong&gt;. For lighter tasks like &lt;code&gt;/recap&lt;/code&gt; or prompt suggestions I set a fast model with &lt;code&gt;/model --fast qwen3-coder-flash&lt;/code&gt; to keep costs down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-7-1024x320.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-7-1024x320.png" alt="" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;





&lt;h2&gt;The /init and Project Context Setup&lt;/h2&gt;

&lt;p&gt;The very first thing I do when I start on a new project or return to Achu after a few days is run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/init
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This analyzes the current directory and generates an initial context file, essentially giving Qwen Code a map of the project. It picks up folder structure, key files, and creates a baseline understanding before I say a single word.&lt;/p&gt;

&lt;p&gt;After &lt;code&gt;/init&lt;/code&gt;, I manually add a few paragraphs about the project. I treat this like writing a team onboarding doc for a new developer. I tell Qwen what Achu is, what the current milestone is, what tech stack we're on, and what the known constraints are (like Electron IPC boundaries, the Upstash Redis integration, or the Gumroad-based monetization model).&lt;/p&gt;

&lt;p&gt;This upfront investment saves enormous amounts of back-and-forth later.&lt;/p&gt;





&lt;h2&gt;Spec-Driven Planning with /plan&lt;/h2&gt;

&lt;p&gt;When I want to build a new feature, I don't just dump a vague request and hope for the best. I use &lt;code&gt;/plan&lt;/code&gt; to switch Qwen Code into planning mode.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/plan Implement the Privacy Guard redaction pipeline
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In plan mode, Qwen analyzes and thinks, but does not touch any files. This is key. It's the agentic equivalent of "think before you act."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-8-1024x322.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-8-1024x322.png" alt="Plan mode" width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What I actually do is run multiple turns in plan mode to iterate the spec. I think of a "spec" as the formal artifact that describes what should be built: the interface contracts, the data flow, the error paths, the acceptance criteria. It's not a vague idea. It's something precise enough that a developer (or subagent) could implement it.&lt;/p&gt;

&lt;p&gt;The loop looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;/plan&lt;/code&gt;: enter planning mode&lt;/li&gt;



&lt;li&gt;Describe the feature in detail: what it does, what it doesn't do, edge cases&lt;/li&gt;



&lt;li&gt;Ask Qwen to propose an approach&lt;/li&gt;



&lt;li&gt;Push back on anything that doesn't fit the architecture&lt;/li&gt;



&lt;li&gt;Ask for the revised plan&lt;/li&gt;



&lt;li&gt;Repeat 2-3 times until the spec is solid&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The quality of implementation is almost entirely determined by the quality of the spec. This multi-turn refinement before a single line of code is written is the most valuable habit I've developed using any agentic coding tool.&lt;/p&gt;

&lt;p&gt;By default, Qwen will ask follow-up questions, but I still recommend explicitly telling the model to ask them.&lt;/p&gt;





&lt;h2&gt;Subagents for Async Work&lt;/h2&gt;

&lt;p&gt;Once the spec is locked in, I use subagents aggressively for any work that can happen independently.&lt;/p&gt;

&lt;p&gt;Qwen Code's subagent system lets you define specialized agents as Markdown files in &lt;code&gt;.qwen/agents/&lt;/code&gt;. Each agent has its own system prompt, tool allowlist, and model. You can call them explicitly or let Qwen delegate automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-9-1024x423.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-9-1024x423.png" alt="" width="799" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For Achu, I have a few custom subagents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;testing subagent&lt;/strong&gt; focused on Vitest and Electron testing patterns (more on this below)&lt;/li&gt;



&lt;li&gt;A &lt;strong&gt;code reviewer subagent&lt;/strong&gt; that runs in &lt;code&gt;plan&lt;/code&gt; mode and only reads files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key power here is &lt;strong&gt;Fork Subagents&lt;/strong&gt;: when Qwen needs to run multiple things in parallel, it can implicitly fork. Forks inherit the parent context, run in the background, and share the prompt cache prefix. This means if I ask Qwen to "investigate the IPC handler for Privacy Guard, the Ollama integration, and the Upstash Redis voting flow simultaneously," it can fork three parallel agents without tripling my token costs.&lt;/p&gt;

&lt;p&gt;I explicitly phrase tasks as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Run these three investigations in parallel using subagents and report back."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This keeps the main conversation focused and lets the grunt work happen concurrently.&lt;/p&gt;

&lt;p&gt;A project-level subagent config lives at &lt;code&gt;.qwen/agents/testing.md&lt;/code&gt; and looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: testing
description: "Writes Vitest unit tests and Electron integration tests for Achu. Use PROACTIVELY for any test-related tasks."
approvalMode: auto-edit
tools:
  - read_file
  - write_file
  - read_many_files
  - run_shell_command
---

You are a testing specialist for an Electron + React + TypeScript app.
Follow Vitest conventions. Mock Electron IPC using vitest-mock-extended.
Always write both positive and negative test cases.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The phrase "Use PROACTIVELY" in the description is important: it signals to the main model to delegate testing tasks here without being asked explicitly.&lt;/p&gt;





&lt;h2&gt;Context Hygiene: /summary, /compress, and /clear&lt;/h2&gt;

&lt;p&gt;This is where most people fail with long agentic sessions. They let the context grow unbounded until the model starts hallucinating, forgetting earlier instructions, or producing inconsistent output. I've learned to treat context like memory on a constrained machine.&lt;/p&gt;

&lt;p&gt;My hygiene rules:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After a major chunk of work is done:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/summary
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This generates a project summary from the conversation history. I save this externally and reference it when restarting sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the context window is getting full:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/compress
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This replaces chat history with a compressed summary, freeing up tokens while preserving the semantic essence of what was discussed. Think of it as a lossy but practical checkpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Qwen starts steering away:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the model starts going off-track (giving answers that don't match the project constraints, suggesting patterns we've already ruled out, or just losing the thread), I don't argue. If it happens twice in a row, I clear:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/clear
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then I reload context from scratch using &lt;code&gt;/init&lt;/code&gt; and a fresh description. Two drifts is my hard limit. The discipline here is resisting the urge to keep "fixing" a bad session. It's cheaper to restart clean.&lt;/p&gt;





&lt;h2&gt;Watching Context and Usage with /context and /stats&lt;/h2&gt;

&lt;p&gt;I watch these two commands constantly.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/context
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Shows a breakdown of what's consuming the context window right now: system prompt, conversation history, and tool results. If I see tool results bloating the context, I know a &lt;code&gt;/compress&lt;/code&gt; is coming.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/context detail
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Shows per-item breakdown. Useful when one massive file read is eating 40% of the window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-10.png" alt="" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/stats
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Gives detailed session statistics: tokens used, API calls, and cost estimates. I check this before and after big operations. It's how I keep tabs on spend, especially on Qwen Max, which isn't the cheapest model.&lt;/p&gt;

&lt;p&gt;Keeping an eye on these is the agentic equivalent of watching memory usage in a production system. Ignore it and you'll pay for it.&lt;/p&gt;





&lt;h2&gt;Pointing to Source Directories Instead of Docs&lt;/h2&gt;

&lt;p&gt;This one is a significant productivity trick that I don't see talked about enough.&lt;/p&gt;

&lt;p&gt;When Qwen needs to understand a third-party library, the default approach is to tell it to fetch the docs URL. The problem is that docs are often incomplete, outdated, or optimized for humans rather than LLMs.&lt;/p&gt;

&lt;p&gt;What I do instead: I download the library source and point the conversation directly at it using &lt;code&gt;@&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@./vendor/upstash-redis/src Tell me how the pipeline API works
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or with a deeper path:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@./node_modules/@electron/remote/src/main Explain the context bridge setup
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Qwen reads the actual implementation. No guessing from docs. No hallucination about API signatures that changed in the last major version.&lt;/p&gt;

&lt;p&gt;I keep a &lt;code&gt;vendor/&lt;/code&gt; folder in the project root where I clone or copy source for critical dependencies. This makes &lt;code&gt;@&lt;/code&gt; references stable and reproducible.&lt;/p&gt;

&lt;p&gt;For Achu specifically, I've pointed Qwen at the Ollama TypeScript client source, the llava-phi3 model integration code, and parts of the Electron forge config. The answers I get are ground-truth accurate instead of approximately correct.&lt;/p&gt;





&lt;h2&gt;Persistent Memory with /remember and /dream&lt;/h2&gt;

&lt;p&gt;Some things should survive session boundaries. My preferences, key architectural decisions, constraints Qwen needs to always respect. I use &lt;code&gt;/remember&lt;/code&gt; for these.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/remember Always use Electron's contextBridge for IPC. Never use remote module.
/remember Achu uses oklch color space. Do not suggest hex values without conversion.
/remember Free tier users get 3 exports per day. Pro users are unlimited.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These get persisted in Qwen's memory store and are injected into future sessions automatically.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/dream&lt;/code&gt; is the manual trigger for auto-memory consolidation. Qwen's auto-memory runs in the background, but if I want to force a consolidation pass after a long session, to make sure the important discoveries from that session get persisted, I run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/dream
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Think of it as flushing the cache to disk before shutting down.&lt;/p&gt;

&lt;p&gt;To review and manage what's been remembered, I use:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/memory
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This opens the Memory Manager dialog where I can edit or delete entries. I audit this occasionally. Stale memories can be just as harmful as no memories.&lt;/p&gt;





&lt;h2&gt;The /btw Trick for Side Questions&lt;/h2&gt;

&lt;p&gt;This is my favourite quality-of-life command.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/btw What's the difference between ipcRenderer.invoke and contextBridge.exposeInMainWorld?
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;/btw&lt;/code&gt; sends a parallel API call with recent conversation context (up to the last 20 messages) and shows the response above the composer without touching the main conversation at all. The main session continues uninterrupted.&lt;/p&gt;

&lt;p&gt;I use this constantly for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick clarifications while in the middle of implementation&lt;/li&gt;



&lt;li&gt;Checking a TypeScript type signature without derailing a planning session&lt;/li&gt;



&lt;li&gt;Double-checking a shell command before running it via &lt;code&gt;!&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The response doesn't become part of conversation history. It's a throwaway lookup. This is genuinely useful and I'm surprised more CLI tools don't have something like it.&lt;/p&gt;





&lt;h2&gt;Uncommon Commands Worth Knowing&lt;/h2&gt;

&lt;p&gt;Beyond the commands I use daily, here are a few from the docs that are genuinely underrated:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/restore&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Restores files to their state before a tool execution. If a Qwen action made a mess, you can list recent tool executions with &lt;code&gt;/restore&lt;/code&gt; and roll back a specific one with &lt;code&gt;/restore &amp;lt;ID&amp;gt;&lt;/code&gt;. Think of it as a targeted undo for AI changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/loop&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Runs a prompt on a recurring schedule:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/loop 5m check the build output and report any new warnings
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I use this occasionally when I'm doing a long build and want Qwen to monitor for me while I do something else. It's a lightweight cron for conversational tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/recap&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Generates a one-line summary of where the session left off. If I step away for more than five minutes, Qwen auto-triggers this when I return:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;? Implementing the Privacy Guard redaction pipeline. Next step: wire the OCR output into the bounding-box overlay renderer.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Incredibly useful for picking up after an interruption without scrolling through history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/approval-mode auto-edit&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once I trust the current task scope, I switch to auto-edit to let Qwen make file changes without prompting me every time. I reserve &lt;code&gt;yolo&lt;/code&gt; mode for throwaway branches only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/directory&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adds multiple directories to the workspace context:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/dir add ./src,./tests,./electron
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Useful when the feature spans multiple root-level directories that Qwen wouldn't automatically scope to.&lt;/p&gt;





&lt;h2&gt;My Qwen Code Workflow Summary&lt;/h2&gt;

&lt;p&gt;Here's the workflow I follow for every non-trivial feature in Achu:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start clean:&lt;/strong&gt; &lt;code&gt;/init&lt;/code&gt; + add project context manually&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Spec first:&lt;/strong&gt; use &lt;code&gt;/plan&lt;/code&gt; in multiple turns until the spec is solid&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Delegate async tasks:&lt;/strong&gt; use subagents for parallel investigations and implementation&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Monitor context:&lt;/strong&gt; &lt;code&gt;/context detail&lt;/code&gt; regularly, &lt;code&gt;/compress&lt;/code&gt; proactively&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Log source truth:&lt;/strong&gt; point &lt;code&gt;@&lt;/code&gt; at source directories, not docs&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Remember decisions:&lt;/strong&gt; &lt;code&gt;/remember&lt;/code&gt; for anything that should persist&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Quick questions:&lt;/strong&gt; &lt;code&gt;/btw&lt;/code&gt; without breaking flow&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Clear if it drifts:&lt;/strong&gt; two steering-away moments is my hard limit before &lt;code&gt;/clear&lt;/code&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;End of session:&lt;/strong&gt; &lt;code&gt;/dream&lt;/code&gt; to consolidate memory, &lt;code&gt;/summary&lt;/code&gt; to save the state&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This isn't magic. It's discipline. The LLM doesn't make you disciplined; you have to bring that yourself. But when you apply this workflow consistently, the output quality is noticeably better than freeform chatting with an AI.&lt;/p&gt;

&lt;p&gt;If you're building something with Qwen Code, try the spec-first approach. The twenty minutes you spend in &lt;code&gt;/plan&lt;/code&gt; mode iterating the spec will save you three hours of correcting implementation drift.&lt;/p&gt;





&lt;p&gt;Happy Agentic Coding, Testing, Shipping, Learning, whatever :) !&lt;/p&gt;

&lt;p&gt;What's your biggest challenge with agentic coding workflows: staying in context, or getting the model to follow architectural constraints? Let me know in the comments.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>productivity</category>
      <category>development</category>
      <category>resources</category>
    </item>
    <item>
      <title>Supervised Vibe Coding: A Manifesto</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Thu, 11 Jun 2026 03:01:10 +0000</pubDate>
      <link>https://dev.to/qainsights/supervised-vibe-coding-a-manifesto-50d4</link>
      <guid>https://dev.to/qainsights/supervised-vibe-coding-a-manifesto-50d4</guid>
      <description>&lt;p&gt;We are not anti-AI. We are pro-discipline.&lt;/p&gt;

&lt;p&gt;Vibe coding unlocks speed. Supervised vibe coding unlocks speed you can trust. The difference is a developer who remains the final decision-maker at every step, not a passive reviewer of whatever the model felt good about.&lt;/p&gt;

&lt;p&gt;Supervised vibe coding is a development approach that combines AI-generated speed with deliberate human oversight, positioning the developer as the final decision-maker rather than a passive reviewer. It builds on Andrej Karpathy's 2025 concept of "vibe coding," which described fully delegating code generation to AI tools without reviewing the output.&lt;/p&gt;

&lt;p&gt;The manifesto outlines ten guiding principles covering incremental delivery, test coverage, code review, prompt discipline, security, documentation, configuration management, CI/CD enforcement, ownership, and dependency auditing. A recurring theme is that AI accelerates execution but cannot replace developer judgment, accountability, or the ability to code independently.&lt;/p&gt;

&lt;h2&gt;The origin of vibe coding&lt;/h2&gt;

&lt;p&gt;On February 2, 2025, Andrej Karpathy, co-founder of OpenAI and former director of AI at Tesla, posted a short thought on X that would change how the software world talked about AI-assisted development.&lt;/p&gt;

&lt;p&gt;"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." Andrej Karpathy, X (formerly Twitter), February 2, 2025.&lt;/p&gt;

&lt;p&gt;The post went viral, clocking over 4.5 million views. Karpathy described using tools like Cursor Composer paired with Anthropic's Claude models, sometimes via voice through SuperWhisper, barely touching his keyboard. He accepted all AI-generated changes without reviewing diffs, pasted error messages straight back to the model, and let the codebase grow organically, even beyond his own full comprehension.&lt;/p&gt;

&lt;p&gt;The phrase struck a cultural nerve because it named something developers were already doing, just without a word for it. By end of 2025, Collins Dictionary named "&lt;strong&gt;vibe coding&lt;/strong&gt;" its &lt;strong&gt;Word of the Year&lt;/strong&gt;, with nearly half of all developers reporting daily use of AI coding tools.&lt;/p&gt;

&lt;p&gt;Karpathy himself acknowledged the limits early. He noted that AI occasionally could not fix certain bugs, forcing him to work around them or prompt blindly until something stuck. He called it "quite amusing" and best suited for non-critical projects. That caveat got lost in the hype.&lt;/p&gt;

&lt;p&gt;Now &lt;strong&gt;Supervised Vibe Coding&lt;/strong&gt;&amp;nbsp;formalizes what disciplined engineers were already practicing. Speed from AI. Judgment from humans.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-6-1024x384.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F06%2Fimage-6-1024x384.png" alt="Supervised Vibe Coding: A Manifesto" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The 10 laws&lt;/h2&gt;

&lt;p&gt;01&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ship in slices, not in floods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build incrementally. Each iteration must be reviewable, testable, and deployable on its own. Human review is not optional; AI is a contributor, not a reviewer. If you cannot review it in one sitting, it is too large.&lt;/p&gt;

&lt;p&gt;02&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tests are not a phase, they are a practice&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unit, integration, and edge case tests accompany every feature. AI may scaffold the test file. You verify every assertion. Nulls, empty inputs, boundary values, concurrency, and failure paths are caught at design time, not at incident time.&lt;/p&gt;

&lt;p&gt;03&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read before you run&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Understand every snippet before accepting it. Verify APIs exist, functions are not deprecated, and packages are not hallucinated. If you cannot explain what the code does and what it depends on, it is not ready to merge.&lt;/p&gt;

&lt;p&gt;04&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt with intent, pin your model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bad prompts produce bad code. Be explicit about language version, constraints, patterns, and security requirements in every prompt. Share prompt conventions with your team so AI behaviour is consistent across the codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model drift:&lt;/strong&gt;&amp;nbsp;Pin your model version in CI/CD the same way you pin a package version. An unversioned AI dependency is a silent breaking change waiting to happen. The same prompt can produce different outputs across model versions, so treat model upgrades like dependency upgrades: deliberate, tested, and reviewed.&lt;/p&gt;

&lt;p&gt;05&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security, performance, and UX share equal priority&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No feature is done if it leaks data, crawls under load, or confuses users. These are first-class requirements on every ticket. Never paste customer data, PII, credentials, or secrets into an AI tool. The prompt is not a sandbox; it is a transmission.&lt;/p&gt;

&lt;p&gt;06&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document as you go, not as you leave&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI accelerates writing code but accumulates invisible technical debt. Document decisions, assumptions, and AI-generated sections as part of the same commit. Future maintainers deserve to know what the code does and why it was written this way.&lt;/p&gt;

&lt;p&gt;07&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration is code, treat it accordingly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Secrets, environment variables, timeouts, and feature flags are versioned, validated, and never hardcoded. The config is part of the contract. A misconfigured deploy is still a broken deploy, regardless of how clean the code looks.&lt;/p&gt;
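
&lt;p&gt;As a minimal sketch of what "validated, never hardcoded" can look like, the snippet below reads settings from the environment once at startup and fails fast when something is missing or out of range. The setting names (&lt;code&gt;API_BASE_URL&lt;/code&gt;, &lt;code&gt;REQUEST_TIMEOUT_S&lt;/code&gt;), the 1 to 300 second range, and the &lt;code&gt;load_config&lt;/code&gt; helper are hypothetical and only there for illustration.&lt;/p&gt;

```python
import os


def load_config(env=os.environ):
    """Read and validate settings once, at startup (all names are hypothetical)."""
    required = ("API_BASE_URL", "REQUEST_TIMEOUT_S")
    missing = [key for key in required if key not in env]
    if missing:
        # Fail before the app starts, not on the first request in production.
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")

    timeout = int(env["REQUEST_TIMEOUT_S"])
    if timeout not in range(1, 301):
        raise ValueError("REQUEST_TIMEOUT_S must be between 1 and 300 seconds")

    return {"api_base_url": env["API_BASE_URL"], "timeout_s": timeout}


# Example call with an explicit mapping instead of the real environment.
config = load_config({"API_BASE_URL": "https://example.invalid", "REQUEST_TIMEOUT_S": "30"})
print(config)
```

&lt;p&gt;Passing the mapping in as an argument keeps the validation testable in CI without touching real secrets.&lt;/p&gt;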

&lt;p&gt;08&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pipeline is the gatekeeper&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lint, test, security scan, and build gates must all pass before code reaches the next environment. Observability and logging ship with the feature, not after. If you cannot see what your code is doing in production, you do not own it yet.&lt;/p&gt;

&lt;p&gt;09&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You are the supervisor, not the spectator&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Feature flags, rollback plans, canary deployments, and health checks turn every release into a controlled, reversible act. Decide upfront who owns the code when something breaks. AI does not get paged at 2 AM. Ownership must be explicit before the deploy, not after the incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deskilling is a silent risk:&lt;/strong&gt;&amp;nbsp;Deliberately solve problems without AI on a regular basis. Write a function from scratch. Debug without asking the model. The judgment this manifesto depends on atrophies if you never exercise it. Supervised vibe coding requires a supervisor who can actually code.&lt;/p&gt;

&lt;p&gt;10&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Own the dependency list&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every package AI pulls in is your responsibility to audit, pin, and maintain. AI will confidently suggest outdated, vulnerable, or nonexistent packages. Review licenses for IP compliance. Disclose AI involvement to your team, your clients, and where required, your employer. The code carries your name, not the model's.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>software</category>
    </item>
    <item>
      <title>Weekend Supervised Vibe Coding</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Mon, 08 Jun 2026 22:49:14 +0000</pubDate>
      <link>https://dev.to/qainsights/weekend-supervised-vibe-coding-3chi</link>
      <guid>https://dev.to/qainsights/weekend-supervised-vibe-coding-3chi</guid>
      <description>&lt;p&gt;Weekend Supervised Vibe Coding &lt;/p&gt;

&lt;p&gt;Achu - means &lt;code&gt;print&lt;/code&gt; in Tamil&lt;/p&gt;

&lt;p&gt;Built using &lt;a class="mentioned-user" href="https://dev.to/antigravityteam"&gt;@antigravityteam&lt;/a&gt; Google Flash 3.5 by burning my 1000 credits - then I pivoted to @CommandCodeAI DeepSeek Pro, after burning that, switched to raw &lt;a class="mentioned-user" href="https://dev.to/deepseekaifree"&gt;@deepseekaifree&lt;/a&gt; pro in the terminal.&lt;/p&gt;

&lt;p&gt;Please take a look at &lt;a href="https://achu.app" rel="noopener noreferrer"&gt;https://achu.app&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehho97ijlg9au3mwus5f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehho97ijlg9au3mwus5f.png" alt="Achu" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>developers</category>
    </item>
    <item>
      <title>Weekend Supervised Vibe Coding</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Sun, 31 May 2026 00:04:04 +0000</pubDate>
      <link>https://dev.to/qainsights/weekend-supervised-vibe-coding-26ip</link>
      <guid>https://dev.to/qainsights/weekend-supervised-vibe-coding-26ip</guid>
      <description>&lt;p&gt;Weekend Supervised Vibe Coding &lt;/p&gt;

&lt;p&gt;Achu - means &lt;code&gt;print&lt;/code&gt; in Tamil&lt;/p&gt;

&lt;p&gt;Built using &lt;a class="mentioned-user" href="https://dev.to/antigravityteam"&gt;@antigravityteam&lt;/a&gt; Google Flash 3.5 by burning my 1000 credits - then I pivoted to @CommandCodeAI DeepSeek Pro, after burning that, switched to raw &lt;a class="mentioned-user" href="https://dev.to/deepseekaifree"&gt;@deepseekaifree&lt;/a&gt; pro in the terminal.&lt;/p&gt;

&lt;p&gt;I am still testing :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7hitw5tvawcvqgz57j7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7hitw5tvawcvqgz57j7.png" alt=" " width="800" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>design</category>
    </item>
    <item>
      <title>99% of Requests Failed and My Dashboard Showed Green</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Wed, 13 May 2026 15:41:30 +0000</pubDate>
      <link>https://dev.to/qainsights/99-of-requests-failed-and-my-dashboard-showed-green-4lpc</link>
      <guid>https://dev.to/qainsights/99-of-requests-failed-and-my-dashboard-showed-green-4lpc</guid>
      <description>&lt;p&gt;In this blog post, we will see how to use &lt;strong&gt;NVIDIA AIPerf&lt;/strong&gt; to expose a hidden performance problem that most LLM deployments never catch until real users start complaining.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/5t7Gz_F5pMA"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I ran three simple tests against a local model. The results tell a story that every performance engineer should see.&lt;/p&gt;








&lt;h2&gt;The Setup&lt;/h2&gt;





&lt;p&gt;For this experiment, I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt;: &lt;code&gt;granite4:350m&lt;/code&gt; running locally via Ollama&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Endpoint&lt;/strong&gt;: &lt;code&gt;http://localhost:11434&lt;/code&gt;
&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Tool&lt;/strong&gt;: NVIDIA AIPerf (the official successor to GenAI-Perf)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Head to &lt;a href="https://github.com/ai-dynamo/aiperf" rel="noopener noreferrer"&gt;https://github.com/ai-dynamo/aiperf&lt;/a&gt; to install AIPerf. It is a single pip install:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install aiperf&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Granite 4 350M is a small, fast model perfect for local testing on a MacBook or a dev machine without a beefy GPU. The principles you will see here apply equally to larger models in cloud deployments.&lt;/p&gt;








&lt;h2&gt;Run 1: The Baseline That Lies&lt;/h2&gt;





&lt;p&gt;I started with the most common mistake in LLM performance testing: a single-user baseline.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;aiperf profile \
  --model "granite4:350m" \
  --streaming \
  --endpoint-type chat \
  --url http://localhost:11434 \
  --tokenizer builtin \
  --request-count 50 \
  --concurrency 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The results looked great, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F05%2FAiPerf-Run1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F05%2FAiPerf-Run1.png" alt="" title="AiPerf-Run1" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key numbers from this run:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;avg&lt;/th&gt;
&lt;th&gt;p50&lt;/th&gt;
&lt;th&gt;p99&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TTFT (ms)&lt;/td&gt;
&lt;td&gt;223.11&lt;/td&gt;
&lt;td&gt;217.60&lt;/td&gt;
&lt;td&gt;317.61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTST (ms)&lt;/td&gt;
&lt;td&gt;10.94&lt;/td&gt;
&lt;td&gt;9.99&lt;/td&gt;
&lt;td&gt;18.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ITL (ms)&lt;/td&gt;
&lt;td&gt;10.67&lt;/td&gt;
&lt;td&gt;10.51&lt;/td&gt;
&lt;td&gt;12.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request Latency (ms)&lt;/td&gt;
&lt;td&gt;1,309.30&lt;/td&gt;
&lt;td&gt;1,043.95&lt;/td&gt;
&lt;td&gt;3,251.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request Throughput (req/sec)&lt;/td&gt;
&lt;td&gt;0.76&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;223ms average TTFT. Smooth inter-token latency at 10.67ms. If you stopped here, you would call this production-ready.&lt;/p&gt;

&lt;p&gt;Most people stop here. That is the problem.&lt;/p&gt;








&lt;h2&gt;Run 2: The Wake-Up Call&lt;/h2&gt;





&lt;p&gt;Next, I pushed concurrency to 50, a more realistic number for a shared endpoint. I also added a warmup of 10 requests to eliminate cold-start noise, and ran for 60 seconds.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;aiperf profile \
  --model "granite4:350m" \
  --url http://localhost:11434 \
  --endpoint-type chat \
  --concurrency 50 \
  --tokenizer builtin \
  --warmup-request-count 10 \
  --benchmark-duration 60 \
  --streaming
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The results were a shock, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://qainsights.com/wp-content/uploads/2026/05/AiPerf-Run2.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F05%2FAiPerf-Run2-1024x463.png" alt="" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;avg&lt;/th&gt;
&lt;th&gt;p50&lt;/th&gt;
&lt;th&gt;p99&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TTFT (ms)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41,660.92&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50,870.37&lt;/td&gt;
&lt;td&gt;64,201.68&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTST (ms)&lt;/td&gt;
&lt;td&gt;10.21&lt;/td&gt;
&lt;td&gt;10.11&lt;/td&gt;
&lt;td&gt;13.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ITL (ms)&lt;/td&gt;
&lt;td&gt;10.38&lt;/td&gt;
&lt;td&gt;10.18&lt;/td&gt;
&lt;td&gt;13.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E2E Output Token Throughput (tokens/sec/user)&lt;/td&gt;
&lt;td&gt;4.86&lt;/td&gt;
&lt;td&gt;1.85&lt;/td&gt;
&lt;td&gt;60.87&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request Throughput (req/sec)&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;TTFT went from 223ms to &lt;strong&gt;41,660ms&lt;/strong&gt;. That is roughly a 187x increase.&lt;/p&gt;

&lt;p&gt;At p99, users were waiting over &lt;strong&gt;64 seconds&lt;/strong&gt; just to see the first token.&lt;/p&gt;

&lt;p&gt;Your monitoring dashboard probably still shows green. Your users are staring at a blank screen.&lt;/p&gt;
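
&lt;p&gt;The slowdown can be checked directly from the two tables. The short sketch below only does that arithmetic on the averages reported for Run 1 and Run 2; the dictionary keys are my own labels, not AIPerf field names.&lt;/p&gt;

```python
# Averages copied from the Run 1 and Run 2 tables above (milliseconds).
run1 = {"ttft_avg": 223.11, "itl_avg": 10.67}
run2 = {"ttft_avg": 41660.92, "itl_avg": 10.38}

# How many times slower the first token arrives at concurrency 50.
ttft_slowdown = run2["ttft_avg"] / run1["ttft_avg"]
# Same ratio for inter-token latency, for comparison.
itl_ratio = run2["itl_avg"] / run1["itl_avg"]

print(f"TTFT slowdown: {ttft_slowdown:.1f}x")  # 186.7x
print(f"ITL ratio:     {itl_ratio:.2f}x")      # 0.97x
```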








&lt;h2&gt;Run 3: Goodput Exposes the Real Truth&lt;/h2&gt;





&lt;p&gt;This is where AIPerf separates itself from basic benchmarking tools. I added a &lt;code&gt;--goodput&lt;/code&gt; flag with a TTFT SLO of 500ms. Goodput measures the throughput of requests that actually &lt;em&gt;met&lt;/em&gt; the SLO, not just all requests indiscriminately.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;aiperf profile \
  --model "granite4:350m" \
  --url http://localhost:11434 \
  --endpoint-type chat \
  --concurrency 50 \
  --tokenizer builtin \
  --benchmark-duration 60 \
  --goodput 'time_to_first_token:500' \
  --streaming
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As shown below, the result is the most important number in this entire experiment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://qainsights.com/wp-content/uploads/2026/05/AiPerf-Run3.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fqainsights.com%2Fwp-content%2Fuploads%2F2026%2F05%2FAiPerf-Run3-1024x473.png" alt="" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Request Throughput (req/sec)&lt;/td&gt;
&lt;td&gt;0.91&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Goodput (req/sec)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.01&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTFT avg (ms)&lt;/td&gt;
&lt;td&gt;37,380.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTFT p99 (ms)&lt;/td&gt;
&lt;td&gt;55,777.69&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Request throughput says 0.91 req/sec. Looks reasonable.&lt;/p&gt;

&lt;p&gt;Goodput says 0.01 req/sec.&lt;/p&gt;

&lt;p&gt;That means &lt;strong&gt;roughly 99% of requests failed the 500ms TTFT SLO&lt;/strong&gt;. Your system is processing requests. It is not serving users.&lt;/p&gt;
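
&lt;p&gt;To make the goodput idea concrete, here is a minimal sketch of the calculation, assuming goodput simply counts requests whose TTFT is at or under the SLO and divides by the test duration. The &lt;code&gt;goodput&lt;/code&gt; helper and the sample latencies are invented for illustration and are not AIPerf's implementation; only the 0.91 and 0.01 figures come from the Run 3 table.&lt;/p&gt;

```python
from bisect import bisect_right


def goodput(ttfts_ms, slo_ms, duration_s):
    """Requests per second whose TTFT met the SLO (hypothetical helper)."""
    # bisect_right on the sorted list counts values at or under the SLO.
    met = bisect_right(sorted(ttfts_ms), slo_ms)
    return met / duration_s


# Share of throughput that met the SLO, using the Run 3 table values.
request_throughput = 0.91
goodput_reported = 0.01
met_share = goodput_reported / request_throughput
print(f"Met SLO: {met_share:.1%}, missed: {1 - met_share:.1%}")

# Toy data (invented): only 1 of 5 requests beats a 500 ms TTFT SLO.
sample = [420.0, 9800.0, 21500.0, 37000.0, 55000.0]
print(goodput(sample, slo_ms=500, duration_s=5))  # 0.2
```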








&lt;h2&gt;The Hidden Insight: ITL Stays Rock Solid&lt;/h2&gt;





&lt;p&gt;Here is what most people miss when they first see these numbers. Look at ITL across all three runs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;TTFT avg (ms)&lt;/th&gt;
&lt;th&gt;ITL avg (ms)&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency 1&lt;/td&gt;
&lt;td&gt;223.11&lt;/td&gt;
&lt;td&gt;10.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency 50&lt;/td&gt;
&lt;td&gt;41,660.92&lt;/td&gt;
&lt;td&gt;10.38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency 50 + Goodput&lt;/td&gt;
&lt;td&gt;37,380.20&lt;/td&gt;
&lt;td&gt;9.71&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ITL barely moves. TTST (Time to Second Token) also stayed consistent around 10ms across all runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model is not the problem. The queue is.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the model starts generating for a request, it flies. Tokens come out at a consistent 10ms pace regardless of how many other requests are in flight. The bottleneck is entirely in the &lt;strong&gt;prefill phase&lt;/strong&gt;: requests pile up waiting for the model to even begin processing them.&lt;/p&gt;

&lt;p&gt;This is a critical distinction for capacity planning. If ITL were also degrading, you would need a faster model or better hardware. Since only TTFT is exploding, the fix is architectural: better queue management, request routing, or horizontal scaling of the inference server.&lt;/p&gt;
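
&lt;p&gt;A back-of-envelope estimate shows why a queue alone can explain these numbers. The sketch below assumes the server handles roughly one request at a time, which is my assumption and not something measured here, and uses the Run 1 average request latency as the per-request service time.&lt;/p&gt;

```python
# Assumption: requests are served roughly one at a time, so a new request
# waits behind every request that is already in flight.
service_time_s = 1.309  # Run 1 average request latency (1,309.30 ms)
concurrency = 50        # Run 2 and Run 3 setting

# Worst case: 49 other requests are ahead in the queue.
worst_wait_s = (concurrency - 1) * service_time_s
# Throughput ceiling implied by the single-user service time.
implied_capacity = 1 / service_time_s

print(f"Estimated worst-case queue wait: {worst_wait_s:.1f} s")      # 64.1 s
print(f"Implied capacity:                {implied_capacity:.2f} req/sec")  # 0.76
```

&lt;p&gt;Under that assumption, the estimated wait of about 64 seconds lands close to the 64,201ms p99 TTFT from Run 2, and the implied capacity matches the 0.76 req/sec from Run 1. It is a rough model rather than a measurement, but it is consistent with the queue being the bottleneck.&lt;/p&gt;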

&lt;p&gt;You cannot arrive at this insight without separating TTFT from ITL. A single "response time" metric would have buried it entirely.&lt;/p&gt;








&lt;h2&gt;The Lesson&lt;/h2&gt;





&lt;p&gt;Three commands. Three minutes. A completely different picture of your system.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;What you measured&lt;/th&gt;
&lt;th&gt;What you learned&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-user baseline&lt;/td&gt;
&lt;td&gt;False confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency 50&lt;/td&gt;
&lt;td&gt;The real TTFT behavior under load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goodput with SLO&lt;/td&gt;
&lt;td&gt;How many users are actually being served&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The takeaway is simple: always test with realistic concurrency. Always set an SLO and measure &lt;strong&gt;goodput&lt;/strong&gt; against it. And always look at TTFT and ITL separately, because they tell completely different stories.&lt;/p&gt;

&lt;p&gt;A system with great ITL and terrible TTFT under load has a queue problem, not a model problem. Knowing that changes everything about how you fix it.&lt;/p&gt;
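&lt;p&gt;As a rough illustration of the goodput idea, the sketch below counts only the requests whose TTFT met the SLO. The function, the sample values, and the 10-second window are assumptions made for this example, not output from the runs above.&lt;/p&gt;

```python
def goodput(ttft_ms_samples, slo_ms, duration_s):
    """Return (share of requests meeting the SLO, good requests per second)."""
    if not ttft_ms_samples or not duration_s > 0:
        raise ValueError("need at least one sample and a positive duration")
    # A request counts as "good" only when its TTFT is within the SLO.
    good = sum(1 for t in ttft_ms_samples if slo_ms >= t)
    return good / len(ttft_ms_samples), good / duration_s


# Hypothetical TTFT samples in milliseconds from a 10-second window.
samples = [220.0, 480.0, 900.0, 37000.0, 41000.0]
share, per_second = goodput(samples, slo_ms=500.0, duration_s=10.0)
print(f"{share:.0%} of requests met the SLO; goodput = {per_second:.2f} req/s")
```

&lt;p&gt;Raw throughput would count all five requests. Goodput counts only the ones a user would actually have waited for.&lt;/p&gt;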








&lt;p&gt;Happy Testing!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over to you:&lt;/strong&gt; Have you ever shipped an LLM feature that looked great in testing but struggled under real user load? What metric finally exposed it? Drop a comment below. I would love to hear your story.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>performance</category>
      <category>llm</category>
      <category>nvidia</category>
    </item>
    <item>
      <title>Beyond the Hype: A Comprehensive Guide to Benchmarking LLMs with AWS Labs’ LLMeter</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Thu, 07 May 2026 16:30:44 +0000</pubDate>
      <link>https://dev.to/qainsights/beyond-the-hype-a-comprehensive-guide-to-benchmarking-llms-with-aws-labs-llmeter-1504</link>
      <guid>https://dev.to/qainsights/beyond-the-hype-a-comprehensive-guide-to-benchmarking-llms-with-aws-labs-llmeter-1504</guid>
      <description>





&lt;p id="p-rc_9231198f56807c04-27"&gt;In the current AI gold rush, the conversation has shifted from "Can it do the task?" to "How efficiently can it do the task?" For engineers moving Large Language Models (LLMs) into production, the "vibe check" is no longer sufficient. You need hard data on latency, throughput, and cost-efficiency.&lt;/p&gt;





&lt;p id="p-rc_9231198f56807c04-28"&gt;AWS Labs recently released &lt;strong&gt;&lt;a href="https://github.com/awslabs/llmeter" rel="noopener noreferrer"&gt;LLMeter&lt;/a&gt;&lt;/strong&gt;, a Python-based benchmarking library that is quickly becoming the gold standard for performance engineers. In this guide, we’ll break down why this tool matters, how to use it, and how to visualize your data for executive-level insights.&lt;/p&gt;



&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/hBM44GNjURA"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;








&lt;h2&gt;The Metrics That Actually Matter&lt;/h2&gt;





&lt;p&gt;Before diving into the code, we must define the "North Star" metrics of LLM performance. LLMeter is specifically designed to capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time to First Token (TTFT):&lt;/strong&gt; The duration between sending a request and receiving the first token of the response. This is the most critical metric for perceived user latency.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Tokens Per Second (TPS):&lt;/strong&gt; The speed at which the model generates text. A high TPS ensures a smooth reading experience.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Time to Last Token (TTL):&lt;/strong&gt; The total duration for the entire response.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Cost Per Request:&lt;/strong&gt; Calculated based on input/output token counts and specific model pricing.&lt;/li&gt;
&lt;/ul&gt;
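&lt;p&gt;The timing metrics above can be expressed in a few lines of Python. This is a sketch under stated assumptions: the &lt;code&gt;ResponseTiming&lt;/code&gt; class is hypothetical, it is not part of LLMeter, and TPS is computed here over the streaming window after the first token, which is one common convention among several.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ResponseTiming:
    """Timestamps (in seconds) for one streamed response. Hypothetical helper."""
    sent_s: float
    first_token_s: float
    last_token_s: float
    output_tokens: int

    @property
    def ttft_s(self):
        # Time to First Token: request sent until the first token arrives.
        return self.first_token_s - self.sent_s

    @property
    def time_to_last_token_s(self):
        # Time to Last Token: request sent until the response is complete.
        return self.last_token_s - self.sent_s

    @property
    def tokens_per_second(self):
        # Decode speed: tokens after the first, divided by the streaming window.
        window = self.last_token_s - self.first_token_s
        if self.output_tokens >= 2 and window > 0:
            return (self.output_tokens - 1) / window
        return 0.0


timing = ResponseTiming(sent_s=0.0, first_token_s=0.5, last_token_s=2.5, output_tokens=101)
print(timing.ttft_s, timing.time_to_last_token_s, timing.tokens_per_second)
```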








&lt;h2&gt;1. Setting Up Your Benchmarking Environment&lt;/h2&gt;





&lt;p id="p-rc_9231198f56807c04-32"&gt;LLMeter is built for modern Python environments (3.10+). For the fastest setup, we recommend using &lt;strong&gt;UV&lt;/strong&gt;, the high-performance Python package installer.&lt;/p&gt;

&lt;h3&gt;Installation&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Using UV for lightning-fast dependency management
uv pip install llmeter load_env plotly
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Environment Configuration&lt;/h3&gt;

&lt;p&gt;You don’t want to hardcode your API keys. LLMeter works seamlessly with &lt;code&gt;.env&lt;/code&gt; files. Ensure your environment is prepared for the providers you intend to test (OpenAI, Anthropic, Bedrock, or DeepSeek).&lt;/p&gt;
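&lt;p&gt;For readers who want to see what loading a &lt;code&gt;.env&lt;/code&gt; file involves, here is a minimal standard-library sketch. The &lt;code&gt;load_env_text&lt;/code&gt; function and the key names are hypothetical; in practice a dedicated library such as python-dotenv handles quoting and edge cases more thoroughly.&lt;/p&gt;

```python
import os


def load_env_text(text, environ=None):
    """Parse KEY=VALUE lines and set them on `environ` without overwriting existing keys."""
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key:
            loaded[key] = value
            # Values already present in the environment take precedence.
            environ.setdefault(key, value)
    return loaded


# Hypothetical file contents; real keys depend on the providers you test.
example = '# provider credentials\nOPENAI_API_KEY="sk-placeholder"\nAWS_REGION=us-east-1\n'
target = {}
print(sorted(load_env_text(example, target)))
```

&lt;p&gt;To load a real file, read it with &lt;code&gt;open(".env").read()&lt;/code&gt; and pass the text in without the &lt;code&gt;environ&lt;/code&gt; argument.&lt;/p&gt;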








&lt;h2&gt;2. Architecting Your Experiment&lt;/h2&gt;





&lt;p&gt;The beauty of LLMeter lies in its structured approach to testing. An "Experiment" in LLMeter consists of three main components:&lt;/p&gt;

&lt;h3&gt;The Endpoint &amp;amp; Payload&lt;/h3&gt;

&lt;p&gt;You define where the request is going and what it contains. For accurate TTFT measurements, always use &lt;strong&gt;streaming endpoints&lt;/strong&gt;.&lt;/p&gt;

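&lt;pre&gt;&lt;code&gt;# Example: Setting up a GPT-4o-mini endpoint
import os

# Import path assumed from the LLMeter package layout; verify against your installed version.
from llmeter.endpoints import OpenAIEndpoint

endpoint = OpenAIEndpoint(
    model="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY"),  # read from the environment, never hardcoded
    streaming=True  # streaming is needed for a meaningful TTFT
)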
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;The Cost Model&lt;/h3&gt;

&lt;p&gt;Unlike generic load testers, LLMeter allows you to define a &lt;code&gt;CostModel&lt;/code&gt;. By providing the price per million tokens, the library does the math for you, so you can see the financial impact of your scaling decisions next to the latency numbers.&lt;/p&gt;
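&lt;p&gt;The arithmetic behind a per-million-token price is simple enough to sketch by hand. The function below is a hypothetical illustration and not LLMeter's &lt;code&gt;CostModel&lt;/code&gt; API; the prices are placeholders, so substitute your provider's current rates.&lt;/p&gt;

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million):
    """Cost of one request, given token counts and prices in USD per million tokens."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Placeholder prices (USD per million tokens), not any provider's real rates.
cost = request_cost_usd(1200, 300, input_price_per_million=2.0, output_price_per_million=8.0)
print(f"cost per request: ${cost:.6f}")
# Projected spend for a hypothetical 50,000 requests per day.
print(f"daily cost: ${cost * 50_000:.2f}")
```

&lt;p&gt;Output tokens are usually priced higher than input tokens, which is why the two counts are kept separate.&lt;/p&gt;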








&lt;h2&gt;3. Running Multi-Client Load Tests&lt;/h2&gt;





&lt;p id="p-rc_9231198f56807c04-33"&gt;In a production environment, your LLM won't be handling one request at a time. LLMeter allows you to simulate &lt;strong&gt;concurrent clients&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In our testing, we found that running a sequential step test provides the most insight:&lt;/p&gt;

&lt;ol start="1"&gt;
&lt;li&gt;
&lt;strong&gt;Baseline:&lt;/strong&gt; 1 client for 10 seconds.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Ramp-up:&lt;/strong&gt; 3 clients for 10 seconds.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Stress:&lt;/strong&gt; 10+ clients to find the "breaking point" where the provider begins rate-limiting or latency spikes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because LLMeter is built on Python’s &lt;code&gt;asyncio&lt;/code&gt;, it can handle a massive number of concurrent requests from a standard laptop without the hardware becoming the bottleneck.&lt;/p&gt;
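&lt;p&gt;The step pattern above can be sketched with plain &lt;code&gt;asyncio&lt;/code&gt;. This is not LLMeter's runner: &lt;code&gt;fake_request&lt;/code&gt; is a stand-in that only sleeps, and every name here is hypothetical. It shows the shape of a step test, with several client loops sharing one deadline per step.&lt;/p&gt;

```python
import asyncio
import time


async def fake_request(latency_s):
    """Stand-in for one streaming LLM call; a real test would call the endpoint."""
    await asyncio.sleep(latency_s)
    return latency_s


async def run_step(clients, duration_s, latency_s=0.01):
    """Run `clients` concurrent loops for `duration_s` seconds; return completed count."""
    deadline = time.monotonic() + duration_s

    async def client_loop():
        done = 0
        # Each simulated client sends requests back to back until the step ends.
        while deadline > time.monotonic():
            await fake_request(latency_s)
            done += 1
        return done

    counts = await asyncio.gather(*(client_loop() for _ in range(clients)))
    return sum(counts)


async def step_test(steps):
    """steps: list of (clients, duration_s). Returns {clients: completed requests}."""
    results = {}
    for clients, duration_s in steps:
        results[clients] = await run_step(clients, duration_s)
    return results


if __name__ == "__main__":
    # Short durations for illustration; the steps described above run 10 seconds each.
    print(asyncio.run(step_test([(1, 0.2), (3, 0.2), (10, 0.2)])))
```

&lt;p&gt;Against a real provider, the completed count stops scaling with the client count once rate limits or queueing kick in. That flattening marks the breaking point.&lt;/p&gt;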








&lt;h2&gt;4. Visualizing Performance with Plotly&lt;/h2&gt;





&lt;p id="p-rc_9231198f56807c04-34"&gt;Data in a terminal is hard to digest. LLMeter’s integration with &lt;strong&gt;Plotly&lt;/strong&gt; transforms raw logs into interactive HTML reports.&lt;/p&gt;

&lt;p&gt;Key visualizations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TTFT vs. Number of Clients:&lt;/strong&gt; Watch how the "wait time" increases as your application scales.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;TPS Histograms:&lt;/strong&gt; Identify if your model provides consistent speed or if there are frequent "stalls."&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Error Rate Charts:&lt;/strong&gt; Track 429 (Rate Limit) errors to determine if you need to request a quota increase from your provider.&lt;/li&gt;
&lt;/ul&gt;
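&lt;p&gt;Before any chart can be drawn, the raw per-request results need to be grouped by client count. The sketch below shows one way to do that with the standard library. The record layout (&lt;code&gt;clients&lt;/code&gt;, &lt;code&gt;ttft_s&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;) is an assumption for this example and is not LLMeter's result schema.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import median


def summarize_by_clients(records):
    """Group records by client count; return median TTFT and share of 429 responses."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["clients"]].append(rec)
    summary = {}
    for clients, recs in sorted(grouped.items()):
        # Failed requests carry no TTFT, so they are excluded from the median.
        ttfts = [r["ttft_s"] for r in recs if r["ttft_s"] is not None]
        rate_limited = sum(1 for r in recs if r["status"] == 429)
        summary[clients] = {
            "median_ttft_s": median(ttfts) if ttfts else None,
            "rate_limit_share": rate_limited / len(recs),
        }
    return summary


# Hypothetical results from a two-step run.
records = [
    {"clients": 1, "ttft_s": 0.25, "status": 200},
    {"clients": 1, "ttft_s": 0.75, "status": 200},
    {"clients": 10, "ttft_s": 1.0, "status": 200},
    {"clients": 10, "ttft_s": None, "status": 429},
]
summary = summarize_by_clients(records)
print(summary)
```

&lt;p&gt;The keys of the returned dictionary become the x-axis of a "TTFT vs. Number of Clients" chart, and &lt;code&gt;rate_limit_share&lt;/code&gt; feeds the error-rate chart.&lt;/p&gt;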








&lt;h2&gt;5. Taking Control: The Real-Time Dashboard&lt;/h2&gt;





&lt;p&gt;One limitation of the standard LLMeter library is that it primarily provides post-test results. To solve this, we’ve developed a &lt;strong&gt;Minimalist Live Dashboard&lt;/strong&gt; using Python.&lt;/p&gt;

&lt;h3&gt;Why a Live Dashboard?&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant Feedback:&lt;/strong&gt; See the TPS and Cost update every second.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Safety Switch:&lt;/strong&gt; If you notice a model is hallucinating or costs are spiking unexpectedly, you can kill the test immediately.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Stakeholder Demos:&lt;/strong&gt; It’s much more impactful to show a live-updating graph of "Tokens Per Second" than a static CSV file.&lt;/li&gt;
&lt;/ul&gt;
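&lt;p&gt;The "safety switch" idea can be sketched as a small budget guard that a load loop checks between requests. This is an illustrative sketch only. It is not the dashboard script from the QAInsights repository, and the &lt;code&gt;BudgetGuard&lt;/code&gt; class name and the budget value are hypothetical.&lt;/p&gt;

```python
import threading


class BudgetGuard:
    """Accumulates cost and sets a stop flag once a budget is exceeded."""

    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.stop = threading.Event()
        self._lock = threading.Lock()

    def record(self, cost_usd):
        """Add one request's cost; return True when the test should stop."""
        # The lock keeps the running total consistent if several workers report at once.
        with self._lock:
            self.spent_usd += cost_usd
            if self.spent_usd > self.budget_usd:
                self.stop.set()
        return self.stop.is_set()


guard = BudgetGuard(budget_usd=1.0)
for request_cost in [0.4, 0.4, 0.4, 0.4]:
    if guard.record(request_cost):
        print(f"budget exceeded after ${guard.spent_usd:.2f}; stopping the test")
        break
```

&lt;p&gt;A live dashboard could expose the same flag behind a button, so that a person or the budget can end the run.&lt;/p&gt;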








&lt;h2&gt;Conclusion: Data-Driven AI Engineering&lt;/h2&gt;





&lt;p&gt;Choosing an LLM based on a leaderboard is a starting point, but benchmarking it against &lt;em&gt;your&lt;/em&gt; specific prompts and &lt;em&gt;your&lt;/em&gt; expected user load is essential. LLMeter provides the framework; the insights it generates will save you from costly production bottlenecks.&lt;/p&gt;

&lt;h3&gt;Resources &amp;amp; Further Learning&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full Video Tutorial:&lt;/strong&gt; &lt;a href="https://www.youtube.com/watch?v=hBM44GNjURA" rel="noreferrer noopener"&gt;Watch the Hands-on Walkthrough&lt;/a&gt;
&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Source Code:&lt;/strong&gt; Visit the &lt;a href="https://github.com/QAInsights" rel="noreferrer noopener"&gt;QAInsights GitHub&lt;/a&gt; for the custom dashboard script.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Official Tool:&lt;/strong&gt; Explore the &lt;a href="https://github.com/awslabs/llmeter" rel="noreferrer noopener"&gt;AWS Labs LLMeter Repo&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Are you ready to stop guessing and start measuring? Download LLMeter today and baseline your AI stack.&lt;/strong&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>testing</category>
      <category>performance</category>
      <category>llm</category>
    </item>
    <item>
      <title>Proof of Humanity™</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Thu, 02 Apr 2026 16:56:37 +0000</pubDate>
      <link>https://dev.to/qainsights/proof-of-humanity-198m</link>
      <guid>https://dev.to/qainsights/proof-of-humanity-198m</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/aprilfools-2026"&gt;DEV April Fools Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;To prove you're human, you must assemble Flätpack furniture.&lt;br&gt;
One step is irrelevant. Robots cannot detect irony.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kilo-challenge-8914.d.kiloapps.io/" rel="noopener noreferrer"&gt;https://kilo-challenge-8914.d.kiloapps.io/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/QAInsights/kilo-challenge" rel="noopener noreferrer"&gt;https://github.com/QAInsights/kilo-challenge&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ How I Built It
&lt;/h2&gt;

&lt;p&gt;I approached this project the way I tackle any complex system: break it down, understand the constraints, and build upward with tight feedback loops. Instead of jumping straight into coding, I started by mapping the experience I wanted users to have. From there, every technical decision flowed naturally.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 1. Defining the Core Problem
&lt;/h3&gt;

&lt;p&gt;Before writing a single line of code, I clarified the “why.” What should this tool &lt;em&gt;feel&lt;/em&gt; like? What friction should it remove? What would make someone say, “Oh, that’s clever”?&lt;br&gt;&lt;br&gt;
This early framing helped me avoid feature creep and stay anchored to a crisp user experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧩 2. Designing the Architecture
&lt;/h3&gt;

&lt;p&gt;Once the problem was clear, I sketched the system architecture—data flow, state transitions, and the boundaries between components. I treated it like a mini system‑design exercise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should run locally vs. remotely
&lt;/li&gt;
&lt;li&gt;How to keep the interface responsive
&lt;/li&gt;
&lt;li&gt;How to ensure the tool remains extensible
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step saved me hours later because every component had a clear responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ 3. Building the Core Logic
&lt;/h3&gt;

&lt;p&gt;With the architecture locked in, I implemented the core functionality. I built it incrementally, validating each piece before moving on. This iterative approach made debugging almost trivial and kept the project moving smoothly.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎨 4. Crafting the User Experience
&lt;/h3&gt;

&lt;p&gt;A tool is only as good as how it feels to use. I refined the UI/UX with small but meaningful touches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear feedback loops
&lt;/li&gt;
&lt;li&gt;Minimal cognitive load
&lt;/li&gt;
&lt;li&gt;Fast, predictable interactions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted the tool to feel like something &lt;em&gt;I&lt;/em&gt; would enjoy using every day.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧪 5. Testing Like a User, Not a Developer
&lt;/h3&gt;

&lt;p&gt;I tested the project in real‑world scenarios—switching contexts, trying edge cases, and intentionally breaking things. This surfaced subtle issues that wouldn’t appear in a controlled environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 6. Polishing and Shipping
&lt;/h3&gt;

&lt;p&gt;Once the core was solid, I focused on polish:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cleaned up the codebase
&lt;/li&gt;
&lt;li&gt;Improved performance
&lt;/li&gt;
&lt;li&gt;Added small quality‑of‑life improvements
&lt;/li&gt;
&lt;li&gt;Wrote documentation that future‑me would appreciate
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shipping wasn’t the end—it was the beginning of iteration.&lt;/p&gt;




&lt;p&gt;If you want, I can also help you write the &lt;strong&gt;“What I Learned”&lt;/strong&gt;, &lt;strong&gt;“Challenges I Faced”&lt;/strong&gt;, or &lt;strong&gt;“Future Improvements”&lt;/strong&gt; sections so your post looks complete and competition‑ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prize Category
&lt;/h2&gt;

&lt;p&gt;HTCPCP IYKYK&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>418challenge</category>
      <category>showdev</category>
    </item>
    <item>
      <title>GitHub Copilot CLI Challenge: bt: Modern BLE CLI Tool</title>
      <dc:creator>NaveenKumar Namachivayam ⚡</dc:creator>
      <pubDate>Sun, 15 Feb 2026 21:43:47 +0000</pubDate>
      <link>https://dev.to/qainsights/github-copilot-cli-challenge-bt-modern-ble-cli-tool-55lk</link>
      <guid>https://dev.to/qainsights/github-copilot-cli-challenge-bt-modern-ble-cli-tool-55lk</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;No clean, simple, cross‑platform, developer‑friendly BLE CLI exists today. Linux has &lt;code&gt;bluetoothctl&lt;/code&gt;, macOS has no official CLI, and Windows exposes only low‑level PowerShell APIs.&lt;/p&gt;

&lt;p&gt;Developers working with BLE devices — especially ESP32‑based prototypes — lack a simple, unified, ergonomic CLI. My goal is to create a minimal, ergonomic, script‑friendly CLI for scanning, connecting, and interacting with BLE devices. &lt;/p&gt;

&lt;p&gt;📝 Repo: &lt;a href="https://github.com/QAInsights/bt" rel="noopener noreferrer"&gt;https://github.com/QAInsights/bt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faixhbyjm0osgw5pma581.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faixhbyjm0osgw5pma581.JPG" alt="Copilot Connecting" width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://www.youtube.com/watch?feature=player_embedded&amp;amp;v=EJqVMSVTeJM" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fimg.youtube.com%2Fvi%2FEJqVMSVTeJM%2F0.jpg" alt="bt demo video" width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0oj2pvxa95vcmglzdlx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0oj2pvxa95vcmglzdlx.png" alt="Scan" width="764" height="718"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisyxx43njmn0g5lhzbzf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisyxx43njmn0g5lhzbzf.png" alt="Waves" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5y4j52v2m5sx4i9osee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5y4j52v2m5sx4i9osee.png" alt="Web" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;bt&lt;/code&gt; was built with the &lt;code&gt;copilot&lt;/code&gt; CLI, which acted as a copilot for this project. It helped me brainstorm, plan, and prototype rapidly from natural-language prompts, even with my typos :)&lt;/p&gt;

&lt;p&gt;🤖 &lt;code&gt;copilot&lt;/code&gt; kept me focused with no context switching and acted as a pair programmer for development, testing, and more.&lt;/p&gt;

&lt;p&gt;As a beginner with Bluetooth modules, I relied on it to navigate the docs, debug, beautify the output, and test. Without leaving the IDE, it helped me polish the whole project.&lt;/p&gt;

&lt;p&gt;Here is how it started:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pa1gtefspmr6d7c8b8p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pa1gtefspmr6d7c8b8p.png" alt="How it started" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and here is how it ended:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdg0ht5v7zuk70pdssnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdg0ht5v7zuk70pdssnl.png" alt="How it ended" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/QAInsights/bt" rel="noopener noreferrer"&gt;https://github.com/QAInsights/bt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faixhbyjm0osgw5pma581.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faixhbyjm0osgw5pma581.JPG" alt="Copilot Connecting" width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgaxpyted0pmupo3fjs3.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgaxpyted0pmupo3fjs3.JPG" alt="Copilot OLED" width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
  </channel>
</rss>
