Building with Claude #2

From Desktop to CLI and back: what I learned running both Claude tools

February 2026 · 12 min read

Anthropic now has two ways to work with Claude on real tasks: Claude Desktop (with its new Cowork feature) and Claude Code (the terminal CLI). Both are powerful. Both can use MCP servers, read files, and execute multi-step work autonomously.

But they feel very different in practice. I've used both extensively: Desktop before Cowork existed, Claude Code since October 2025, and Cowork since it launched in January 2026. Here's what I've learned.

The origin: a 200-page contract and a lot of manual threads

This story starts at Lexacon.AI, the construction document intelligence company I co-founded. We needed to extract every clause from a 200-page FIDIC construction contract so we could run it through our AI contract risk analysis tool.

If you're not familiar with construction contracts: they typically reference a standard form (like the FIDIC Red Book or Silver Book) that contains around 200 standard clauses. The actual project contract then modifies specific clauses, maybe 30 or 40 of them, while the rest remain as-is. The challenge is understanding the complete picture: all 200 clauses, which ones were modified, how they changed, and what the modifications mean for project risk.

At the time we were using Claude Desktop, pre-Cowork: no agentic capabilities, no autonomous execution. The workflow was painful:

  1. Plan the extraction in one thread (which clauses to extract, what format)
  2. Process 10-20 clauses per thread because the context window would fill up and responses would degrade
  3. Start a new thread, re-explain the context, continue extracting
  4. Track modifications between the standard form and the project-specific changes across yet more threads
  5. Reassemble everything into a structured JSON file that could go into a database
  6. Validate manually, because errors accumulated across threads: missed clauses, inconsistent formatting, modification tracking that didn't line up
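
The validation in step 6 is essentially schema checking on the reassembled records. A minimal sketch of what that looks like; the field names here are illustrative, not Lexacon's actual schema:

```python
import json

# Hypothetical per-clause record shape from step 5 (field names invented).
REQUIRED = {"clause_id", "title", "standard_text", "is_modified"}

def validate_clause(record: dict) -> list[str]:
    """Return a list of problems with a clause record; empty means it passes."""
    problems = [f"missing field: {k}" for k in sorted(REQUIRED - record.keys())]
    if record.get("is_modified") and "modified_text" not in record:
        problems.append("modified clause has no modified_text")
    return problems

record = {
    "clause_id": "4.2",
    "title": "Performance Security",
    "standard_text": "The Contractor shall obtain...",
    "is_modified": True,
}
print(validate_clause(record))  # flags the missing modified_text
```

Running a pass like this over every record catches exactly the errors that crept in during thread transitions: clauses that never made it across, and modifications that were flagged but never captured.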

It worked. We got the output we needed. But the manual thread management was brutal. Every time we hit a context limit we had to restart with fresh context, re-explain the schema, and hope the new thread picked up where the last one left off. The validation pass at the end regularly caught errors that crept in during thread transitions.

That was the moment I started looking for something better.

The switch: what the CLI actually unlocked

Claude Code changed everything about how I work with AI.

The first thing that hit me was the context management. The CLI handles its own context window: it compacts automatically when the conversation gets long, preserving the important parts and dropping the noise. No more manually starting new threads every 20 minutes. I could start a complex task and let it run.

But the real game changer was MCP servers.

MCP (Model Context Protocol) lets Claude connect to external tools and data sources. In Claude Code, this is deeply integrated. We started connecting MCPs to everything:

  • NocoDB for database operations: Claude can query, insert, and update records directly
  • n8n for workflow automation: Claude can create and manage automated workflows
  • AWS for cloud infrastructure
  • GitHub for repository management and to share our Claude Code environment, keeping setups consistent across the team
  • OpenRouter so we can use the best model per function
  • Hunter and Lusha for contact enrichment

And then we built our own. For Lexacon, we created a custom MCP server that gives Claude direct access to our document processing pipeline: search, retrieval, analysis, all through structured tool calls, so we could prototype client features against our real systems faster. Claude Code was the environment where we developed, tested, and iterated on that MCP server.
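
At its core, an MCP server is a set of named tools that Claude invokes via JSON-RPC. Real servers are built on the official MCP SDK; the sketch below only shows the dispatch idea with the standard library, and the `search_clauses` tool and its corpus are made up for illustration:

```python
import json

# Hypothetical tool: search a tiny clause corpus by title.
def search_clauses(query: str) -> list[str]:
    corpus = {"4.2": "Performance Security", "8.7": "Delay Damages"}
    return [f"{cid}: {title}" for cid, title in corpus.items()
            if query.lower() in title.lower()]

# Tool registry: the server advertises these names to the client.
TOOLS = {"search_clauses": search_clauses}

def handle_call(request: str) -> str:
    """Dispatch a JSON tool-call request to the matching handler."""
    req = json.loads(request)
    result = TOOLS[req["tool"]](**req["arguments"])
    return json.dumps({"result": result})

print(handle_call('{"tool": "search_clauses", "arguments": {"query": "delay"}}'))
```

Everything Claude does through the server, search, retrieval, analysis, reduces to calls of this shape; the SDK layers the transport and protocol negotiation on top.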

That contract extraction workflow? The one that took hours of manual thread management in Desktop? We automated it entirely. First using Claude Code sessions that could process the full document without losing context. Then we built it into Lexacon's product so it runs without Claude in the loop at all: automated clause extraction, modification tracking, structured output, no manual assembly.
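
Modification tracking, step 4 of the old workflow, reduces to a text diff between the standard-form clause and the project version. A minimal sketch with Python's difflib (the clause text is invented):

```python
import difflib

standard = "The Contractor shall submit the programme within 28 days."
project  = "The Contractor shall submit the programme within 14 days."

# unified_diff yields headers plus the changed lines; keeping only the
# +/- lines is enough to flag which clauses were amended and how.
diff = difflib.unified_diff([standard], [project], lineterm="")
changed = [l for l in diff
           if l.startswith(("-", "+")) and not l.startswith(("---", "+++"))]
for line in changed:
    print(line)
```

In the product this runs per clause across all ~200 standard clauses, so the output is exactly the modification list we used to assemble by hand.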

The other thing I noticed immediately: running multiple sessions is effortless. I regularly have 2-3 Claude Code sessions open, each working on a different project. The CLI is lightweight: a terminal process, not an Electron app. It sends commands, gets responses, reads and writes files. My machine doesn't break a sweat. Even with local dev servers running alongside, there's no meaningful resource competition.

Cowork arrives: bridging the gap

When Anthropic launched Cowork in January 2026, I was genuinely curious. The main limitation of pre-Cowork Desktop, the inability to manage long-running tasks autonomously, was exactly what they were addressing.

And they did solve it. Cowork can now:

  • Compact context automatically, just like the CLI
  • Execute multi-step tasks without constant prompting
  • Break complex work into subtasks and coordinate them
  • Read and write files on your machine

If Cowork had existed when we were doing that contract extraction, it would have handled it far better than the old Desktop experience. The context management alone would have eliminated most of the thread-juggling pain.

But, and this matters if you're choosing between the two, it's noticeably slower in practice.

This isn't just my experience. GitHub issue #22543 documents a real problem: Cowork creates a VM bundle that can grow to 10GB, causing UI lag and response degradation over time. Multiple developers report performance getting worse within a single session. Even after clearing caches, the slowdown returns within minutes.

I've experienced this firsthand. In Claude Code, I can spin up a session, fire off a complex task, and responses come back fast. In Cowork, the same kind of task feels heavier. The GUI adds overhead, the Electron app consumes more resources, and when you're connected to several MCP servers, you feel it.

The honest comparison

After months of using both tools daily, here's where each one actually wins. Later in the article I've also included a Claude vs OpenAI vs Google comparison, and a short guide on when to switch between Cowork and the Claude Code CLI.

Claude Code CLI wins at

Multi-session workflows. I run 2-3 CLI sessions simultaneously on different projects. Each is a lightweight terminal process. Some developers run 10-15 parallel instances. Try that in Desktop.

MCP flexibility. CLI supports all three MCP transports (HTTP, SSE, stdio), can act as an MCP server itself, supports OAuth, and lets teams share configurations via git. Desktop only supports stdio via a JSON config file.
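
Desktop's stdio-only setup lives in a single JSON config file, where each entry is just a command for the app to spawn. A sketch of that shape, using Anthropic's filesystem server as the example (the path is illustrative):

```python
import json

# Shape of a Desktop MCP config: stdio only, one spawned process per server.
# Server name and path are examples, not a recommended setup.
config = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/docs"],
        }
    }
}
print(json.dumps(config, indent=2))
```

The CLI reads this same shape but also accepts HTTP and SSE endpoints, which is what makes git-shared team configurations practical.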

Overnight autonomous work. CLI can run in loops: start a task before bed, wake up to results. Cowork requires the Desktop app to stay open on your machine.

Building and testing MCPs. If you're developing MCP servers (which I recommend for any serious automation), the CLI is where you do it. The feedback loop is faster, debugging is easier, and you can test tool calls directly.

Resource efficiency. A CLI session is a terminal process. An Electron app running a VM, a GUI, and multiple MCP connections is not.

Scripting and automation. CLI has --print mode for non-interactive use and an Agent SDK for building custom tools. Desktop doesn't have equivalents.
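
--print mode is what makes the CLI scriptable: `claude -p "prompt"` runs one non-interactive turn and writes the result to stdout. A hedged sketch of wrapping it from another program; it assumes the `claude` binary is on your PATH, and the prompt is illustrative:

```python
import subprocess  # used in the commented-out invocation below

def build_claude_cmd(prompt: str) -> list[str]:
    """Assemble a non-interactive Claude Code call; -p is short for --print."""
    return ["claude", "-p", prompt]

cmd = build_claude_cmd("Summarise the diff in HEAD in one sentence.")
print(cmd)

# To actually execute it (requires Claude Code installed):
#   result = subprocess.run(cmd, capture_output=True, text=True, check=True)
#   print(result.stdout)
```

Put a call like that in a loop or a cron job and you have the overnight autonomous work described above; the Agent SDK is the richer alternative when you need structured output and tool control.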

Cowork wins at

Accessibility. If you don't live in the terminal, Cowork is dramatically easier to start with. Open the app, describe your task, watch it work.

Visual feedback. Side-by-side diffs, visual file previews, interactive elements. If you're a visual thinker, this matters.

Non-coding knowledge work. Document processing, research, organization. Cowork was designed for this. The announcement explicitly positions it for knowledge work beyond coding.

Lower barrier to entry. No installation steps, no terminal knowledge, no configuration files. It's included with your Claude subscription.

Where this is heading

Cowork is in research preview. It's explicitly experimental. The performance issues are real but likely temporary: Anthropic is actively working on them, and the architecture improvements they've already shipped (context compaction, autonomous execution) show they understand the gap.

I expect Cowork will eventually match the CLI's speed and efficiency. The underlying model is the same. The capabilities are converging. What's different today is the overhead of the GUI layer and the maturity of the runtime.

But right now, today, if you're building real systems (automation, integrations, multi-project workflows) Claude Code CLI is the more capable tool. And if you're exploring what AI can do for your work without setting up a terminal, Cowork is a genuinely useful starting point.

I use both. Cowork for quick document tasks and client demos. Claude Code for everything else. Same AI, same files, different interfaces for different jobs.

Try it yourself

We packaged everything we've learned into a free starter kit that works in both Cowork and Claude Code. It includes an interactive walkthrough, a companion guide, and a sample project. Take the 2-minute quiz to get access.

If you want hands-on help building automation for your team, book a Discovery Workshop. We'll map your operations and show you what's possible in half a day.

If there's something specific you'd like us to compare or test between these tools, let us know.

What about the other tools?

This post is about the Anthropic ecosystem, but it's fair to ask: why Claude Code at all? The AI coding tool space moves fast. As of February 2026, most of the major players now support MCP servers, rules files, and some form of skills or reusable workflows. The feature gap between tools is narrower than it was six months ago. So the choice comes down to philosophy and fit more than raw capability.

Here's where things stand right now and what I've learned from trying a few of them.

The playing field has leveled

A year ago, Claude Code was one of the few tools with MCP support, a project-level rules system, and a skills architecture. That's no longer true.

OpenAI Codex CLI now has AGENTS.md (their equivalent of CLAUDE.md), a formal Skills system with SKILL.md files that use progressive disclosure, and full MCP support. It operates across interactive CLI, non-interactive exec mode, and IDE extension. The architecture leans more toward cloud-based sandboxes and async delegation than Claude Code's local-first approach. WaveSpeedAI's comparison puts it well: "Claude Code acts like a senior developer. Codex acts like a scripting-proficient intern. It is fast, minimal, and cheap." Different trade-offs for different workflows.

Google now has Gemini CLI (separate from Gemini Code Assist, their IDE plugin). It's open source, CLI-first, uses GEMINI.md for project context, supports MCP, and has a 1M token context window with Gemini 3 Pro. Weekly stable releases. When I tried Google's tooling earlier, I was using the IDE-centric Code Assist, and the MCP setup felt more involved than what I was used to. The CLI tool is a different experience. It's worth revisiting now that it's matured.

Cursor (v0.47 as of February) has sophisticated project rules via .cursor/rules/*.mdc with conditional triggers, mature MCP support, and features like Mission Control for monitoring parallel agents. No dedicated skills system, but the rules are flexible enough to encode workflows. It's the market leader for IDE-based AI coding, and many developers use Cursor for daily editing alongside Claude Code for autonomous multi-step work.

Windsurf had a turbulent year (the OpenAI acquisition collapsed, Google hired their founders) but kept shipping. Their Wave 13 update put them at #1 in LogRocket's February 2026 AI dev tool rankings. They now have Arena Mode (blind model comparison), Plan Mode, parallel multi-agent sessions, full MCP support, and recently added a .agents/skills/ directory for cross-platform skills. At $15/month it's the cheapest premium option.

Aider remains the open-source, model-agnostic choice. Git-native, works with 40+ models, excellent for structured refactors. The notable gap: it still doesn't have native MCP support (there's an open GitHub issue but nothing merged). It uses CONVENTIONS.md for project context but you have to load it explicitly. If you want full control and don't need MCP integrations, Aider is solid.

There's also an emerging cross-tool standard forming around AGENTS.md as a unified project instructions filename and SKILL.md as a portable skills format. Codex's documentation claims skills created for their platform run unchanged in Claude Code, Gemini CLI, Cursor, and others. I haven't verified that claim myself, but the direction is clear: these tools are converging on shared standards.

So why am I still on Claude Code?

Fair question, given the competition. A few honest reasons:

I started here and the switching cost is real. I have 16 rules files, 11 skills, and a CLAUDE.md that encodes my entire business operation. Could I port those to AGENTS.md for Codex or GEMINI.md for Gemini CLI? Probably, especially as the standards converge. But the Claude-specific ecosystem (the Anthropic model family, the Agent SDK, how skills interact with rules) is what I've built around. Migration isn't zero-effort.

The model quality for complex tasks is strong. On SWE-bench Verified, Claude scores 72.5% vs Codex's ~49% for complex bug-fixing tasks. For the kind of multi-step business workflows I run (research a company, enrich contacts, draft messages, check against policy rules, send via API) the model's reasoning quality matters more than speed. Whether I'm building a prototype for client validation or scaling it up to production, quality is critical.

The non-coding use case. This is where my experience diverges from most comparisons. I use Claude Code to write software, but I also use it to run business operations: CRM queries, email sending, lead scoring, proposal generation, pipeline management. The tools all support MCP now, but the Claude Code ecosystem (skills that orchestrate six-phase business workflows, rules that encode email sending gates and message quality standards) was designed with this use case in mind from early on. Most comparisons focus on coding benchmarks. My use case is closer to "AI-assisted ops."

Cowork for client demos. Having both the CLI and the Desktop GUI share configurations is genuinely useful. I run my business in the CLI and demo for clients in Cowork. No other vendor has that dual-interface approach with shared context.

I'd be lying if I said I wouldn't try Gemini CLI and Codex more seriously now that they've caught up on the fundamentals. The next time I start a greenfield project, I might run the same task across all three and see where the experience differs. If I do, I'll write about it.

Go deeper

I'm not going to rewrite the detailed comparisons that already exist. A few good ones from early 2026:

Start there if you want the full picture.

So when should you use which?

Here's the practical decision framework I use:

Start with Cowork if:

  • You're new to AI-assisted work and want to see what's possible
  • Your task is primarily document or research-based
  • You prefer a visual interface
  • You're doing a one-off task, not building a recurring workflow

Switch to Claude Code when:

  • You're connecting multiple MCP servers and need full control
  • You want to run parallel sessions on different projects
  • You're building automation that needs to run autonomously
  • Performance matters and you need fast responses on complex tasks
  • You're developing custom MCP servers or integrations
  • You want overnight or long-running task execution

Use both when:

  • You're prototyping in Cowork and scaling in CLI
  • Different team members have different technical comfort levels
  • Some tasks are visual (Cowork) and others are automated (CLI)

The good news: configurations are shared. CLAUDE.md files, MCP server configs, rules, skills, everything works in both tools. Nothing is wasted if you switch.

Want to know where you stand?

Take our AI readiness quiz. 10 questions, 2 minutes. You'll get a personalized recommendation for where to start with automation.