March 20, 2025
Cline is an autonomous coding agent plugin for VS Code. It's currently the best open-source equivalent to Cursor.

> TL;DR: **Cline is a beast with Claude—but getting results from local models is more challenging.**

I put Cline to the test on a real-world project: **dockerizing my Retrieval-Augmented Generation (RAG) system**—a FastAPI app that runs on Google Cloud.

Here's the video version of this blog post:

<iframe width="800" height="450" src="https://www.youtube.com/embed/f33Fw6NiPpw" frameborder="0" allowfullscreen></iframe>

### 🎥 Watch the Video

https://youtu.be/f33Fw6NiPpw

### 🪷 Get the Source Code

https://github.com/zazencodes/zazenbot-5000

## 🐋 The Demo: Dockerizing a Python API

The goal today is to take a Python function and Dockerize it:

- Add an API layer using FastAPI
- Build a Dockerfile and Docker Compose
- Update my `README.md` with usage instructions

Long story short: Cline did this wonderfully. But all great victories are earned through great struggle.

In this post I'll first talk through my *failed attempts* at using Cline with Ollama local models like **Gemma 2** and **Qwen 2.5-coder**, and then walk you through the grand success that I had with **Claude 3.7**.

## 🧪 Installing Cline & First Tests with Local Models via Ollama

[](https://www.youtube.com/watch?v=f33Fw6NiPpw&t=48)

Setting up Cline couldn't be easier. I opened my project ([`ZazenBot 5000`](https://github.com/zazencodes/zazenbot-5000)) in VS Code, popped open the **Extensions Marketplace**, and installed Cline.

Once installed, I selected a model to use with Cline. For the first attempt, I picked **Ollama**, running **Gemma 2**—a 9B parameter model.

I used the default connection information for Ollama. I just had to make sure that Ollama was running on my computer (this is done outside of VS Code).

Then I gave Cline the prompt:

```text
Use a fastapi in order to make @/zazenbot5k/query_rag_with_metadata available with HTTP.
Using a POST request sending the question in the data

Dockerize this application. Use docker compose and create a Dockerfile as well.

Update the README with instructions to run it.
```

Using Cline's prompt context system, I attached the key file: `query_rag_with_metadata.py` (as seen above with the `@` notation).

> This file contains the `process_question()` function—the core of the app that takes a user question, enriches it with metadata and a timestamp, and returns a string. **This is the function that I want to make available with an API and then Dockerize.**

Cline took in all that info and got to work.

<img width="800" src="/img/blog/2025_03_21_cline_with_ollama_and_claude/cline_gemma_test_zazencodes.png">

**The result?** GPU pegged at 100%, system memory maxed out, and Cline reduced to a crawl. Gemma 2 --- at 9B params --- struggled to run on my system. My GPU monitoring CLI tool `nvtop` showed my system under duress as Cline worked.

And in the end, the task failed to complete. It seems that Cline and Gemma 2 just don't get along well.

On the bright side, I had time to examine the prompts that Cline was writing:

<img width="800" src="/img/blog/2025_03_21_cline_with_ollama_and_claude/cline_prompt_formatting_zazencodes.png">

I loved the use of **HTML tags to structure prompt sections**, and the particularly amazing thing I noticed was how Cline uses **HTML attributes to specify metadata** about each section. In the screenshot above, you can see it specify the file path `path="zazenbot5k/query_rag_with_metadata.py"` in this way. How cool is that?

### 🔁 Swapping to Qwen 2.5-Coder Models

#### Test #2: Qwen 0.5B

I decided to switch things up and use a local model that was intended for coding. So I installed and spun up a very tiny model: **Qwen 2.5 Coder --- 0.5B**.

I queued up the same prompt and *awwwway we went*. My laptop had no problem running this one as it started outputting a string of seemingly intelligent code and instructions.
However, it was in fact doing nothing whatsoever. **Cline wasn't making any changes to my code**, and the outputs --- on closer investigation --- were largely incomprehensible. This was likely due to the model failing to generate outputs in the format that Cline was expecting.

#### Test #3: Qwen 14B

The final Ollama model that I tried was the **14B variant of Qwen 2.5 Coder**. As expected, my GPU maxed out again. The model slowly started generating output, but Cline never successfully created any of the files I asked for.

At this point, the writing was on the wall:

> Local models + "limited" hardware --- an M4 chip with 24 GB of RAM --- are not powerful enough for the autonomous, multi-file, full-stack generation Cline is built to do.

## ⚡️ Switching to Claude 3.7 — The Game Changer

[](https://www.youtube.com/watch?v=f33Fw6NiPpw&t=594)

Once I swapped to **Claude 3.7 Sonnet** via the Anthropic API, it was like unlocking Cline's final form. I kicked off the *exact same prompt* as before—no changes whatsoever—and Cline immediately sprang into action.

The difference was obvious. And awesome:

- **Context window tracking** came online
- **Output token logging** started streaming in real time
- **Input/output costs** started updating

And most importantly:

- **Files were being created and updated autonomously** --- and it was working at blazing speed

And yes, Claude is expensive—as mentioned, **Cline showed me the cost in real time**.

- **$0.10 in API charges** before I could blink.
- **$0.30 in API charges** by the time everything was done.
But that cost came with excellent results (on the first try):

- Generated a proper `Dockerfile`
- Created a `docker-compose.yml`
- Wrote a FastAPI wrapper around the `process_question()` function
- Added a `README.md` section with example usage
- Built a test script to verify the endpoint
- Handled environment variables and credential mounting
- And did **all of it** autonomously

### 🎯 Final Outputs

Here's a look at the generated `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Copy requirements file
COPY requirements.txt .

# Install system dependencies
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir fastapi uvicorn

# Copy application code
COPY zazenbot5k/ /app/zazenbot5k/

# Set Python path to include the application directory
ENV PYTHONPATH=/app

WORKDIR /app/zazenbot5k

EXPOSE 8000

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

And the corresponding `docker-compose.yml`:

```yaml
version: '3.8'

services:
  zazenbot-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - GCP_PROJECT_ID=${GCP_PROJECT_ID}
      - GCP_LOCATION=${GCP_LOCATION}
      - GOOGLE_APPLICATION_CREDENTIALS=/app/credentials/google-credentials.json
    volumes:
      - ./credentials:/app/credentials
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

And here's the FastAPI application `app.py`:

```python
import logging

from fastapi import FastAPI, HTTPException
from fastapi.responses import PlainTextResponse
from pydantic import BaseModel

from query_rag_with_metadata import process_question

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="ZazenBot 5000 API",
    description="API for querying RAG with metadata",
    version="1.0.0",
)


class QuestionRequest(BaseModel):
    question: str


@app.post("/query", response_class=PlainTextResponse)
async def query(request: QuestionRequest):
    """Process a question through the RAG system and enhance it with metadata and a timestamp."""
    try:
        logger.info(f"Received question: {request.question}")
        response = process_question(request.question)
        return response
    except Exception as e:
        logger.error(f"Error processing question: {e}")
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/health")
async def health():
    """Health check endpoint."""
    return {"status": "healthy"}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```

### 😤 Running Cline's Dockerized App

Running this API with Docker was a pretty special moment. **It worked flawlessly.**

I ran:

```bash
docker-compose up -d
```

And boom—everything built and ran clean. The `/health` endpoint returned `{"status": "healthy"}`, and my `POST /query` endpoint responded in all its RAG glory.
Cline set up the proper environment variables for my Docker app and even mounted my Google application credentials with a volume, exactly like I would have done.

> There was one little problem: I was initially missing credentials. But since Cline handled things nicely by mounting them in a volume, this fix was trivial. It took me about 10 seconds.

## 🧠 Final Verdict on Cline — Powerful, But Practical?

[](https://www.youtube.com/watch?v=f33Fw6NiPpw&t=1097)

After watching Cline go from *zero to working API* using Claude 3.7, I was thoroughly impressed. It handled everything from environment setup to testing, with zero manual coding on my part.

But here's the truth: **Cline is pretty damn expensive.**

- **Claude comes at a cost**<br>
  I paid roughly **$0.30** for this one task. That's whacked. Sure, it was worthwhile in this case. But this doesn't scale out to real-world workflows.
- **Local models fall short**<br>
  Despite Cline supporting any model backend (Ollama, etc.), none of the local models came close. They either crashed, lagged, or produced incomprehensible output. I believe that further testing would yield some working results, but speed would likely remain a limiting factor.

### 🆚 Cursor vs Cline: What's Better?

Cline absolutely blew me away when paired with Claude --- but I'm not convinced that it's better than **Cursor**. Cursor offers a better UX, and is likely to be much cheaper overall, making it **more practical for daily development**.

### 👋 Wrapping Up

If you're a builder, experimenter, or just curious about agent-powered development, give Cline a spin. I love it, and it's a powerful tool to have in my back pocket.

📌 **My newsletter is for AI Engineers.
I only send one email a week when I upload a new video.**

<iframe src="https://zazencodes.substack.com/embed" width="480" height="320" style="border:1px solid #EEE; background:white;" frameborder="0" scrolling="no"></iframe>

As a bonus: I'll send you my *Zen Guide for Developers (PDF)* and access to my *2nd Brain Notes (Notion URL)* when you sign up.

### 🚀 Level-up in 2025

If you're serious about AI-powered development, then you'll enjoy my **new AI Engineering course**, which is live on [ZazenCodes.com](https://zazencodes.com/courses). It covers:

- **AI engineering fundamentals**
- **LLM deployment strategies**
- **Machine learning essentials**

<br>
<img width="800" src="/img/ai-engineer-roadmap/ai_engineer_roadmap_big_card.png">

Thanks for reading and happy coding!