The Agentic Concept Series
- Agentic Chunking LangChain RAG
- DEVIKA, the AI Software Engineer
- CrewAI, an AI agent orchestration framework
This is a practice run I did a month or two ago. I kept putting it off and am only now posting it, and in the meantime Devin went and launched. ;D LOL
The conclusion first!
Devin, Open Devin, and DEVIKA should not be approached merely as 'AI software engineers.' The point is the agent. Treat them as case studies in how several LLMs can be stitched together agentically, in a mixture-of-experts (MoE) style. Run through the hands-on yourself and explore how to apply the idea from your own vantage point.
DEVIKA
Devin (the AI software engineer) officially launched after this open-source project came out. Like Devin, Devika focuses mainly on code generation.
Open Devin is a similar project.
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.
GitHub: https://github.com/stitionai/devika
Key features
- Supports Claude 3, GPT-4, GPT-3.5, and local LLMs via Ollama
- Advanced AI planning and reasoning capabilities
- Contextual keyword extraction for focused research
- Seamless web browsing and information gathering
- Code writing in multiple programming languages
- Dynamic agent state tracking and visualization
- Natural-language interaction via the chat interface
- Project-based organization and management
- An extensible architecture for adding new features and integrations
Architecture
- Agent Core: Orchestrates the overall AI planning, reasoning, and execution process. Communicates with the various sub-agents.
- Agents: Specialized sub-agents that handle specific tasks such as planning, research, coding, patching, and reporting.
- Language Models: Leverages large language models (LLMs) like Claude, GPT-4, and GPT-3 for natural language understanding and generation.
- Browser Interaction: Enables web browsing, information gathering, and interaction with web elements.
- Project Management: Handles the organization and persistence of project-related data.
- Agent State Management: Tracks and persists the dynamic state of the AI agent across interactions.
- Services: Integrates with external services such as GitHub and Netlify for enhanced capabilities.
- Utilities: Supporting modules for configuration, logging, vector search, PDF generation, and so on.
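To make the Agent Core and Agent State Management roles a bit more concrete, here is a minimal sketch. It is not Devika's actual code: `AgentState`, `AgentCore`, and the two stub sub-agents are hypothetical names invented for illustration, and a real sub-agent would call an LLM where the stubs just format a string.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable


# Hypothetical types for illustration only; Devika's real classes differ.
@dataclass
class AgentState:
    step: str    # which sub-agent ran
    output: str  # what that sub-agent produced


@dataclass
class AgentCore:
    # Ordered (name, function) pairs standing in for sub-agents.
    sub_agents: list[tuple[str, Callable[[str], str]]]
    history: list[AgentState] = field(default_factory=list)

    def run(self, instruction: str) -> str:
        """Pass the instruction through each sub-agent, recording state."""
        current = instruction
        for name, agent in self.sub_agents:
            current = agent(current)
            self.history.append(AgentState(step=name, output=current))
        return current

    def dump_state(self) -> str:
        """Serialize the state history so it could be persisted or shown in a UI."""
        return json.dumps([asdict(s) for s in self.history], indent=2)


# Stub sub-agents: a real one would send a prompt to an LLM here.
def planner(text: str) -> str:
    return f"plan for: {text}"


def coder(text: str) -> str:
    return f"code implementing [{text}]"


core = AgentCore(sub_agents=[("planner", planner), ("coder", coder)])
result = core.run("build a todo app")
print(result)
print(core.dump_state())
```

Keeping the state as a plain list of records is just one simple choice; it makes it easy to show a person what each step produced, which is the part of the workflow this post cares about most.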
Agents
Devika's cognitive abilities are powered by a collection of specialized sub-agents. Each agent is implemented as a separate Python class and communicates with the underlying LLM through a prompt template defined in Jinja2* format.
*Jinja2: a templating engine for Python. It has full Unicode support, an integrated optional sandboxed execution environment, and a BSD license that allows the software to be freely used, modified, and distributed.
- Planner: generates a step-by-step plan
- Researcher: extracts search keywords and gathers additional context from the user
- Coder: generates code and validates it
- Action: maps the prompt to a matching action keyword
- Runner: executes code in a sandboxed environment and streams the output in real time
- Feature: implements features with incremental testing
- Patcher: debugs from error messages, identifies the cause, and proposes fixes
- Reporter: generates a report covering design, instructions, and APIs
- Decision: maps the request to a specific function such as browser interaction or git clone, then runs it with the provided arguments
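The "one Python class plus one prompt template per agent" pattern can be sketched roughly as follows. This is an illustration under assumptions, not Devika's implementation: the `Planner` class, its method names, and the prompt wording are made up, `fake_llm` returns a canned reply so the example runs without an API key, and the standard library's `string.Template` stands in for Jinja2 to keep the sketch dependency-free.

```python
from string import Template
from typing import Callable

# Stand-in for a Jinja2 template; string.Template keeps the sketch stdlib-only.
PLANNER_PROMPT = Template(
    "You are a planner. Break the task into numbered steps.\nTask: $task"
)


class Planner:
    """Hypothetical sub-agent: fills a prompt template and parses the reply."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # any function mapping a prompt string to a reply string

    def render(self, task: str) -> str:
        return PLANNER_PROMPT.substitute(task=task)

    def parse(self, reply: str) -> list[str]:
        # Keep lines that look like "1. do something" and strip the numbering.
        steps = []
        for line in reply.splitlines():
            head, sep, rest = line.strip().partition(".")
            if sep and head.isdigit() and rest.strip():
                steps.append(rest.strip())
        return steps

    def execute(self, task: str) -> list[str]:
        return self.parse(self.llm(self.render(task)))


def fake_llm(prompt: str) -> str:
    # Canned reply so the example runs without any API key.
    return "1. Research the API\n2. Write the client\n3. Add tests"


steps = Planner(fake_llm).execute("build a weather CLI")
print(steps)
```

The other agents in the list would follow the same render, call, parse shape, each with its own template and its own parsing rules.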
Hands-on
1. Setting up the development environment
- Open LLM: install and run Ollama
- Install uv (download): a Python package manager that replaces the pip, pip-tools, and virtualenv commands
- Install bun (download): the JavaScript runtime
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows.
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# With pip.
pip install uv
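Before moving on, it can help to confirm that the tools are actually on your PATH. The following is a small optional check, not part of Devika itself; the helper name `missing_tools` is made up, and the list of command names assumes the default executables (`ollama`, `uv`, `bun`, `git`).

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


# Commands this walkthrough relies on (assumed default executable names).
required = ["ollama", "uv", "bun", "git"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are installed.")
```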
2. Pull the DEVIKA code from GitHub
1. git clone https://github.com/stitionai/devika.git
2. cd devika # move into the cloned folder
3. Set up the virtual environment: uv venv, pip
1) Create
# Create a virtual environment at .venv.
uv venv
2) Activate
# On macOS and Linux.
source .venv/bin/activate
# On Windows.
.venv\Scripts\activate
3) Install packages
# On macOS and Linux.
uv pip install -r requirements.txt
playwright install --with-deps # installs browsers in playwright (and their deps) if required
4. Prepare the necessary API keys
1) Search
- Bing: https://www.microsoft.com/en-us/bing/
2) LLM
- Claude: https://console.anthropic.com/setting
3) Filling in the API keys
- Devika/config.toml
5. Start it up
1) Run the server
python3 devika.py
2) Run the client
cd ui/
bun install
bun run dev
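If the page does not load in the next step, a simple TCP check tells you whether anything is listening yet. This is a generic sketch rather than anything Devika ships: `is_listening` is a made-up helper, and the only port checked is the UI port 3000 used in this post. Add the backend port reported by your own devika.py run if you want to check that too.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 3000 is the UI port used in this post; add your backend port if you know it.
for name, port in [("UI", 3000)]:
    status = "up" if is_listening("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {status}")
```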
6. Run
1) Go to http://127.0.0.1:3000
2) Pick the model
3) Configure the search engine
- This is one of the main tools the AI engineer uses to research and share related information with the user.
- For reference, DuckDuckGo doesn't require a separate API key.
4) Specify a new project
- The code written by the AI engineer is saved in a folder named after that project.
5) Write the prompt
6) Process and result
1) Real-time review
- In the inner browser area, the AI engineer shares web search results between research steps and reviews its own findings and plans
2) Code generation result
- /[project name]/data/projects : a new folder named after the project you specified is created under this folder, and the generated code is saved inside it
3) Real-time code review
- You can also write messages right inside the code the AI engineer wrote and communicate with it directly in real time!
- A file generated when I asked it to write up how to use the code
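To look at what was generated outside the UI, you can simply walk the project folder. This is a small sketch with assumed paths: `list_generated_files` is a made-up helper, and both `data/projects` and `my-project` are placeholders to replace with the location and project name from your own run.

```python
from pathlib import Path


def list_generated_files(projects_root: Path, project_name: str) -> list[str]:
    """Return relative paths of all files under projects_root/project_name, sorted."""
    project_dir = projects_root / project_name
    if not project_dir.is_dir():
        return []
    return sorted(
        p.relative_to(project_dir).as_posix()
        for p in project_dir.rglob("*")
        if p.is_file()
    )


# Both arguments are placeholders; point them at your own checkout and project.
for path in list_generated_files(Path("data/projects"), "my-project"):
    print(path)
```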
Review
I really like the interface. What strikes me most is the inter-agent workflow, which lets a person review and give feedback between steps rather than only on the final result.
It's still a bit early to post actual outputs. As with human work, the quality of the generated code is not solely the AI engineer's problem; it is a coordination problem between the two parties communicating. Even so, it is already shaping up into a useful tool. Many of the examples out there tackle Steam games, but since that is not my use case, I have been testing things that might apply to daily life, planning work, or studying coding. If something good comes out of it, I'll post an update.
