Renewal·마흔의 생활코딩

Agentic | MoE (Mixture of Experts)

February 25, 2024·4 min read

I previously wrote a post titled "Sayonara, Prompt Engineering". The right prompt for an LLM is best known to the LLM itself, and each LLM has its own optimal hyperparameters, so it is not realistic for ordinary, everyday users to relearn and retune all of that every single time.

Sayonara, Prompt Engineering

The ChatGPT craze set off by OpenAI has produced a newly created(?) occupation: the prompt engineer. Prompt engineering, which guides generative AI (GenAI) solutions to produce the desired results, ...

normalstory.tistory.com

More importantly, OpenAI is not yet the standard for LLMs. This market is still in its introductory stage. Personally, I think OpenAI resembles a PDA phone right before the smartphone era. Its interface and ecosystem, despite the original intent, are closed and cloud-based, so they depend on being online. On top of that, it has not yet merged with personal devices, and there are many other open issues such as security.

Personally, I think Apple, which built a software ecosystem on a hardware base, or Google, Amazon, and Tesla, which are building hardware ecosystems on a software base, are likely to be the main players.

So amid these mega-trends and the moves of the big players, what should an individual be preparing?!

Rather than optimizing prompts, you should be thinking about your own agent.

In that sense, there is one basic term worth knowing: MoE, or Mixture of Experts. Detailed explanations by experts in Korea and abroad are easy to find with a quick search, so I'll only summarize the basics here.

Terminology

MoE is an architectural pattern in neural networks that splits the computation of a single layer or operation (for example, a linear layer, an MLP, or an attention projection) across multiple "expert" sub-networks. Each expert performs its own computation independently, and the results are combined, typically weighted by a learned gating (router) network, to produce the final output of the MoE layer. An MoE architecture can be either dense (every expert is used for every input) or sparse (only a subset of experts is used for each input).
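To make the dense/sparse distinction concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the class name `MoELayer`, the random untrained weights, the softmax router, and the `top_k` parameter are all hypothetical and not taken from any particular model or library.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class MoELayer:
    """Toy MoE layer: each expert is one linear map, mixed by a router.

    Hypothetical illustration with random, untrained weights.
    """

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]
        self.router = rand_matrix(num_experts, dim)  # one score per expert

    def forward(self, x, top_k=None):
        """Dense when top_k is None, sparse (top-k routing) otherwise."""
        gates = softmax(linear(self.router, x))
        if top_k is None:
            chosen = list(range(len(self.experts)))  # dense: use every expert
        else:
            ranked = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
            chosen = ranked[:top_k]  # sparse: keep only the best-scoring experts
        norm = sum(gates[i] for i in chosen)  # renormalize over chosen experts
        out = [0.0] * len(x)
        for i in chosen:
            y = linear(self.experts[i], x)  # only chosen experts are computed
            weight = gates[i] / norm
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out, chosen


layer = MoELayer(dim=4, num_experts=8)
x = [1.0, 0.5, -0.5, 2.0]
dense_out, dense_used = layer.forward(x)            # all 8 experts run
sparse_out, sparse_used = layer.forward(x, top_k=2)  # only 2 experts run
```

In the sparse call only two of the eight expert computations are performed, which is the reason sparse MoE can grow the number of parameters without growing the per-input compute at the same rate.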

Subjective interest

My personal interest in MoE is less about a single, higher-performing model and more about a way of organizing a collection of models that collaborate and divide labor. Sometimes these are called agents, sometimes experts. The trend reminds me of how blockchain consensus algorithms (PoW, PoS, DPoS, PoI, PoC, etc.) have, after countless rounds of human trial and error, come to resemble the consensus structures of evolving institutions and norms. If humans advance science by benchmarking nature, then IT technologies such as AI and blockchain seem to advance by increasingly imitating human institutions, forming a kind of loop. That is why I keep an eye on these developments.

Subjective references

In the MoE architecture, as I read it, the emphasis is on collaboration through organizational and consensus structures. From a service-planning perspective rather than a developer's, the point is how seamlessly you can organize collaboration between agents, collaboration between agents and individuals, and individuals' creation of their own agents.
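As a rough picture of what "collaboration through a consensus structure" could look like at the agent level, here is a minimal sketch that uses simple majority voting. Everything in it is an assumption for illustration: the three agent functions are hypothetical stand-ins for LLM-backed agents, and majority voting is just one possible consensus rule, not the method of any framework listed below.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the share of agents that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


def run_agents(agents, task):
    """Give the same task to every agent and collect their answers."""
    return [agent(task) for agent in agents]


# Hypothetical stand-ins for LLM-backed agents (fixed answers for the demo).
def cautious_agent(task):
    return "42"


def hasty_agent(task):
    return "41"


def second_opinion_agent(task):
    return "42"


agents = [cautious_agent, hasty_agent, second_opinion_agent]
answer, agreement = majority_vote(run_agents(agents, "What is 6 * 7?"))
```

The `agreement` ratio is a cheap signal a service could surface to the user, for example to ask for human confirmation when the agents disagree too much.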

Related technologies built on similar concepts are evolving fast. The Why and the How are the same, but the What, meaning the way they are delivered and the terminology they use, is all over the map. Below I organize a few products (some are models, some are frameworks) that share a similar architecture.

#Chunking Strategies: Agentic Chunking

https://normalstory.tistory.com/entry/RAG-Agentic-Chunking-ing

LLM | Five Levels of Chunking (warning: long post!)

1. Overview: Chunking. Chunking is an important process that strongly affects the quality of responses. It divides text into manageable, clearly delimited, meaningful parts, and is used for efficient processing and retrieval of context ...

normalstory.tistory.com

#Optimization at the in-model unit-process level

LangGraph - Building language agents as graphs

GitHub - langchain-ai/langgraph

github.com

LlamaIndex - Agentic strategies

Agentic strategies - LlamaIndex v0.10.17

docs.llamaindex.ai

Tencent - More Agents Is All You Need

GitHub - MoreAgentsIsAllYouNeed/More-Agents-Is-All-You-Need

github.com

#General-purpose, abstraction-based frameworks

AutoGen - A programming framework for agentic AI.

GitHub - microsoft/autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap - microsoft/autogen

github.com

CrewAI - Multi AI Agent systems

GitHub - joaomdmoura/crewAI

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. - joaomdmoura/cr...

github.com

#Code-generation optimization

Devika - Agentic AI Software Engineer

GitHub - stitionai/devika

Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective...

github.com

This English version was translated by Claude.