LLM | LangChain
- 01 Overview
- 02 RAG practice by type/stage
About LM and LLM
- Architecture — Transformer (Decoder, Encoder)
- Learning algorithm — Language Model (LM)
- LM workflow: Foundation model* > RLHF technique**
Foundation model*
- A model pre-trained using large-scale compute resources and data
- Unlike the closed models from OpenAI or Google, Meta's (Facebook's) LLaMA 2 is also open for commercial use
RLHF technique**
- Human feedback algorithm: humans intervene on the model's answers to prompts. Candidate texts are collected from the internet and scored; legally or politically problematic utterances receive low rewards so they are filtered out, and the model is fine-tuned toward the kind of answers people prefer.
- A model pre-trained on a large corpus (e.g., Meta's LLaMA 2) can be fine-tuned for your purpose, producing a narrower model suited to your domain and data.
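The scoring and filtering step described above can be illustrated with a toy sketch. This is not an RLHF training loop: `toy_reward` and `BLOCKLIST` are hypothetical stand-ins for a learned reward model, and a real pipeline would use the preferred answers as a training signal rather than just selecting them.

```python
# Toy illustration of reward-based filtering of candidate answers.
# BLOCKLIST and toy_reward are hypothetical; a real system learns the reward from human ratings.
BLOCKLIST = {"illegal", "slur"}


def toy_reward(answer):
    """Score an answer: penalize blocklisted words, mildly reward longer answers."""
    words = answer.lower().split()
    penalty = sum(1 for w in words if w.strip(".,!?") in BLOCKLIST)
    return min(len(words), 20) / 20.0 - 2.0 * penalty


def select_preferred(candidates, threshold=0.0):
    """Drop candidates at or below the reward threshold; return the rest, best first."""
    scored = [(toy_reward(c), c) for c in candidates]
    kept = [(r, c) for r, c in scored if r > threshold]
    return [c for _, c in sorted(kept, key=lambda rc: rc[0], reverse=True)]


candidates = [
    "Here is an illegal way to do it.",
    "You can file the form online or by mail.",
    "No.",
]
preferred = select_preferred(candidates)
```

The problematic candidate gets a negative reward and is filtered out; the remaining answers are ranked so the more helpful one comes first.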
How to include (new external) knowledge not covered in LLM training
- fine-tuning (RLHF technique, etc.)
- Take the text sources containing the new knowledge and run a fine-tuning process that updates the parameters of the existing pre-trained LLM so that it reflects those sources.
- In the past this was difficult because of the massive scale involved, but PEFT (parameter-efficient fine-tuning) now gives sufficient fine-tuning performance even when only a portion of the parameters is updated.
- RAG (retrieval-augmented generation)
- Embed the text source of the new knowledge and store it in a vector store; when composing a prompt, retrieve the relevant text from that external store, include it in the prompt, and then get an answer from the LLM.
- Workflow: user prompt + external data storage/retrieval → prompt reconstruction → LLM Q&A (no separate fine-tuning required)
- LangChain: a framework that provides the modules needed to implement RAG (e.g., chat pdf)
- Document upload → document splitting → document embedding → embedding search → answer generation
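The RAG workflow above (embed, store, retrieve, rebuild the prompt) can be sketched in plain Python. This is a minimal sketch under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `ToyVectorStore` and `build_prompt` are hypothetical names, not LangChain APIs.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a real system would call an embedding model)."""
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory vector store: keeps (vector, chunk) pairs."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        self.items.append((embed(chunk), chunk))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question, context_chunks):
    """Prompt reconstruction: the user question plus the retrieved external text."""
    context = "\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
for chunk in ["The refund window is 30 days.", "Shipping takes 5 business days."]:
    store.add(chunk)

question = "How long is the refund window?"
prompt = build_prompt(question, store.search(question, k=1))
```

The resulting `prompt` is what would be sent to the LLM; no model parameters are changed at any point.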
LangChain
- Overview
- A framework for developing LM-powered apps
- Data awareness (connecting language-model API calls to other data sources) + agent capability (interacting with the external environment, e.g., searching for supplementary information)
- How to use
- GPT limitations
- Limited information access: answers are based on training data up to a certain date
- Token limits: 4,096 tokens (GPT-3.5), 8,192 tokens (GPT-4) in the base models
- Hallucinations: plausible-sounding but wrong answers
- Improvement methods
- fine-tuning: adjusts the weights of an existing deep-learning model so that it is updated for the desired purpose and its outputs improve (high cost and time)
- n-shot learning: provides n example outputs so the model avoids answering in unwanted ways, steering outputs toward the purpose (answers that reflect new external information are hard to expect)
- in-context learning: provides context in the prompt and adjusts the model's output based on that context (e.g., LangChain)
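As an illustration of n-shot learning from the list above, the sketch below assembles a prompt from n example input/output pairs. The function name and the `Input:`/`Output:` layout are assumptions chosen for the example, not a fixed format required by any model.

```python
def build_few_shot_prompt(examples, query, instruction):
    """Build an n-shot prompt: instruction, n worked examples, then the new input."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # The final "Output:" is left open for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


examples = [("great product", "positive"), ("arrived broken", "negative")]
few_shot_prompt = build_few_shot_prompt(
    examples,
    "works as described",
    "Classify the sentiment as positive or negative.",
)
```

The examples constrain the answer format, but they carry no new external knowledge, which is the limitation noted above.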
- LangChain modules to use
- Limited information access → info search via vectorStore or combined with agent-based search
- Token limits → document splitting via textSplitter
- Hallucinations → prompt the model to answer only from the given document
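Two of the remedies above, splitting documents to stay under the token limit and instructing the model to answer only from the given document, can be sketched as follows. This is a simplified sketch: it counts characters as a rough proxy for tokens, and `split_text` and `GROUNDED_TEMPLATE` are hypothetical names rather than LangChain's own splitter or prompts.

```python
def split_text(text, chunk_size=200, overlap=20):
    """Split text into character windows of at most chunk_size, sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


# Prompt that restricts the model to the supplied document (assumed wording).
GROUNDED_TEMPLATE = (
    "Answer only from the document below. "
    "If the answer is not in the document, say you do not know.\n\n"
    "Document:\n{document}\n\nQuestion: {question}"
)

sample_text = "".join(str(i % 10) for i in range(500))
sample_chunks = split_text(sample_text, chunk_size=200, overlap=20)
grounded_prompt = GROUNDED_TEMPLATE.format(document=sample_chunks[0], question="What digits appear?")
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of some duplicated text.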
- Main features
- Engine: the LLM acts as the generation engine; various models are supported (GPT, PaLM, LLaMA, StableVicuna, etc.)
- Prompts: Prompt templates for various LLMs, Chat Prompt templates (chatbot templates), example selectors (n-shot-like answer examples), and output parsers (format answers to match situation and use)
- Index: modules that structure documents for easy exploration (Document Loaders, Text Splitters, Vectorstores, Retrievers, ...)
- Memory: modules that let the model remember prior chat history so the conversation can build on previous content (ConversationBufferMemory, Entity Memory, Conversation Knowledge Graph Memory, ...)
- Chain: modules that chain LLMs to enable sequential LLM calls (LLM Chain, Question Answering, Summarization, Retrieval Question/Answering, ...)
- Agents: a feature that lets the LLM decide on its own which tools to use for tasks that regular Prompt Templates can't handle, such as web search or writing SQL queries to extract information (Custom Agent, Custom MultiAction Agent, Conversation Agent, ...)
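To show how the Prompts, Memory, and Chain features above fit together, here is a minimal plain-Python sketch. All names (`BufferMemory`, `TEMPLATE`, `parse_list`, `run_chain`, `fake_llm`) are hypothetical stand-ins written for this example; they mimic the roles of the LangChain modules but are not LangChain's API.

```python
class BufferMemory:
    """Memory: keeps prior (human, ai) turns so later prompts can include them."""

    def __init__(self):
        self.turns = []

    def save(self, user_text, ai_text):
        self.turns.append((user_text, ai_text))

    def as_text(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


# Prompt template: chat history followed by the new question.
TEMPLATE = "{history}\nHuman: {question}\nAI:"


def parse_list(raw):
    """Output parser: turn a comma-separated answer into a Python list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def run_chain(llm, memory, question):
    """Chain: fill the template, call the LLM, store the turn, parse the output."""
    prompt = TEMPLATE.format(history=memory.as_text(), question=question).lstrip()
    raw = llm(prompt)
    memory.save(question, raw)
    return parse_list(raw)


def fake_llm(prompt):
    """Stand-in for a real model: its answer depends on whether history is present."""
    earlier_turns = prompt.count("Human:") - 1
    return "red, green, blue" if earlier_turns == 0 else "cyan, magenta"


memory = BufferMemory()
first = run_chain(fake_llm, memory, "Name three colors")
second = run_chain(fake_llm, memory, "Name two more")
```

Because the second prompt contains the first turn, the stand-in model can tell it is continuing a conversation; swapping `fake_llm` for a real model call would leave the rest of the chain unchanged.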
- Example of using main features (chat pdf — a chatbot based on attached documents)
- Upload document (PDF, using PyPDFLoader)
- Split document (text splitter): split into multiple texts (paragraphs)
- Embed document (string to vector, because ML models operate on numbers rather than raw strings): embed each text as a vector (a numeric representation of the sentences) by ChatGPT → store in a vector store
- Embedding search: user enters a prompt → search vector store (vectorstore retriever), return relevant text (extract documents highly related to the question)
- Answer generation: QA Chain (question + relevant text) → submit to LLM 01 (prompt reconstruction, by ChatGPT) → submit to LLM 02 (by ChatGPT) → return answer (clicking a reference link shows the source in the original document)
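The answer-generation step above, with its two LLM calls and source references, might look like the following sketch. It follows one reading of the listed order (retrieve, then LLM 01 rebuilds the prompt, then LLM 02 answers). The retriever and both LLMs are passed in as plain callables, and the `fake_*` functions are hypothetical stubs so the example runs without any API.

```python
def answer_with_sources(question, retrieve, llm_01, llm_02):
    """Run the QA chain: retrieve text, rebuild the prompt (LLM 01), answer (LLM 02)."""
    docs = retrieve(question)  # list of (page_number, text) pairs
    context = "\n".join(f"[p.{page}] {text}" for page, text in docs)
    rebuilt_prompt = llm_01(
        f"Rewrite into one clear prompt.\nQuestion: {question}\nRelevant text:\n{context}"
    )
    answer = llm_02(rebuilt_prompt)
    # Page numbers are returned so a UI can link back to the original document.
    return {"answer": answer, "sources": [page for page, _ in docs]}


def fake_retrieve(question):
    """Stub retriever: always returns one chunk from page 3."""
    return [(3, "Refunds are accepted within 30 days.")]


def fake_llm_01(prompt):
    """Stub for the prompt-reconstruction call."""
    return "Using the text below, answer the question.\n" + prompt


def fake_llm_02(prompt):
    """Stub for the answering call."""
    return "30 days" if "30 days" in prompt else "I do not know"


result = answer_with_sources("What is the refund window?", fake_retrieve, fake_llm_01, fake_llm_02)
```

Keeping the page numbers alongside the answer is what makes the "click reference link" behaviour possible.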
Transformer
- Components
- Decoder: the part that generates text fluently (e.g., GPT-4, Bard, LLaMA)
- Encoder: the part that understands language (LLM)
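One concrete way to see the decoder/encoder difference is the attention mask. Decoder-style models generate left to right, so each position may attend only to itself and earlier positions (a causal mask), while encoder-style models see the whole input at once. The helper below is a small illustrative sketch with an assumed name, not code from any particular library.

```python
def attention_mask(n, causal):
    """Return an n x n mask: 1 where position i (row) may attend to position j (column)."""
    return [[1 if (not causal or j <= i) else 0 for j in range(n)] for i in range(n)]


decoder_mask = attention_mask(3, causal=True)   # lower-triangular: no looking ahead
encoder_mask = attention_mask(3, causal=False)  # all ones: full bidirectional context
```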
- Trends
- Evolution tree: https://github.com/Mooler0410/LLMsPracticalGuide
- Type of Usage
- Closed Source
- Strong performance, convenient API usage
- Security not guaranteed, API costs
- Open Source
- Strong performance, strong security and low cost
- High development difficulty, GPU server needed

