Renewal · Everyday Coding at Forty

LangChain | Backup Notes

normalstory

There are many LLM frameworks, but most of them depend on the OpenAI API. Wherever possible, I've been practicing with examples that avoid that dependency. Teddy Note posted a hands-on video covering several of the key items I was interested in, so I'm writing up my walkthrough of it here.

Get a free Korean fine-tuned model and host my own local LLM (LangServe) + RAG too!! (Source: Teddy Note)

(For reference, three representative tools for downloading an LLM to your local computer and using it easily are ollama, AnythingLLM, and LM Studio. Teddy Note's example uses ollama.)

 

 

Hands-on procedure

1.  Get the Korean fine-tuned model from HuggingFace-Hub

1) Download the practice model (EEVE-Korean-Instruct-10.8B-v1.0)

      (1) Method A. CLI 
        a-1) Create and move into the folder you'll work in 
        a-2) From the terminal at that path, install the package: pip install huggingface-hub
        a-3) Then type the download command into the terminal

         - How to write it

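huggingface-cli download \
  <main title of the Hugging Face model page (the repo ID)> \
  <name of the file to download from the file list>.gguf \
  --local-dir <where to save the model on your computer> \
  --local-dir-use-symlinks <whether to use symbolic (shortcut) links: True or False>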

        - Example

huggingface-cli download \
  heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF \
  ggml-model-Q5_K_M.gguf \
  --local-dir /Users/Charles/code/langserve_ollama/ollama-modelfile \
  --local-dir-use-symlinks False
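
The same download can also be scripted from Python rather than typed into the terminal. The following is a minimal sketch, assuming the huggingface-hub package from step a-2 is installed. hf_hub_download is the library's function for fetching a single file from a repo; download_gguf and its downloader parameter are hypothetical names introduced here so the helper can be tried without a network connection.

```python
from pathlib import Path


def download_gguf(repo_id, filename, local_dir, downloader=None):
    """Download one file from a Hub repo into local_dir and return its path.

    `downloader` is a hypothetical hook for this sketch; when omitted,
    huggingface_hub.hf_hub_download is used.
    """
    if downloader is None:
        # Imported lazily so this module loads even before
        # `pip install huggingface-hub` has been run.
        from huggingface_hub import hf_hub_download as downloader
    # Make sure the target folder exists before downloading into it.
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return downloader(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

Calling download_gguf("heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF", "ggml-model-Q5_K_M.gguf", "ollama-modelfile") should fetch the same file as the CLI example above. Note that the file is several gigabytes.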

 

      (2) Method B. Download directly from the site
       b-1) Find the model and download it
       b-2) Move it to the folder you want  

 

2. Register the downloaded model with ollama 

1) Quick summary first:   

     (1) In the folder containing the downloaded .gguf file* (location not important)   
     (2) Create a file describing the model (Modelfile**, name not important)
     (3) Run ollama create to add it to ollama list 

*GGUF was a file format less than a year old at the time of writing (see the related explanation below). So when searching HuggingFace-Hub, you'll sometimes find models that are only available in the older .ggml format. Knowing this in advance is good for your sanity.
 

What is GGUF and GGML?

GGUF and GGML are file formats used for storing models for inference, especially in the context of language models like GPT (Generative…

medium.com
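
If you are unsure whether a file you downloaded really is GGUF, one quick check is to read its header. This is a small sketch based on the GGUF layout, in which a file starts with the 4-byte magic "GGUF" followed by a little-endian 32-bit version number. The function name gguf_version is made up for this example.

```python
import struct


def gguf_version(path):
    """Return the GGUF version number, or None if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4 bytes of magic + 4 bytes of version
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version
```

For example, gguf_version("ggml-model-Q5_K_M.gguf") returning a number suggests the file can be used in a Modelfile FROM line, while None suggests it is in some other format (such as the older GGML) or is an incomplete download.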

**Modelfile is a file with no extension. There are tools like VIM and so on, but it's healthier for your sanity to just write it in VSCode. 

***ollama create... is one of the ollama CLI commands. The official docs explain it in detail. 
 

ollama/docs/modelfile.md at main · ollama/ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models. - ollama/ollama

github.com

 

2) In the folder where the downloaded .gguf file is, create the Modelfile and write it:

FROM ggml-model-Q5_K_M.gguf

TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""

SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""

PARAMETER temperature 0
PARAMETER num_predict 3000
PARAMETER num_ctx 4096
PARAMETER stop <s>
PARAMETER stop </s>

Modelfile writing rules: summary

FROM               - specifies the base model to use (required)
TEMPLATE      - the full prompt template that will be sent to the model (*not required, but without it the LLM may answer as if it's drunk)
SYSTEM          - specifies the system message to set in the template (*not required, but without it the LLM may seem to ignore you)
PARAMETER   - sets parameters for how Ollama runs the model (*e.g. temperature controls how much creativity/randomness to allow: 0 to 2)
ADAPTER        - specifies a (Q)LoRA adapter to apply to the base model, given as an absolute path or a path relative to the Modelfile (*optional; this hands-on uses no adapter, so it is omitted)
LICENSE          - specifies the legal license. 
MESSAGE        - specifies a message history. (*you can define answer patterns for specific situations; see the official docs, whose example 
                             uses a few sample exchanges to steer the model toward short Yes/No answers.) 

*Order doesn't matter. Instruction names are case-insensitive; capital letters are used for human readability. 
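
If you end up making several variants of the Modelfile (different temperatures, different .gguf files), generating the text from Python can be less error-prone than editing by hand. The following is only a sketch under that assumption: render_modelfile is a hypothetical helper, not part of ollama, and it covers just the four instructions used in this post.

```python
def render_modelfile(base, template, system, parameters):
    """Build Modelfile text from its parts.

    `parameters` is a list of (name, value) pairs rather than a dict,
    because some parameters (such as stop) can be given more than once.
    """
    lines = [
        f"FROM {base}",
        "",
        f'TEMPLATE """{template}"""',
        "",
        f'SYSTEM """{system}"""',
        "",
    ]
    lines += [f"PARAMETER {name} {value}" for name, value in parameters]
    return "\n".join(lines) + "\n"


text = render_modelfile(
    base="ggml-model-Q5_K_M.gguf",
    template=(
        "{{- if .System }}\n<s>{{ .System }}</s>\n{{- end }}\n"
        "<s>Human:\n{{ .Prompt }}</s>\n<s>Assistant:\n"
    ),
    system="A chat between a curious user and an artificial intelligence assistant.",
    parameters=[
        ("temperature", 0),
        ("num_predict", 3000),
        ("num_ctx", 4096),
        ("stop", "<s>"),
        ("stop", "</s>"),
    ],
)
print(text)  # save this text to a file named Modelfile next to the .gguf file
```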

 

3) From the terminal, run ollama create:    

    (1) Example:

ollama create EEVE-Korean-10.8B -f ollama-modelfile/Modelfile-V02

    (2) Usage explanation: 

       Case 1) If the terminal is in the same folder as the Modelfile, 

ollama create <your-desired-model-name> -f <Modelfile-name>

       Case 2) If the terminal is in a different folder from the Modelfile, 

ollama create <your-desired-model-name> -f <folder-containing-the-Modelfile>/<Modelfile-name>

       (Both the folder and the file name vary per person; the file is usually just named Modelfile.)
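
The two cases differ only in how the path after -f is written. As a rough sketch, the helper below (create_command, a name invented for this example) builds the argument list for either case and fails early if the Modelfile is not where you think it is, which is the most common reason the command errors out.

```python
from pathlib import Path


def create_command(model_name, modelfile="Modelfile", folder="."):
    """Return the `ollama create` argument list, checking the Modelfile exists.

    folder="." corresponds to Case 1 (terminal in the same folder);
    any other folder corresponds to Case 2.
    """
    path = Path(folder) / modelfile
    if not path.is_file():
        raise FileNotFoundError(f"No Modelfile found at {path}")
    return ["ollama", "create", model_name, "-f", str(path)]
```

The returned list can be passed to subprocess.run(..., check=True); for the example above that would be create_command("EEVE-Korean-10.8B", "Modelfile-V02", "ollama-modelfile").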

 

4) Verify it was registered correctly: from the terminal, list the models registered with ollama and look for yours. *For reference, once ollama is installed you can run ollama commands from any location, because the installer adds it to your PATH environment variable.

ollama list

If your model appears in the list, registration succeeded.
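
The same check can be automated. The sketch below assumes that ollama list prints a whitespace-separated table with one header row and the model name (including a tag such as :latest) in the first column; the function names are hypothetical.

```python
import subprocess


def listed_models(list_output):
    """Extract model names (first column) from `ollama list` text, skipping the header row."""
    rows = [line.split() for line in list_output.splitlines()[1:]]
    return [cols[0] for cols in rows if cols]


def is_registered(model, list_output):
    """True if `model` appears in the listing, with or without a tag like ':latest'."""
    return any(
        name == model or name.split(":")[0] == model
        for name in listed_models(list_output)
    )


def ollama_list():
    """Run `ollama list` and return its stdout (requires ollama on PATH)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With ollama installed, is_registered("EEVE-Korean-10.8B", ollama_list()) should return True after a successful ollama create.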

 

 

3. Coding for real

0) Create an empty folder, then set up a virtual environment (official docs)

(1) Create the virtual environment: python -m venv streamlit
(2) Activate the virtual environment: source streamlit/bin/activate

*'streamlit' is just a name — set it to whatever you like
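
Before installing packages, it can be worth confirming that the virtual environment is actually active, since otherwise pip installs into the system Python. A small check, using only the standard library (the function name in_virtualenv is made up here):

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end with the name you chose (streamlit in this post).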

 

1) Front-end setup 

(1) Install the package 

Install: pip install streamlit
Run: streamlit hello

(2) Write .gitignore, create & link a GitHub repo

.cache/
.streamlit
**/__pycache__/**

*.DS_Store
*.pyc
*.gguf

(3) Write and run the test code: front_main.py

# front_main.py
import streamlit as st

st.text('hello Streamlit!')

Test run: streamlit run front_main.py 

 

 

2) Back-end setup 

(1) Install packages

Python-based web framework: pip install fastapi
Python-based web server: pip install uvicorn 

(2) Write and run the test code, and test the API: server_main.py

# server_main.py
from typing import Union
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    return {"Hello": "World"}

@app.get("/items/{item_id}")
async def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}

Test run: uvicorn server_main:app --reload

http://127.0.0.1:8000
http://127.0.0.1:8000/docs
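
Besides opening those URLs in a browser, the two endpoints can be called from a short Python script. This is a minimal sketch using only the standard library; it assumes the uvicorn server above is running on the default 127.0.0.1:8000, and item_url and get_json are names chosen for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def item_url(item_id, q=None, base_url=BASE_URL):
    """Build the URL for GET /items/{item_id}, adding ?q=... only when q is given."""
    url = f"{base_url}/items/{item_id}"
    if q is not None:
        url = f"{url}?{urlencode({'q': q})}"
    return url


def get_json(url, timeout=5.0):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, get_json(BASE_URL + "/") calls read_root and get_json(item_url(5, "hello")) calls read_item with item_id=5 and q="hello".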

 

3) LangServe setup 

 

 

4) Deployment with NGROK 

 

This English version was translated by Claude.

Written by
친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
