Many LLM frameworks exist, but most of their examples depend on the OpenAI API. I've been focusing my hands-on practice on examples that avoid it wherever possible. Teddy Note posted a hands-on video covering several topics I was interested in, so I'm writing up my notes from working through it here.
Get a free Korean fine-tuned model and host my own local LLM (LangServe) + RAG too!! (Source: Teddy Note)
(For reference, three representative tools for downloading an LLM to your local computer and running it easily are ollama, AnythingLLM, and LM Studio. Teddy Note's example uses ollama.)
Hands-on procedure
1. Get the Korean fine-tuned model from HuggingFace-Hub
1) Download the practice model (EEVE-Korean-Instruct-10.8B-v1.0)
(1) Method A. CLI
a-1) Create and move into the folder you'll work in
a-2) From the terminal at that path, install the package: pip install huggingface-hub
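a-3) Then type the 'download' command into the terminal
- How to write it
huggingface-cli download \
<repo ID shown as the main title of the HuggingFace model page> \
<name of the file to download from the model's file list>.gguf \
--local-dir <location on your computer where the model will be saved> \
--local-dir-use-symlinks <whether to use symbolic (shortcut) links: True or False>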
- References
- Main title: heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF · Hugging Face
- Filename: heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF at main
- HuggingFace model download detailed guide: Downloading files from the Hub
- Example
huggingface-cli download \
heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF \
ggml-model-Q5_K_M.gguf \
--local-dir /Users/Charles/code/langserve_ollama/ollama-modelfile \
--local-dir-use-symlinks False
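As a small sketch of how the template maps onto the example, the same command can be assembled from its four parts in Python. The helper name build_download_command is made up for illustration. The script only prints the command, and running it is left as a commented-out step.

```python
import shlex


def build_download_command(repo_id, filename, local_dir, use_symlinks=False):
    """Assemble the huggingface-cli download command as an argument list.

    Hypothetical helper: it only mirrors the template from the post.
    """
    return [
        "huggingface-cli", "download",
        repo_id,                                    # main title of the model page
        filename,                                   # .gguf file picked from the file list
        "--local-dir", local_dir,                   # where the model is saved locally
        "--local-dir-use-symlinks", str(use_symlinks),
    ]


command = build_download_command(
    repo_id="heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF",
    filename="ggml-model-Q5_K_M.gguf",
    local_dir="/Users/Charles/code/langserve_ollama/ollama-modelfile",
)

# Print the command as one shell line so it can be checked before running it.
print(shlex.join(command))

# To actually start the download, uncomment the two lines below:
# import subprocess
# subprocess.run(command, check=True)
```

Replace the local_dir value with a folder on your own machine before running the download.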
(2) Method B. Download directly from the site
b-1) Find the model and download it
b-2) Move it to the folder you want
2. Register the downloaded model with ollama
1) Quick summary first:
(1) In the folder containing the downloaded .gguf file* (location not important)
(2) Create a file describing the model (Modelfile**, name not important)
(3) Run ollama create to add it to ollama list
*The .gguf format is fairly new: the llama.cpp project introduced it in August 2023 as the successor to GGML (see the related explanation below). So when searching HuggingFace-Hub, you'll sometimes find models that are only available in the older .ggml format. Knowing this in advance is good for your sanity.
Reference: What is GGUF and GGML? (medium.com)
**The Modelfile has no file extension. You could edit it with a tool like Vim, but it's easier on your sanity to just write it in VSCode.
***ollama create... is one of the ollama CLI commands. The official docs explain it in detail.
Reference: ollama/docs/modelfile.md at main · ollama/ollama (github.com)
2) In the folder where the downloaded .gguf file is, create the Modelfile and write it:
FROM ggml-model-Q5_K_M.gguf
TEMPLATE """
{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""
SYSTEM """
A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.
"""
PARAMETER temperature 0
PARAMETER num_predict 3000
PARAMETER num_ctx 4096
PARAMETER stop <s>
PARAMETER stop </s>
Modelfile writing rules: summary
FROM - specifies the base model to use (required)
TEMPLATE - the full prompt template that will be sent to the model (*optional, but without it the LLM may answer as if it were drunk)
SYSTEM - specifies the system message to insert into the template (*optional, but without it the LLM may seem to ignore what you want)
PARAMETER - sets parameters for how Ollama runs the model (*for example, temperature sets how much creativity/randomness to allow, roughly 0–2)
ADAPTER - specifies a (Q)LoRA adapter to apply on top of the base model, given as an absolute path or a path relative to the Modelfile (*optional; this walkthrough doesn't use an adapter, so you can omit it)
LICENSE - specifies the legal license.
MESSAGE - specifies a message history. (*You can define answer patterns for specific situations; see the official docs. The example in the official docs uses it to make the model answer geography questions with a short yes/no.)
*Order doesn't matter, and instruction names are case-insensitive. Uppercase is used for human readability.
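To make the roles of TEMPLATE and SYSTEM concrete, here is a rough Python sketch of the text the model ends up receiving for a single user turn. It is an approximation written for illustration: Ollama renders the template with Go's template engine, so exact whitespace may differ, and render_prompt is a hypothetical helper, not part of Ollama.

```python
# SYSTEM text copied from the Modelfile above.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant.\n"
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def render_prompt(prompt: str, system: str = SYSTEM) -> str:
    """Approximate what the TEMPLATE expands to for one user message."""
    parts = []
    if system:
        # Mirrors: {{- if .System }} <s>{{ .System }}</s> {{- end }}
        parts.append(f"<s>{system}</s>")
    # Mirrors: <s>Human: {{ .Prompt }}</s>
    parts.append(f"<s>Human:\n{prompt}</s>")
    # The template ends with an open Assistant turn for the model to complete.
    parts.append("<s>Assistant:\n")
    return "\n".join(parts)


print(render_prompt("What is the capital of Korea?"))
```

The model continues writing after the final "Assistant:" line, and the two stop parameters cut generation off when it emits <s> or </s>, which keeps it from starting a new turn by itself.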
3) From the terminal, run ollama create:
(1) Example:
ollama create EEVE-Korean-10.8B -f ollama-modelfile/Modelfile-V02
(2) Usage explanation:
Case 1) If the terminal is in the same folder as the Modelfile,
ollama create <your-desired-model-name> -f <Modelfile-name>
Case 2) If the terminal is in a different folder than the Modelfile,
ollama create <your-desired-model-name> -f <folder-containing-the-Modelfile>/<Modelfile-name>
(The folder and file name vary per person; the file is usually just named Modelfile.)
4) Verify it was registered correctly: from the terminal, list the models registered with ollama and look for the name you chose.
*For reference, once ollama is installed on your computer, you can run ollama commands from any location. (Reason: it is added to your PATH environment variable at install time.)
ollama list
If it appears in the list, it was successful
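The same check can also be done from Python, which is handy later when the backend needs to confirm the model exists. This is a minimal sketch that assumes the Ollama server is running at its default local address (http://localhost:11434) and uses its /api/tags endpoint. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def is_registered(tags_payload: dict, model_name: str) -> bool:
    """Return True if model_name appears in the /api/tags response.

    Listed names carry a tag suffix such as ':latest', so the name is
    matched both with and without the tag.
    """
    names = [model.get("name", "") for model in tags_payload.get("models", [])]
    return any(
        name == model_name or name.split(":")[0] == model_name for name in names
    )


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> dict:
    """Ask the local Ollama server for its list of registered models."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


try:
    found = is_registered(fetch_tags(), "EEVE-Korean-10.8B")
    print("registered" if found else "not registered yet")
except (OSError, ValueError) as exc:
    # OSError covers connection errors and timeouts; ValueError covers bad JSON.
    print(f"Could not query Ollama: {exc}")
```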
3. Coding for real
0) Create an empty folder, then set up a virtual environment (official docs)
(1) Create the virtual environment: python -m venv streamlit
(2) Activate the virtual environment: source streamlit/bin/activate
*'streamlit' is just a name — set it to whatever you like
1) Front-end setup
(1) Install the package
Install: pip install streamlit
Run: streamlit hello
(2) Write .gitignore, create & link a GitHub repo
(3) Write and run the test code: front_main.py
test run: streamlit run front_main.py
2) Back-end setup
(1) Install packages
Python-based web framework: pip install fastapi
Python-based web server: pip install uvicorn
(2) Write and run the test code, and test the API: server_main.py
test run: uvicorn server_main:app --reload
3) LangServe setup
4) Deployment with NGROK

