A Python data analysis library that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data easy. Together with NumPy, it's one of the go-to tools for data analysis.. and unless you're truly an expert, it's arguably the package you'll run into even more often than NumPy.
Built on top of that package, pandas-ai is an open-source AI agent for data analysis that applies an LLM on top of pandas.
? 1. Pandas AI | PandasAI meets a local LLM (feat. ollama)
2. Pandas AI | PandasAI with Agent, OpenAI, MySQL
The first hands-on session walks through using a local LLM as the backend. There's an even easier way to get started, but to avoid spending unnecessary API fees on a tool that isn't yet perfect in many ways, I decided to start with the local LLM I already had installed.
0. References
An open-source AI agent for data analysis
PandasAI - Conversational Data Analysis
PandasAI is a Python library that integrates generative artificial intelligence capabilities into pandas, making dataframes conversational
pandas-ai.com
GitHub - Sinaptik-AI/pandas-ai: Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data ana
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG. - Sinaptik-AI/pandas-ai
github.com
1. Install ollama on your computer and download an individual model
Ollama
Get up and running with large language models.
ollama.com
2. Set up the virtual environment
python -m venv ENV_NAME
*ENV_NAME is just a name I pick myself.
3. Activate the virtual environment
- Mac
source ENV_NAME/bin/activate
- Windows
ENV_NAME\\bin\\activate
4. Existing package list: requirements.txt
pandas
pandasai
streamlit
5. Install the packages
pip install -r requirements.txt
6. Check the default Streamlit screen
from pandasai.llm.local_llm import LocalLLM
import streamlit as st
model = LocalLLM(
api_base="<http://localhost:11434/v1>",
model="llama3"
)
#st.UI-타이틀
st.title("Data analysis with PandasAI")
7. Build the file-upload UI
...
#st.UI-타이틀
st.title("Data analysis with PandasAI")
#st.UI-데이터 세트로드
upload_file = st.file_uploader("upload a CSV file", type=['csv'])
Upload a file by drag-and-drop.
lol
Streamlit is a Python-based framework. Which means you can render Python code straight to the screen. Just like running a single line of Python in the interpreter (or print/printf), you can output any specific code you want to the page.
If you upload an Excel file, let's read it and show the first 3 rows on the screen.
...
import pandas as pd
if uploaded_file is not None:
df = pd.read_csv(uploaded_file)
st.write(df.head(3))
What's surprising and impressive is that you can search and re-sort inside the table like the example below. Wow.. back in my day.. how many cups of coffee did I have to drink with the senior devs just to get a tiny admin feature like this added.. lol
Now let's build a prompt (QA) input UI. The key idea is that we connect the LLM to the dataframe and wire it up via SmartDataframe.
...
from pandasai import SmartDataframe
if uploaded_file is not None:
data = pd.read_csv(uploaded_file)
st.write(data.head(3))
df = SmartDataframe(data, config={"llm":model})
prompt = st.text_area("Enter your prompt:")
if st.button("Generate"):
if prompt:
with st.spinner("Generating response..."):
st.write(df.chat(prompt))
And then if you leave a message in the prompt, you can get results like the ones below, based on the Excel file you uploaded.
The important thing here is that, just like with any database, PandasAI also follows the rule of 'garbage in, garbage out'.
Garbage in, garbage out
Garbage in, garbage out
Back in my day, OA-related certifications — most representatively Computer Proficiency, ITQ, MOS, and the Office Automation Industrial Engineer license — were once basically required. Looking back now (though many places still demand them??..), it can feel kind of absurd. Because they feel so obvious. And while each of those tools offers an enormous range of features.. in real-world work, instead of memorizing every fine detail, people understand the basic functions, and when something specific is needed, they research it on the spot and pick it up right then and there.
I get the feeling that AI-related tools (or services) won't be any different.
Recently a lot of content is being consumed around prompt engineering (for both generative images and text), AI-powered YouTube channels, blogs, training programs, and so on, blah blah — and I feel like before long we're going to keep seeing déjà vu-like situations.
