AI-powered, context-aware data analysis with Pandas or PySpark
zenalyze turns Large Language Models into a practical coding assistant designed specifically for data analysis.
It loads datasets, extracts metadata, builds intelligent prompts, tracks analysis history, and generates fully executable Python code, which it can even execute directly in your environment.
Works seamlessly with both Pandas and PySpark.

Effortlessly load and describe datasets through `PandasDataLoad` and `SparkDataLoad`.

zenalyze extracts and formats table metadata from the loaded datasets.
Metadata is automatically embedded into prompts so the LLM generates context-aware code.
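To illustrate the idea, here is a minimal sketch of metadata-in-the-prompt, not zenalyze's actual implementation (`describe_dataframe` and `build_prompt` are hypothetical names):

```python
import pandas as pd

def describe_dataframe(name: str, df: pd.DataFrame) -> str:
    """Collect lightweight metadata (shape, columns, dtypes) as prompt text."""
    lines = [f"Table `{name}`: {len(df)} rows x {len(df.columns)} columns"]
    for col, dtype in df.dtypes.items():
        lines.append(f"- {col}: {dtype}")
    return "\n".join(lines)

def build_prompt(question: str, tables: dict) -> str:
    """Prepend per-table metadata so the model sees the schema, not the raw data."""
    context = "\n\n".join(describe_dataframe(n, df) for n, df in tables.items())
    return f"{context}\n\nUser request: {question}"

df = pd.DataFrame({"customer": ["a", "b"], "revenue": [10.0, 20.5]})
prompt = build_prompt("calculate total revenue per customer", {"sales": df})
print(prompt)
```

Sending the schema instead of the data keeps prompts small and avoids leaking dataset contents to the model.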
Every call to `.do()` builds a complete LLM prompt from the dataset metadata and the tracked analysis history. The LLM returns procedural Python code that runs directly in your environment.
The main controller executes the generated code against your `globals()`. Example:

```python
zen.do("calculate total revenue per customer")
```
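Conceptually, running generated code against your `globals()` amounts to an `exec` call with your namespace, as in this minimal sketch (the `run_generated_code` helper is hypothetical, not part of zenalyze's API):

```python
def run_generated_code(code: str, namespace: dict) -> None:
    """Execute model-generated code so any variables it defines
    land in the caller's namespace."""
    exec(code, namespace)

ns = {}
run_generated_code("total_revenue = sum([10, 20, 30])", ns)
print(ns["total_revenue"])  # 60
```

Because the namespace is the caller's `globals()`, results of one step (like `total_revenue` here) are immediately available to the next step and to you.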
Long analysis sessions stay manageable. This allows large multi-step workflows without losing context.
A conversational assistant for explanations and walkthroughs.
Examples:

```python
zen.buddy("What have we done so far?")
zen.buddy("Explain the last transformation in simple words")
```
It reads from summarized history and replies naturally.
A fully offline, deterministic, API-free mode.
Example:

```python
from zenalyze import create_testzen_object_with_env_var_and_last5_hist

zent = create_testzen_object_with_env_var_and_last5_hist(globals(), "./data")
zent.do("show me something")
```
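Conceptually, the test backend behaves like a stub model that maps every request to the same canned code, which is what makes it deterministic and API-free. A sketch of that idea (`StubLLM` is hypothetical, not zenalyze's actual class):

```python
class StubLLM:
    """Hypothetical offline model: returns the same canned code for every prompt,
    so test runs are deterministic and need no API key."""
    CANNED = "result = df.head()  # fixed placeholder analysis"

    def generate(self, prompt: str) -> str:
        return self.CANNED

stub = StubLLM()
# Any two prompts yield identical output: useful for CI and offline demos.
print(stub.generate("show me something") == stub.generate("anything else"))
```

This makes the mode suitable for unit tests and environments without network access.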
Pandas backend

```python
from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

zen = create_zenalyze_object_with_env_var_and_last5_hist(globals(), "./data")
```
Spark backend

```python
from pyspark.sql import SparkSession
from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

spark = SparkSession.builder.getOrCreate()
zen = create_zenalyze_object_with_env_var_and_last5_hist(
    globals(),
    "./data",
    spark_session=spark
)
```
From PyPI (when available)

```shell
pip install zenalyze
```

From GitHub

```shell
pip install git+https://github.com/tuhindutta/Zenalyze.git
```
Initialize

```python
from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

zen = create_zenalyze_object_with_env_var_and_last5_hist(globals(), "./data")
```
Run a query

```python
zen.do("show unique customers and total sales per region")
```

Continue analysis

```python
zen.do("plot distribution of order quantities per product")
```

Ask the Buddy assistant

```python
zen.buddy("Summarize steps 1 to 4")
```
```
zenalyze/
├── data/
│   ├── pandas/
│   │   ├── data.py
│   │   └── metadata.py
│   ├── spark/
│   │   ├── data.py
│   │   └── metadata.py
│   ├── data_base_class.py
│   └── __init__.py
│
├── chat/
│   ├── llm.py
│   ├── summarizer_llm.py
│   ├── buddy_llm.py
│   └── __init__.py
│
├── prompt.py
├── zenalyze.py
├── _quick_obj.py
└── __init__.py
```
Data Layer

| Component | Purpose |
|---|---|
| `Data` | Base dataset wrapper |
| `PandasData`, `PandasDataLoad` | Pandas backend |
| `SparkData`, `SparkDataLoad` | Spark backend |
| `metadata.py` | Extracts table metadata |
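A rough sketch of how such a data layer could be shaped, with a backend-agnostic base class and a Pandas specialization (hypothetical class bodies, not zenalyze's actual code):

```python
from abc import ABC, abstractmethod

import pandas as pd

class Data(ABC):
    """Base dataset wrapper: holds a named table and exposes
    backend-agnostic metadata for prompt building."""
    def __init__(self, name, table):
        self.name = name
        self.table = table

    @abstractmethod
    def metadata(self) -> dict:
        """Return schema information (columns, dtypes, row count)."""

class PandasData(Data):
    """Pandas specialization: reads the schema off a DataFrame."""
    def metadata(self) -> dict:
        return {
            "columns": list(self.table.columns),
            "dtypes": {c: str(t) for c, t in self.table.dtypes.items()},
            "rows": len(self.table),
        }

sales = PandasData("sales", pd.DataFrame({"city": ["x"], "count": [3]}))
print(sales.metadata())
```

A `SparkData` sibling would implement the same `metadata()` contract against a Spark DataFrame, which is what lets the rest of the pipeline stay backend-agnostic.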
LLM Layer

| Component | Purpose |
|---|---|
| `LLM` | Primary code-generation model |
| `SummarizerLLM` | History compression and summarization |
| `BuddyLLM` | Natural-language assistant |
Prompt Layer
Execution Layer
```python
zen.do("give me customer count by city")
zen.do("merge customers with orders and compute order totals")
zen.do("show the top 5 highest spending customers")
zen.buddy("Explain the main insights so far")
```
Tuhin Kumar Dutta
Pull requests and issues are welcome.

```shell
git clone https://github.com/tuhindutta/Zenalyze.git
```
Let's build the most capable AI-driven data analysis toolkit together.