πŸš€ Zenalyze

AI-powered, context-aware data analysis with Pandas or PySpark

Zenalyze turns Large Language Models into a practical coding assistant designed specifically for data analysis.
It loads datasets, extracts metadata, builds intelligent prompts, tracks analysis history, generates fully executable Python code, and can even execute it directly in your environment.

Public API Usage Guide

Works seamlessly with:

- Pandas
- PySpark

🌟 Features Overview

1. Automatic Data Loading + Metadata Extraction

Datasets are loaded and described automatically through the Pandas or PySpark backend, with their metadata extracted and formatted for the LLM.

Metadata is automatically embedded into prompts so the LLM generates context-aware code.
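The idea can be sketched as follows. This is a hypothetical helper, not Zenalyze's actual API (the real extraction lives in `data/pandas/metadata.py` and will differ); it only shows how shape and dtype information become prompt-ready text:

```python
import pandas as pd

def describe_dataframe(name: str, df: pd.DataFrame) -> str:
    """Format basic metadata (shape, columns, dtypes) as prompt-ready text.

    Hypothetical sketch -- Zenalyze's real metadata extraction may differ.
    """
    lines = [f"Table: {name} ({df.shape[0]} rows x {df.shape[1]} columns)"]
    for col, dtype in df.dtypes.items():
        lines.append(f"  - {col}: {dtype}")
    return "\n".join(lines)

df = pd.DataFrame({"customer": ["a", "b"], "revenue": [10.0, 20.5]})
print(describe_dataframe("sales", df))
```

Text like this, prepended to the user's request, is what lets the model generate code that references real column names and types.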


2. Intelligent Prompt Engineering

Every call to .do() builds a complete LLM prompt combining the user request, formatted dataset metadata, and summarized analysis history.

The LLM returns procedural Python code that is ready to execute in your environment.
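A minimal sketch of that assembly step, assuming a simple template (the real template in `prompt.py` will differ, and `build_prompt` is a hypothetical name):

```python
def build_prompt(task: str, metadata: str, history: list[str]) -> str:
    """Assemble a code-generation prompt from the task, dataset metadata,
    and prior analysis steps. Hypothetical sketch of the idea behind
    Zenalyze's prompt layer."""
    history_block = "\n".join(
        f"{i + 1}. {step}" for i, step in enumerate(history)
    ) or "None yet."
    return (
        "You are a data-analysis code assistant.\n"
        f"Datasets:\n{metadata}\n"
        f"Previous steps:\n{history_block}\n"
        f"Task: {task}\n"
        "Return only executable Python code."
    )

print(build_prompt(
    "calculate total revenue per customer",
    "sales: customer (object), revenue (float64)",
    ["loaded sales data"],
))
```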


3. Zenalyze: End-to-End LLM-Powered Analysis

The Zenalyze class is the main controller: it loads data, builds prompts, calls the LLM, and runs the generated code.

Example:

zen.do("calculate total revenue per customer")

4. Automatic History Summarization

Long analysis sessions stay manageable: older steps are compressed by the summarizer while recent ones are kept verbatim.

This allows large multi-step workflows without losing context.
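The mechanism can be sketched like this. The `History` class below is hypothetical and uses plain string concatenation where Zenalyze's SummarizerLLM would call an LLM; it only illustrates the "keep the last 5 steps verbatim, fold older ones into a summary" pattern suggested by the `last5_hist` constructors:

```python
from collections import deque

class History:
    """Keep the last `keep` steps verbatim; fold older ones into a summary.

    Hypothetical sketch -- the real SummarizerLLM compresses history with
    an LLM rather than by concatenating strings.
    """

    def __init__(self, keep: int = 5):
        self.keep = keep
        self.recent: deque[str] = deque()
        self.summary = ""

    def add(self, step: str) -> None:
        self.recent.append(step)
        while len(self.recent) > self.keep:
            old = self.recent.popleft()
            # Fold the evicted step into the running summary.
            self.summary = (self.summary + "; " if self.summary else "") + old

    def render(self) -> str:
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier steps: {self.summary}")
        parts.extend(self.recent)
        return "\n".join(parts)

h = History()
for i in range(1, 8):
    h.add(f"step {i}")
print(h.render())
```

Only the rendered history (summary plus the five most recent steps) is sent to the model, keeping prompts short no matter how long the session runs.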


5. Buddy Assistant (BuddyLLM)

A conversational assistant for explanations and walkthroughs.

Examples:

zen.buddy("What have we done so far?")
zen.buddy("Explain the last transformation in simple words")

It reads from summarized history and replies naturally.


6. Test Mode (TestZen)

A fully offline, deterministic, API-free mode.

Example:

from zenalyze import create_testzen_object_with_env_var_and_last5_hist

zent = create_testzen_object_with_env_var_and_last5_hist(globals(), "./data")
zent.do("show me something")

7. One-Line Quick Constructors

Pandas backend

from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

zen = create_zenalyze_object_with_env_var_and_last5_hist(globals(), "./data")

Spark backend

from pyspark.sql import SparkSession
from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

spark = SparkSession.builder.getOrCreate()

zen = create_zenalyze_object_with_env_var_and_last5_hist(
    globals(),
    "./data",
    spark_session=spark
)

πŸ“¦ Installation

From PyPI (when available)

pip install zenalyze

From GitHub

pip install git+https://github.com/tuhindutta/Zenalyze.git

πŸ“˜ Basic Usage

  1. Initialize

     from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist
        
     zen = create_zenalyze_object_with_env_var_and_last5_hist(globals(), "./data")
    
  2. Run a query

     zen.do("show unique customers and total sales per region")
    
  3. Continue analysis

     zen.do("plot distribution of order quantities per product")
    
  4. Ask the Buddy assistant

     zen.buddy("Summarize steps 1 to 4")
    

πŸ“‚ Package Structure

zenalyze/
 β”œβ”€β”€ data/
 β”‚    β”œβ”€β”€ pandas/
 β”‚    β”‚    β”œβ”€β”€ data.py
 β”‚    β”‚    └── metadata.py
 β”‚    β”œβ”€β”€ spark/
 β”‚    β”‚    β”œβ”€β”€ data.py
 β”‚    β”‚    └── metadata.py
 β”‚    β”œβ”€β”€ data_base_class.py
 β”‚    └── __init__.py
 β”‚
 β”œβ”€β”€ chat/
 β”‚    β”œβ”€β”€ llm.py
 β”‚    β”œβ”€β”€ summarizer_llm.py
 β”‚    β”œβ”€β”€ buddy_llm.py
 β”‚    └── __init__.py
 β”‚
 β”œβ”€β”€ prompt.py
 β”œβ”€β”€ zenalyze.py
 β”œβ”€β”€ _quick_obj.py
 └── __init__.py

🧱 Core Components

Data Layer

Component                     Purpose
----------------------------  ------------------------
Data                          Base dataset wrapper
PandasData, PandasDataLoad    Pandas backend
SparkData, SparkDataLoad      Spark backend
metadata.py                   Extracts table metadata

LLM Layer

Component      Purpose
-------------  -------------------------------------
LLM            Primary code-generation model
SummarizerLLM  History compression and summarization
BuddyLLM       Natural-language assistant

Prompt Layer

prompt.py assembles dataset metadata, summarized history, and the user request into the final LLM prompt.

Execution Layer

Generated code is executed directly in your environment, which is why the quick constructors take globals().
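A minimal sketch of that execution pattern, under the assumption that Zenalyze ultimately relies on Python's built-in `exec` (the function name here is hypothetical, and the real executor adds error handling and history tracking):

```python
def run_generated_code(code: str, namespace: dict) -> None:
    """Execute generated Python inside the caller's namespace, so any
    variables it creates (e.g. result DataFrames) remain visible afterwards.

    Minimal sketch -- Zenalyze's actual execution layer is more elaborate.
    """
    exec(compile(code, "<zenalyze>", "exec"), namespace)

ns = {}
run_generated_code("total = sum(range(5))", ns)
print(ns["total"])  # 10
```

Passing your notebook's globals() as the namespace is what makes results from one .do() call available to the next.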


πŸ§ͺ Example Workflow

zen.do("give me customer count by city")
zen.do("merge customers with orders and compute order totals")
zen.do("show the top 5 highest spending customers")
zen.buddy("Explain the main insights so far")

πŸ”’ Security Notes

Generated code is executed directly in your Python environment, so review it before running it on sensitive data. Keep your LLM API key in an environment variable rather than hard-coding it in notebooks or scripts.

πŸ‘€ Maintainer

Tuhin Kumar Dutta


⭐ Contribute

Pull requests and issues are welcome.

git clone https://github.com/tuhindutta/Zenalyze.git

Let’s build the most capable AI-driven data analysis toolkit together.