Building Simple AI Models with Python (Beginner Code Examples)
A hands-on path to your first working ML pipeline: data loading, splits, baselines, feature engineering, training, evaluation, and saving a small model. Includes tabular classification, text classification, and a minimal API.
Environment & project structure
Install (local or Colab):
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -U scikit-learn pandas numpy matplotlib fastapi uvicorn joblib
Project layout:
ai-starter/
├─ data/ # CSVs go here
├─ notebooks/ # exploration (optional)
├─ src/
│ ├─ features.py
│ ├─ train_tabular.py
│ ├─ train_text.py
│ └─ serve_api.py
└─ models/ # saved pipelines
Tabular classification (churn-like example)
This pattern fits many business problems: predict a binary outcome (e.g., churn, purchase, fraud). We’ll build a leakage-safe pipeline using scikit-learn. Assume a CSV with columns: user_id, tenure_days, sessions_7d, country, plan, late_payments, churned (0/1).
# src/train_tabular.py
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import classification_report, roc_auc_score, average_precision_score
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier
import joblib, os
df = pd.read_csv("data/users.csv")
# Drop leaks: anything not available at prediction time (e.g., "churned_next_month" labels inside features)
y = df["churned"].astype(int)
X = df.drop(columns=["churned","user_id"], errors="ignore")
num_cols = X.select_dtypes(include=["float64","int64","float32","int32"]).columns.tolist()
cat_cols = X.select_dtypes(include=["object","category"]).columns.tolist()
pre = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    remainder="drop",
)
clf = GradientBoostingClassifier(random_state=42)
pipe = Pipeline([("pre", pre), ("clf", clf)])
# Time-aware split is better in production; here we use a simple holdout for demo
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
pipe.fit(Xtr, ytr)
pred = pipe.predict(Xte)
proba = pipe.predict_proba(Xte)[:,1]
print(classification_report(yte, pred, digits=3))
print("ROC-AUC:", roc_auc_score(yte, proba))
print("PR-AUC :", average_precision_score(yte, proba))
os.makedirs("models", exist_ok=True)
joblib.dump(pipe, "models/churn_pipe.joblib")
print("Saved → models/churn_pipe.joblib")
Notes: In production, prefer a temporal split: train on earlier months, test on later months to mimic real deployment. Use class weights or thresholds when the positive rate is small.
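Both ideas fit in a few lines. A minimal sketch, assuming a hypothetical event_date column in users.csv and reusing X, y, and pipe from the script above:
# Sketch: temporal holdout, assuming a hypothetical "event_date" column exists
df["event_date"] = pd.to_datetime(df["event_date"])
cutoff = df["event_date"].quantile(0.8)              # hold out the most recent 20%
train_mask = df["event_date"] <= cutoff
X = X.drop(columns=["event_date"], errors="ignore")  # the timestamp itself is not a feature
Xtr, ytr = X[train_mask], y[train_mask]
Xte, yte = X[~train_mask], y[~train_mask]
# Sketch: counter a rare positive class with per-sample weights routed to the "clf" step
from sklearn.utils.class_weight import compute_sample_weight
w = compute_sample_weight(class_weight="balanced", y=ytr)
pipe.fit(Xtr, ytr, clf__sample_weight=w)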
Text classification (news sentiment)
For quick wins, classic text pipelines (TF-IDF + linear model) are fast and strong. Suppose a CSV with columns headline and label, where label is one of {negative, neutral, positive}.
# src/train_text.py
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, f1_score
import joblib, os
df = pd.read_csv("data/news.csv") # columns: headline,label
X = df["headline"].astype(str)
y = df["label"].astype("category")
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(
        lowercase=True,
        ngram_range=(1, 2),
        min_df=2,
        max_df=0.9,
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
pipe.fit(Xtr, ytr)
pred = pipe.predict(Xte)
print(classification_report(yte, pred, digits=3))
print("Macro F1:", f1_score(yte, pred, average="macro"))
os.makedirs("models", exist_ok=True)
joblib.dump(pipe, "models/sentiment_pipe.joblib")
print("Saved → models/sentiment_pipe.joblib")
Why this works: TF-IDF weights tokens by how informative they are across documents, and bigrams capture short phrases like “price surge” or “hack reported”. If you later switch to transformers, keep this classic baseline for sanity checks.
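A quick sanity check on this baseline is to inspect the heaviest-weighted n-grams per class. A minimal sketch, run after pipe.fit(...) above (coef_ has one row per class here because the label takes three values):
import numpy as np
feats = pipe.named_steps["tfidf"].get_feature_names_out()
coefs = pipe.named_steps["clf"].coef_                # one row of weights per class
for cls, row in zip(pipe.named_steps["clf"].classes_, coefs):
    top = np.argsort(row)[-5:][::-1]                 # five highest-weight n-grams
    print(cls, "→", [feats[i] for i in top])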
Evaluation & cross-validation
Don’t trust a single split. Use cross-validation or a rolling (time-aware) scheme, and report multiple metrics. For imbalanced classes (fraud, churn), prefer PR-AUC and recall at fixed precision.
from sklearn.model_selection import cross_val_score
from sklearn.metrics import average_precision_score
import numpy as np
# Example with PR-AUC as the scoring metric for a binary classifier.
# A callable with signature (estimator, X, y) can be passed directly as `scoring`;
# wrapping it in make_scorer would be wrong, since make_scorer expects a metric(y_true, y_pred).
def pr_auc(estimator, X, y):
    proba = estimator.predict_proba(X)[:, 1]
    return average_precision_score(y, proba)
scores = cross_val_score(pipe, X, y, cv=5, scoring=pr_auc)
print("PR-AUC CV mean:", np.mean(scores), "±", np.std(scores))
Slice analysis: Break metrics down by cohort (country, device, plan). If a model fails on a slice, you’ve found a path to improvement or a reason to add guardrails.
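A minimal slice report for the churn holdout, assuming Xte still contains the country column and reusing yte and pred from train_tabular.py:
from sklearn.metrics import recall_score
report = pd.DataFrame({"country": Xte["country"].values, "y": yte.values, "pred": pred})
for country, g in report.groupby("country"):
    print(country, recall_score(g["y"], g["pred"], zero_division=0))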
Saving & loading models
Save the entire pipeline (preprocessing + model) so you don’t mismatch transforms at inference time.
import joblib
pipe = joblib.load("models/churn_pipe.joblib")
# example_df must carry the same feature columns (names and dtypes) used in training
proba = pipe.predict_proba(example_df)[:, 1]
Track versions: include date, git commit hash, and a short change note in the filename or a metadata sidecar.
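A minimal sidecar sketch; the field names are illustrative, not a standard:
import json, subprocess
from datetime import datetime, timezone
commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                        capture_output=True, text=True).stdout.strip()
meta = {
    "saved_at": datetime.now(timezone.utc).isoformat(),
    "git_commit": commit,
    "note": "baseline gradient boosting on users.csv",
}
with open("models/churn_pipe.meta.json", "w") as f:
    json.dump(meta, f, indent=2)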
Serving a tiny API with FastAPI
Expose a minimal endpoint for internal prototypes. Keep authentication, rate limits, and logging in mind for production.
# src/serve_api.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import pandas as pd
app = FastAPI(title="AI Starter API")
pipe = joblib.load("models/churn_pipe.joblib")
class Payload(BaseModel):
    tenure_days: float
    sessions_7d: float
    country: str
    plan: str
    late_payments: int
@app.post("/score")
def score(p: Payload):
    df = pd.DataFrame([p.model_dump()])  # use p.dict() on Pydantic v1
    proba = float(pipe.predict_proba(df)[0, 1])
    label = int(proba >= 0.5)
    return {"prob": proba, "label": label}
# Run: uvicorn src.serve_api:app --reload --port 8000
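A minimal client sketch for smoke-testing the endpoint (needs pip install requests; the values are made up):
import requests
payload = {"tenure_days": 120, "sessions_7d": 3.0, "country": "DE",
           "plan": "pro", "late_payments": 1}
r = requests.post("http://localhost:8000/score", json=payload, timeout=5)
print(r.json())  # {"prob": <float>, "label": 0 or 1}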
Production notes: add auth, input validation, schema enforcement, request limits, and structured logs (with PII minimization). For high-impact decisions, return a reason code (top features) and route edge cases to human review.
What to try next
- Imbalanced learning: class weights, focal loss (if you move beyond scikit-learn), or threshold tuning (see the sketch after this list).
- Time-aware validation: rolling windows; detect performance drift and retrain triggers.
- Text + tabular fusion: combine numeric features with TF-IDF features using a FeatureUnion or by stacking predictions.
- Model cards: write a 1-pager for each model: use cases, metrics by cohort, limitations, and rollback plan.
- Applied AI track: Plug your outputs into dashboards/alerts and pair with prompts from our Prompt Libraries for decision support.
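For the threshold-tuning item above, a minimal sketch reusing yte and proba from the churn holdout; the 0.80 precision target is arbitrary:
import numpy as np
from sklearn.metrics import precision_recall_curve
prec, rec, thr = precision_recall_curve(yte, proba)
ok = np.where(prec[:-1] >= 0.80)[0]  # thresholds meeting the precision target
if ok.size:
    i = ok[0]
    print(f"threshold={thr[i]:.3f} precision={prec[i]:.3f} recall={rec[i]:.3f}")
else:
    print("No threshold reaches 0.80 precision on this holdout.")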