AI vs Machine Learning vs Deep Learning
Understand how these terms relate, when each approach is most useful, and how modern products combine rules, classic ML, deep learning, and human oversight.
The nested-circles mental model
Picture three circles nested together. The biggest circle is AI, encompassing any method that produces intelligent behavior: rules, search, planning, optimization, and learning. Inside that is ML, techniques that learn from examples rather than explicit instructions. Inside ML is DL, which uses deep neural networks to learn representations from raw inputs. So: all DL is ML, and all ML is AI; but not all AI is ML, and not all ML is DL.
Classic AI vs data-driven AI
- Classic (symbolic) AI: Knowledge captured as rules or logic (expert systems, search). Deterministic and debuggable, but brittle when the world changes.
- Data-driven AI (ML/DL): Learns patterns from data. Flexible and accurate with enough data/compute, but needs careful evaluation, monitoring, and guardrails.
In real products, you often need both: rules for hard constraints (“never send to a blocked address”), and ML/DL for fuzzy judgments (“is this transaction risky?”).
What counts as machine learning?
- Supervised learning: Inputs + labels → predict labels (spam detection, demand forecasting, churn risk).
- Unsupervised learning: No labels → discover structure (clustering customers, topic discovery, anomaly detection).
- Semi/self-supervised: Learn general patterns from unlabeled data, then fine-tune on small labeled sets (how modern language models gain broad skill).
- Reinforcement learning: Learn actions by trial-and-error rewards (robotics, ad bidding, game playing).
On tabular data (columns like income, balance, device score), classic models (logistic regression, random forests, gradient-boosted trees) are fast, strong, and easier to explain. They often beat deep nets when data is modest and features are well-engineered.
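To make "fast, strong, and easier to explain" concrete, here is a minimal sketch of supervised learning on tabular data: a logistic regression trained from scratch with gradient descent. The feature names and toy data are hypothetical, and a real project would use a library like scikit-learn rather than hand-rolled training, but the shape is the same.

```python
import math

def train_logreg(rows, labels, lr=0.1, epochs=500):
    """Toy logistic regression via per-example gradient descent.
    rows: list of feature lists; labels: 0/1. Illustrative only."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            err = p - y
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

# Hypothetical features: [balance_in_thousands, recent_chargebacks]
X = [[5.0, 0], [6.0, 0], [1.0, 3], [0.5, 4], [4.0, 1], [0.8, 5]]
y = [0, 0, 1, 1, 0, 1]  # 1 = risky
w, b = train_logreg(X, y)

def predict(x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# The learned coefficients double as crude reason codes: a positive
# weight on chargebacks means that feature pushes the score toward
# "risky", which is exactly the kind of explanation regulators ask for.
print(predict([0.7, 4]))  # a chargeback-heavy, low-balance account
```

The explainability payoff is that each weight maps to one named column, something a deep net's millions of parameters cannot offer directly.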
Why deep learning changed the game
Deep learning excels on unstructured data (text, images, audio) because it learns features automatically. In images, early layers detect edges and textures; deeper layers detect objects. In language, embeddings and attention map words to meaning and context. With more data and compute, performance scales impressively, though not without limits.
- Strengths: State-of-the-art on perception and language; flexible across tasks (classification, generation, retrieval).
- Costs: Training can be expensive; inference latency and cost must be engineered down (quantization, distillation, caching).
- Explainability: Harder than classic models; mitigate with reason codes, evaluations on slices, and human review.
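To see what "early layers detect edges" means, here is a minimal sketch of a hand-crafted edge filter applied to a toy image. In classic computer vision an engineer writes this kernel by hand; a convolutional network learns kernels much like it in its first layer, directly from pixels. The image and kernel values are illustrative.

```python
def filter_response(image, kernel):
    """Valid-mode 2D filter response (cross-correlation) on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge kernel: responds where brightness changes left-to-right.
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]

# 3x6 toy image: bright left half, dark right half -> one vertical edge.
image = [[9, 9, 9, 0, 0, 0]] * 3

print(filter_response(image, edge_kernel))  # fires only at the edge
```

The output is zero over flat regions and large where brightness changes, which is the "feature" later layers build on.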
Transformers and large language models (plain English)
Transformers use attention to decide which parts of the input matter for predicting the next token. Large language models (LLMs) are transformers trained on large corpora to predict text. With careful prompting, they summarize, translate, extract, and write code. To ground answers in your data, teams use retrieval-augmented generation (RAG): fetch relevant docs, then ask the LLM to answer using only those sources and cite them.
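The RAG loop described above can be sketched in a few lines. This toy retriever ranks documents by word overlap with the query; real systems use embeddings and a vector index, and the policy snippets here are hypothetical. The assembled prompt is what would be sent to the LLM.

```python
def retrieve(query, docs, k=2):
    """Toy retriever: rank docs by word overlap with the query.
    Production systems use embeddings + a vector index instead."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_rag_prompt(query, sources):
    """Ask the model to answer only from the retrieved sources, with citations."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(sources))
    return ("Answer using ONLY the sources below and cite them as [1], [2].\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

# Hypothetical policy snippets standing in for a document store.
docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests require the original order number.",
]
query = "How long do refunds take?"
prompt = build_rag_prompt(query, retrieve(query, docs))
print(prompt)  # this prompt, not the bare question, goes to the LLM
```

Because the model is told to answer only from the fetched sources, its output stays grounded in your data and each claim can be traced to a citation.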
Choosing the right approach: data, latency, cost, explainability
- Data type: Tabular → start with classic ML. Text/image/audio → DL/LLMs.
- Label scarcity: Few labels → transfer learning, weak supervision, or RAG; many labels → supervised training works well.
- Latency/cost: Real-time constraints may require small models, distillation, or on-device inference.
- Explainability/regulation: Regulated decisions (credit, hiring) may prefer simpler models or require reason codes + human approval.
- Iteration speed: Start simple; only add complexity if metrics demand it.
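The checklist above can be condensed into a rule-of-thumb helper. The thresholds and return strings are illustrative, not universal; the point is that the choice is a small decision tree over data type, label supply, and explainability needs.

```python
def suggest_baseline(data_type, labeled_examples, needs_reason_codes):
    """Rule-of-thumb baseline picker mirroring the checklist above.
    Thresholds are illustrative, not universal."""
    if data_type == "tabular":
        if needs_reason_codes:
            # Regulated decisions: prefer models whose outputs explain themselves.
            return "logistic regression or gradient-boosted trees + reason codes"
        return "gradient-boosted trees"
    # Unstructured data (text / image / audio):
    if labeled_examples < 1000:
        # Few labels: lean on pretrained models or retrieval.
        return "pretrained model + fine-tuning, or LLM + RAG"
    return "fine-tuned deep model"

print(suggest_baseline("tabular", 50_000, needs_reason_codes=True))
print(suggest_baseline("text", 200, needs_reason_codes=False))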
Hybrid systems: rules + ML + humans
Robust systems mix deterministic components with learned decisions and human oversight:
- Rules: Hard constraints (policy, safety, compliance).
- ML/DL: Probabilistic judgments (fraud scores, intent classification, anomaly detection).
- Humans: Review edge cases, handle appeals, provide labeled feedback for improvement.
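The three layers above compose into a single decision path: rules first, model score second, human review for the ambiguous middle. A minimal sketch, with hypothetical field names and thresholds:

```python
BLOCKED_ADDRESSES = {"0xBAD"}  # hypothetical policy list

def decide(txn, score_fn, review_queue):
    """Hybrid decision: rules, then ML score, then humans."""
    # 1. Rules: hard constraints always win, no model involved.
    if txn["dest"] in BLOCKED_ADDRESSES:
        return "block"
    # 2. ML/DL: probabilistic judgment (e.g. a fraud model's 0..1 score).
    risk = score_fn(txn)
    if risk < 0.2:
        return "approve"
    if risk > 0.9:
        return "block"
    # 3. Humans: the ambiguous middle goes to review; reviewer labels
    #    become training data for the next model version.
    review_queue.append(txn)
    return "hold_for_review"

queue = []
print(decide({"dest": "0xOK", "amount": 120}, lambda t: 0.55, queue))
```

Note the feedback loop: every case a human reviews is also a fresh label, so the review queue doubles as a labeling pipeline.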
Case studies (tabular vs unstructured)
Fraud detection (tabular): Use gradient-boosted trees on features like velocity, merchant type, device fingerprint, and graph signals (shared IPs/wallets). Provide reason codes to analysts and customers. Deep models add lift on complex behavioral patterns.
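A velocity feature like the one mentioned above is simple to compute: count how many transactions a card made in a trailing window. A minimal sketch with hypothetical timestamps; this is the kind of hand-engineered column a gradient-boosted tree consumes.

```python
def txn_velocity(times, now, window=3600):
    """Number of transactions in the last `window` seconds.
    A classic hand-engineered fraud feature ("velocity")."""
    return sum(1 for t in times if now - t <= window)

# Hypothetical card history (Unix seconds): a burst in the last hour.
history = [1_000, 2_000, 9_000, 9_100, 9_200, 9_500]
print(txn_velocity(history, now=9_600))  # 4 transactions in the hour
```

Because the feature has a plain meaning ("four charges in the last hour"), it translates directly into a reason code an analyst or customer can understand.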
Support triage (text): LLMs classify intent (billing, technical, refund), summarize context, and draft replies grounded in policy. High-impact cases route to human agents for review.
Vision enhancement (images): Cameras denoise and perform HDR fusion using deep models; classic image ops still help, but DL gives the leap in quality.
Pitfalls and anti-patterns
- Deep learning by default: Overkill for small tabular problems; hurts explainability and cost.
- Ignoring data quality: Better features beat fancier models; fix leaks, label noise, and drift.
- Single metric obsession: Track fairness, latency, cost, and user satisfaction alongside accuracy.
- No monitoring: Models silently degrade without drift checks and live evals.
Mini-exercises
- Tabular vs unstructured: Pick a problem from your work. Is the data mainly columns or text/images? Choose a baseline accordingly.
- Prompting practice: Take a messy email thread and prompt for a 5-bullet summary + 2 actions + owners and deadlines.
- Hybrid sketch: Draw where rules, ML, and humans live in your flow. Label success/fairness metrics and alert thresholds.