ML Semester 8

Unit 3: Supervised Learning

k-NN, Naïve Bayes, Decision Trees, Linear Regression, Logistic Regression, and Support Vector Machines: definitions, working, formulas, and examples.

Author: Deepak Modi
Last Updated: 2026-05-10

Syllabus:

Supervised Learning: Definition, how it works. Types of Supervised learning algorithms: k-Nearest Neighbours, Naïve Bayes, Decision Trees, Linear Regression, Logistic Regression, Support Vector Machines.


🎯 PYQ Analysis for Unit 3

PYQs will be added after analysis; check back soon.


Section 1: Supervised Learning - Overview

1.1 Definition

Supervised Learning is a type of ML where the model is trained on a labeled dataset: every input (X) has a known correct output (Y). The model learns to map X → Y, and then uses that mapping to predict Y for new, unseen inputs.

How It Works:

  Training Phase:
  ──────────────
  Labeled Data (X, Y) ──► ML Algorithm ──► Trained Model

  Prediction Phase:
  ─────────────────
  New Input (X_new) ──► Trained Model ──► Predicted Output (Ŷ)

1.2 Two Types of Supervised Learning

Type             Output                     Example
──────────────────────────────────────────────────────────────
Classification   Discrete class label       Spam / Not Spam
Regression       Continuous numeric value   House price prediction

1.3 General Workflow

  1. Collect labeled data.
  2. Split into Training set and Test set (e.g., 80/20 split).
  3. Choose an algorithm.
  4. Train the model on the training set.
  5. Evaluate on the test set.
  6. Tune and deploy.
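
A minimal sketch of this workflow in Python with scikit-learn (the Iris dataset, the 80/20 split, and the k-NN classifier are only illustrative choices here; any algorithm from this unit would fit):

  # Illustrative supervised-learning workflow (sketch only)
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.metrics import accuracy_score

  X, y = load_iris(return_X_y=True)                        # 1. labeled data
  X_train, X_test, y_train, y_test = train_test_split(     # 2. 80/20 split
      X, y, test_size=0.2, random_state=42)

  model = KNeighborsClassifier(n_neighbors=3)              # 3. choose an algorithm
  model.fit(X_train, y_train)                              # 4. train
  y_pred = model.predict(X_test)                           # 5. evaluate
  print("Test accuracy:", accuracy_score(y_test, y_pred))  # 6. tune / deploy if good enough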

Section 2: k-Nearest Neighbours (k-NN)

2.1 What is k-NN?

Definition:

k-Nearest Neighbours (k-NN) is a simple, non-parametric supervised learning algorithm. To classify a new data point, it looks at the k closest training points (neighbours) and assigns the majority class among them.

k-NN is called a lazy learner: it does not build an explicit model during training; it simply memorizes the entire training dataset.

Simple Analogy:

To guess which neighbourhood a house belongs to, you look at the 3 nearest houses (k=3). If 2 of 3 are in "Area A", you classify the house as "Area A".


2.2 How k-NN Works

Algorithm Steps:

1. Store all training data points.

2. For a new input point X_new:
   a. Calculate distance from X_new to every training point.
   b. Sort distances in ascending order.
   c. Select the top k nearest neighbours.
   d. Count the class labels of the k neighbours.
   e. Assign the majority class as the prediction.

Diagram:

         ●  ■                   Legend:
       ●   ●                    ● = Class A
        ✦ ← new point           ■ = Class B
       ■    ●                   ✦ = query point
         ■   ●

   k=3 nearest neighbours: ●, ●, ■
   Majority = ●  →  Predict Class A

2.3 Distance Metrics

The "closeness" between two points is measured by a distance function.

Euclidean Distance (most common):

d(A, B) = √[(a₁-b₁)² + (a₂-b₂)² + ... + (aₙ-bₙ)²]

For 2D:
d(A, B) = √[(a₁-b₁)² + (a₂-b₂)²]

Manhattan Distance:

d(A, B) = |a₁-b₁| + |a₂-b₂| + ... + |aₙ-bₙ|

Minkowski Distance (generalizes both):

d(A, B) = [Σ|aᵢ-bᵢ|ᵖ]^(1/p)

p=1 → Manhattan
p=2 → Euclidean
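
A quick NumPy sketch of the three metrics (the two points are arbitrary examples):

  import numpy as np

  a = np.array([1.0, 2.0])
  b = np.array([4.0, 6.0])

  euclidean = np.sqrt(np.sum((a - b) ** 2))          # √[(1-4)² + (2-6)²] = 5.0
  manhattan = np.sum(np.abs(a - b))                  # |1-4| + |2-6|      = 7.0
  minkowski = np.sum(np.abs(a - b) ** 3) ** (1 / 3)  # p = 3              ≈ 4.5
  print(euclidean, manhattan, minkowski)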

2.4 Worked Example

Dataset:

Point   x₁   x₂   Class
────────────────────────
A       1    2    C1
B       2    3    C1
C       3    1    C2
D       5    4    C2

Query point: Q = (3, 3), k = 3

Calculate Euclidean distances from Q to each point:

d(Q, A) = √[(3-1)² + (3-2)²] = √[4+1]   = √5   ≈ 2.24
d(Q, B) = √[(3-2)² + (3-3)²] = √[1+0]   = √1   = 1.00
d(Q, C) = √[(3-3)² + (3-1)²] = √[0+4]   = √4   = 2.00
d(Q, D) = √[(3-5)² + (3-4)²] = √[4+1]   = √5   ≈ 2.24

Sort and pick k=3 nearest:

1. B - distance 1.00 → Class C1
2. C - distance 2.00 → Class C2
3. A - distance 2.24 → Class C1  (tie with D broken by order)

Vote: C1: 2, C2: 1 → Predict Class C1
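
The same worked example as a minimal from-scratch k-NN sketch (data and query point exactly as above):

  import numpy as np
  from collections import Counter

  X_train = np.array([[1, 2], [2, 3], [3, 1], [5, 4]])   # points A, B, C, D
  y_train = ["C1", "C1", "C2", "C2"]
  query, k = np.array([3, 3]), 3

  # Euclidean distance from the query to every training point
  distances = np.sqrt(np.sum((X_train - query) ** 2, axis=1))

  # Take the k nearest neighbours and vote
  nearest = np.argsort(distances)[:k]
  votes = Counter(y_train[i] for i in nearest)
  print(votes.most_common(1)[0][0])   # -> 'C1'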


2.5 Choosing k

k value         Effect
────────────────────────────────────────────────────────
k = 1           Very sensitive to noise, can overfit
k = large       Smoother boundary, may underfit
Odd k           Avoids ties in binary classification
Best practice   Use cross-validation to find the optimal k

2.6 Advantages and Disadvantages

Advantages:

✅ Simple, easy to understand and implement.
✅ No training phase (lazy learner).
✅ Naturally handles multi-class problems.
✅ Adapts automatically to new training data.

Disadvantages:

❌ Slow at prediction time (computes all distances).
❌ High memory usage (stores entire training set).
❌ Sensitive to irrelevant features and scale.
❌ Struggles with high-dimensional data.

Applications:

  • Recommendation systems
  • Image recognition
  • Medical diagnosis
  • Anomaly detection

Section 3: Naïve Bayes

3.1 What is Naïve Bayes?

Definition:

Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem. It assumes that all features are independent of each other given the class; this is the "naïve" assumption.


3.2 Bayes' Theorem

Formula:

           P(X | C) × P(C)
P(C | X) = ───────────────
                P(X)

Where:
  P(C | X) = Posterior - probability of class C given features X
  P(X | C) = Likelihood - probability of seeing X given class C
  P(C)     = Prior - probability of class C in training data
  P(X)     = Evidence - probability of seeing X (normalizing constant)

For Classification (we compare classes, so P(X) cancels):

P(C | X) ∝ P(X | C) × P(C)

Pick class C that maximizes:
  P(C | X₁, X₂, ..., Xₙ)

Naïve independence assumption:
  P(X₁, X₂, ..., Xₙ | C) = P(X₁|C) × P(X₂|C) × ... × P(Xₙ|C)

3.3 How Naïve Bayes Works

Steps:

1. Training:
   a. Calculate P(C) for each class (prior).
   b. For each feature and each class, calculate P(Xᵢ | C) (likelihood).

2. Prediction (for new point X):
   a. For each class C, compute:
          Score(C) = P(C) × P(X₁|C) × P(X₂|C) × ... × P(Xₙ|C)
   b. Predict class with highest Score.

3.4 Worked Example

Dataset β€” Play Tennis:

Day   Outlook    Humidity   Wind     Play?
───────────────────────────────────────────
1     Sunny      High       Weak     No
2     Sunny      High       Strong   No
3     Overcast   High       Weak     Yes
4     Rain       Normal     Weak     Yes
5     Rain       Normal     Strong   No
6     Overcast   Normal     Strong   Yes
7     Sunny      Normal     Weak     Yes

Predict: Outlook=Sunny, Humidity=Normal, Wind=Weak → Play?

Step 1: Prior probabilities

P(Yes) = 4/7 ≈ 0.571
P(No)  = 3/7 ≈ 0.429

Step 2: Likelihoods

For Yes (4 examples):
  P(Outlook=Sunny  | Yes) = 1/4 = 0.25
  P(Humidity=Normal| Yes) = 3/4 = 0.75
  P(Wind=Weak      | Yes) = 3/4 = 0.75

For No (3 examples):
  P(Outlook=Sunny  | No)  = 2/3 ≈ 0.67
  P(Humidity=Normal| No)  = 1/3 ≈ 0.33
  P(Wind=Weak      | No)  = 1/3 ≈ 0.33

Step 3: Posterior scores

Score(Yes) = P(Yes) × 0.25 × 0.75 × 0.75
           = 0.571 × 0.1406
           ≈ 0.0803

Score(No)  = P(No) × 0.67 × 0.33 × 0.33
           = 0.429 × 0.0729
           ≈ 0.0313

Prediction: Yes (0.0803 > 0.0313) ✅
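
The same calculation as a short counting sketch in Python (the Play Tennis table is hard-coded; the scores differ slightly from the hand calculation because the likelihoods are not rounded here):

  from collections import Counter

  # Rows: (Outlook, Humidity, Wind, Play)
  data = [
      ("Sunny", "High", "Weak", "No"),     ("Sunny", "High", "Strong", "No"),
      ("Overcast", "High", "Weak", "Yes"), ("Rain", "Normal", "Weak", "Yes"),
      ("Rain", "Normal", "Strong", "No"),  ("Overcast", "Normal", "Strong", "Yes"),
      ("Sunny", "Normal", "Weak", "Yes"),
  ]
  query = ("Sunny", "Normal", "Weak")

  class_counts = Counter(row[-1] for row in data)
  scores = {}
  for c, n_c in class_counts.items():
      score = n_c / len(data)                                   # prior P(C)
      for i, value in enumerate(query):                         # likelihoods P(Xi|C)
          match = sum(1 for row in data if row[-1] == c and row[i] == value)
          score *= match / n_c
      scores[c] = score

  print(scores)                        # Yes ≈ 0.080, No ≈ 0.032
  print(max(scores, key=scores.get))   # -> 'Yes'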


3.5 Laplace Smoothing

Problem: If any P(Xᵢ|C) = 0 (a feature value never seen with class C in training), the whole product becomes 0.

Solution - Laplace Smoothing (add-1 smoothing):

            count(Xᵢ, C) + 1
P(Xᵢ | C) = ─────────────────────
             count(C) + |vocab|

Where |vocab| = number of distinct values that feature Xᵢ can take.
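
A tiny sketch of the smoothed estimate (the function name and the example counts are only illustrative):

  def smoothed_likelihood(count_xi_and_c, count_c, n_values):
      """Laplace (add-1) estimate of P(Xi | C)."""
      return (count_xi_and_c + 1) / (count_c + n_values)

  # Outlook=Overcast never appears with class No in the Play Tennis table
  # (0 of 3 'No' rows; Outlook has 3 possible values) -> 1/6 instead of 0
  print(smoothed_likelihood(0, 3, 3))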

3.6 Types of Naïve Bayes

Type             Feature Distribution              Use Case
───────────────────────────────────────────────────────────────────────
Gaussian NB      Continuous, normal distribution   Iris flower classification
Multinomial NB   Discrete counts                   Text classification
Bernoulli NB     Binary features                   Spam detection

3.7 Advantages and Disadvantages

Advantages:

✅ Very fast - just multiplication of probabilities.
✅ Works well with small datasets.
✅ Handles high-dimensional data (text classification).
✅ Naturally handles multi-class problems.

Disadvantages:

❌ Naïve independence assumption is rarely true in practice.
❌ Poor probability estimates - the scores rank classes well but are not well calibrated.
❌ Struggles with feature interactions.

Applications:

  • Spam filtering
  • Sentiment analysis
  • Document classification
  • Medical diagnosis

Section 4: Decision Trees

4.1 What is a Decision Tree?

Definition:

A Decision Tree is a tree-structured model where:

  • Each internal node represents a test on a feature.
  • Each branch represents the outcome of the test.
  • Each leaf node represents a class label (classification) or a value (regression).

Simple Analogy:

Think of a decision tree like a flowchart or a game of 20 questions: at each step you ask a question about one feature and follow the matching branch until you reach an answer.


4.2 Structure of a Decision Tree

                    [Outlook?]               ← Root Node
                   /     |     \
              Sunny  Overcast   Rain
                 /       |       \
        [Humidity?]   [Yes ✓]   [Wind?]      ← Internal Nodes
          /    \                 /    \
       High   Normal         Strong  Weak
         /       \              |      |
     [No ✗]   [Yes ✓]       [No ✗]  [Yes ✓]   ← Leaf Nodes

4.3 How to Build a Decision Tree β€” Key Concepts

Entropy (Measure of Impurity)

Definition: Entropy measures the impurity or disorder in a dataset. A pure node (all same class) has entropy = 0.

Formula:

           c
H(S) = -   Σ  pᵢ × log₂(pᵢ)
          i=1

Where:
  S  = dataset
  c  = number of classes
  pᵢ = proportion of class i in S

Examples:

Pure node (all Yes):    H = -(1×log₂1) = 0
50-50 split:            H = -(0.5×log₂0.5 + 0.5×log₂0.5) = 1

Information Gain (Choosing the Best Feature)

Definition: Information Gain measures how much a feature reduces entropy (disorder). The feature with the highest information gain is chosen as the splitting node.

Formula:

IG(S, A) = H(S) - Σ [ |Sᵥ|/|S| × H(Sᵥ) ]
                 v∈A

Where:
  A  = feature being tested
  Sᵥ = subset where feature A = value v

Gini Impurity (Alternative to Entropy)

Formula:

           c
Gini = 1 - Σ  pᵢ²
          i=1

Pure node: Gini = 0
50-50:     Gini = 1 - (0.5² + 0.5²) = 0.5

4.4 ID3 Algorithm

ID3 (Iterative Dichotomiser 3) is the classic decision tree building algorithm using Information Gain.

Steps:

1. If all examples have the same class → return a leaf node with that class.
2. If no features are left → return a leaf with the majority class.
3. Else:
   a. Calculate Information Gain for each feature.
   b. Select feature with highest IG as root.
   c. For each value of that feature:
      - Create a sub-branch.
      - Recursively apply steps 1–3 on the subset.

4.5 Worked Example - Entropy & Info Gain

Dataset (14 examples, 9 Yes, 5 No):

Outlook:    Sunny (5):     2 Yes, 3 No
            Overcast (4):  4 Yes, 0 No
            Rain (5):      3 Yes, 2 No

Step 1: Entropy of full dataset S:

H(S) = -(9/14)×log₂(9/14) - (5/14)×log₂(5/14)
     = -(0.643×(-0.637)) - (0.357×(-1.485))
     = 0.410 + 0.530
     = 0.940

Step 2: Entropy of each subset for Outlook:

H(Sunny)    = -(2/5)log₂(2/5) - (3/5)log₂(3/5)
            = -(0.4×(-1.322)) - (0.6×(-0.737))
            = 0.529 + 0.442 = 0.971

H(Overcast) = -(4/4)log₂(4/4) = -(1×0) = 0.000   (pure node)

H(Rain)     = -(3/5)log₂(3/5) - (2/5)log₂(2/5)
            = 0.971

Step 3: Information Gain for Outlook:

IG(S, Outlook) = H(S) - [5/14×H(Sunny) + 4/14×H(Overcast) + 5/14×H(Rain)]
               = 0.940 - [5/14×0.971 + 4/14×0 + 5/14×0.971]
               = 0.940 - [0.347 + 0 + 0.347]
               = 0.940 - 0.694
               = 0.246

If Outlook has the highest IG among all features → it becomes the root node.
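
A short sketch that reproduces these numbers (the class counts are taken from the dataset summary above; the tiny difference from 0.246 is only rounding):

  import math

  def entropy(counts):
      """Entropy of a node given its class counts, e.g. [yes, no]."""
      total = sum(counts)
      return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

  H_S = entropy([9, 5])                                            # ≈ 0.940
  subsets = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}
  weighted = sum(sum(c) / 14 * entropy(c) for c in subsets.values())
  print(round(H_S - weighted, 3))                                  # IG(S, Outlook) ≈ 0.247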


4.6 Overfitting and Pruning

  • A decision tree can grow too deep and overfit (memorize training data).
  • Pruning cuts unnecessary branches:
    • Pre-pruning: Stop growing when improvement is small.
    • Post-pruning: Grow full tree, then remove low-value branches.

4.7 Advantages and Disadvantages

Advantages:

✅ Easy to understand and visualize.
✅ No need for feature scaling.
✅ Handles both numerical and categorical data.
✅ Implicitly performs feature selection.

Disadvantages:

❌ Prone to overfitting (especially deep trees).
❌ Unstable - small changes in the data can change the tree.
❌ Biased toward features with more values.

Applications:

  • Medical diagnosis
  • Credit risk scoring
  • Fraud detection
  • Customer segmentation

Section 5: Linear Regression

5.1 What is Linear Regression?

Definition:

Linear Regression is a supervised learning algorithm used for regression (predicting continuous values). It models the relationship between input features (X) and the output (Y) as a straight line (linear equation).


5.2 The Linear Equation

Simple Linear Regression (one feature):

Ŷ = w₀ + w₁X

Where:
  Ŷ  = predicted value
  X  = input feature
  w₀ = intercept (bias), where the line crosses the Y-axis
  w₁ = slope (weight), how much Y changes per unit of X

Multiple Linear Regression (n features):

Ŷ = w₀ + w₁X₁ + w₂X₂ + ... + wₙXₙ

Or in matrix form:
Ŷ = Xw

Diagram:

  Y
  │           /
  │         /  ← Regression Line (Ŷ = w₀ + w₁X)
  │       / ●
  │     /●
  │   /   ●
  │ /  ●
  │/●
  └────────────→ X

5.3 Cost Function - Mean Squared Error (MSE)

We measure how well the line fits using Mean Squared Error:

         1   m
MSE  =  ─── Σ (yᵢ - ŷᵢ)²
         m  i=1

Where:
  yᵢ  = actual value
  ŷᵢ  = predicted value
  m   = number of samples

The goal of training is to minimize MSE by finding the best values of w₀ and w₁.


5.4 Finding Optimal Weights

Method 1: Ordinary Least Squares (Closed-form)

Analytical solution:
  w = (XᵀX)⁻¹ Xᵀy

Best for small datasets with few features.

Method 2: Gradient Descent (Iterative)

Repeat until convergence:

  w₀ := w₀ - α × ∂MSE/∂w₀
  w₁ := w₁ - α × ∂MSE/∂w₁

Where α = learning rate

Partial derivatives:
  ∂MSE/∂w₀ = (-2/m) Σ(yᵢ - ŷᵢ)
  ∂MSE/∂w₁ = (-2/m) Σ(yᵢ - ŷᵢ)×xᵢ
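
A minimal gradient-descent sketch using exactly these update rules; it fits the small hours-studied dataset from the worked example below (the learning rate and iteration count are arbitrary choices):

  import numpy as np

  X = np.array([1, 2, 3, 4], dtype=float)      # hours studied
  y = np.array([50, 60, 70, 80], dtype=float)  # marks

  w0, w1, alpha, m = 0.0, 0.0, 0.05, len(X)
  for _ in range(10000):
      y_hat = w0 + w1 * X
      w0 -= alpha * (-2 / m) * np.sum(y - y_hat)
      w1 -= alpha * (-2 / m) * np.sum((y - y_hat) * X)

  print(round(w0, 2), round(w1, 2))   # converges towards w0 = 40, w1 = 10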

5.5 Worked Example

Data:

X (hours studied)   Y (marks)
─────────────────────────────
1                   50
2                   60
3                   70
4                   80

Using formulas:

n = 4
ΣX  = 1+2+3+4 = 10
ΣY  = 50+60+70+80 = 260
ΣXY = (1×50)+(2×60)+(3×70)+(4×80) = 50+120+210+320 = 700
ΣX² = 1+4+9+16 = 30

w₁ = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²)
   = (4×700 - 10×260) / (4×30 - 100)
   = (2800 - 2600) / (120 - 100)
   = 200 / 20
   = 10

w₀ = (ΣY - w₁ΣX) / n
   = (260 - 10×10) / 4
   = (260 - 100) / 4
   = 40

Equation:  Ŷ = 40 + 10X

Predict for X=5:  Ŷ = 40 + 10×5 = 90 marks
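
The same result, checked with NumPy; both the summation formulas above and the closed-form matrix solution from 5.4 give the same line:

  import numpy as np

  X = np.array([1, 2, 3, 4], dtype=float)
  y = np.array([50, 60, 70, 80], dtype=float)
  n = len(X)

  w1 = (n * np.sum(X * y) - np.sum(X) * np.sum(y)) / (n * np.sum(X**2) - np.sum(X)**2)
  w0 = (np.sum(y) - w1 * np.sum(X)) / n
  print(w0, w1)                             # 40.0 10.0

  # Closed-form OLS: w = (XᵀX)⁻¹Xᵀy, with a column of ones for the intercept
  A = np.column_stack([np.ones(n), X])
  print(np.linalg.inv(A.T @ A) @ A.T @ y)   # [40. 10.]

  print(w0 + w1 * 5)                        # prediction for X = 5 -> 90.0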

5.6 Advantages and Disadvantages

Advantages:

✅ Simple and fast.
✅ Easily interpretable (slope tells effect of each feature).
✅ Works well when relationship is truly linear.

Disadvantages:

❌ Assumes a linear relationship (fails on non-linear data).
❌ Sensitive to outliers.
❌ Assumes features are independent (multicollinearity hurts it).


Section 6: Logistic Regression

6.1 What is Logistic Regression?

Definition:

Logistic Regression is a supervised learning algorithm used for binary classification (output is 0 or 1). Despite the name, it is a classification algorithm, not regression.

It models the probability that an input belongs to a class using the sigmoid function.


6.2 The Sigmoid Function

         1
σ(z) = ──────────
        1 + e^(-z)

Where:
  z = w₀ + w₁X₁ + w₂X₂ + ... = linear combination of inputs

Output: always between 0 and 1 (interpreted as probability)

6.3 Decision Rule

        ┌ 1  (Class 1)  if P(Y=1|X) ≥ 0.5   [i.e., z ≥ 0]
Ŷ  =    │
        └ 0  (Class 0)  if P(Y=1|X) < 0.5   [i.e., z < 0]

The decision boundary is the line where z = 0.


6.4 Cost Function - Log Loss (Binary Cross-Entropy)

J(w) = -(1/m) Σ [yᵢ log(ŷᵢ) + (1-yᵢ) log(1-ŷᵢ)]

Where:
  yᵢ  = actual label (0 or 1)
  ŷᵢ  = predicted probability

Weights are updated using gradient descent to minimize J(w).
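
A brief sketch of the sigmoid and the 0.5 decision rule, followed by the same idea with scikit-learn (the pass/fail toy data is invented for illustration):

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  # sigmoid(z) >= 0.5 exactly when z >= 0
  print(sigmoid(0.0), sigmoid(2.0), sigmoid(-2.0))   # 0.5, ~0.88, ~0.12

  # Toy data: hours studied -> pass (1) / fail (0)
  X = np.array([[1], [2], [3], [4], [5], [6]])
  y = np.array([0, 0, 0, 1, 1, 1])

  clf = LogisticRegression().fit(X, y)
  print(clf.predict_proba([[3.5]]))   # [P(fail), P(pass)] for 3.5 hours
  print(clf.predict([[3.5]]))         # class whose probability is >= 0.5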


6.5 Linear Regression vs Logistic Regression

Feature    Linear Regression      Logistic Regression
────────────────────────────────────────────────────────
Task       Regression             Classification
Output     Continuous value       Probability (0–1)
Function   Linear (y = wx + b)    Sigmoid σ(z)
Loss       MSE                    Log Loss
Decision   No threshold           Threshold at 0.5
Example    Predict house price    Predict spam or not

6.6 Advantages and Disadvantages

Advantages:

✅ Probabilistic output (useful for confidence scores).
✅ Fast to train and predict.
✅ Easy to interpret with feature weights.
✅ Works well for linearly separable data.

Disadvantages:

❌ Assumes linear decision boundary.
❌ Fails on complex, non-linear problems.
❌ Sensitive to outliers and correlated features.

Applications:

  • Email spam detection
  • Disease prediction (diabetes, cancer risk)
  • Credit approval
  • Customer churn prediction

Section 7: Support Vector Machines (SVM)

7.1 What is SVM?

Definition:

A Support Vector Machine (SVM) is a supervised learning algorithm that finds the best hyperplane (decision boundary) that maximally separates the two classes.

Key Idea: Don't just find any line that separates the classes - find the one with the largest margin (gap) between the classes.


7.2 Key Concepts

Hyperplane

A hyperplane is a decision boundary that separates two classes.

  • In 2D: a line.
  • In 3D: a plane.
  • In n-D: a hyperplane.

Equation:

w·x + b = 0

Where:
  w = weight vector (normal to hyperplane)
  x = input feature vector
  b = bias

Support Vectors

Support Vectors are the data points closest to the hyperplane (from each class). They are the critical points - they "support" or define the hyperplane.


Margin

The margin is the total distance between the two parallel boundary lines (one touching each class's closest points).

       Margin
      ←──────→
───────────────────   ← Upper boundary  (w·x + b = +1)
                  ●
    ●  ●  ●
──────────────────── ← Optimal Hyperplane  (w·x + b = 0)
         ■   ■   ■
              ■
───────────────────   ← Lower boundary  (w·x + b = -1)

● = Class +1    ■ = Class -1

Margin width:

        2
M  =  ─────
       ||w||

SVM maximizes M → minimizes ||w||.


7.3 Hard Margin vs Soft Margin

              Hard Margin SVM               Soft Margin SVM
─────────────────────────────────────────────────────────────────────────
Assumption    Data is perfectly separable   Allows some misclassifications
Constraint    All points outside margin     Some points can be inside margin
Parameter     None                          C (penalty parameter)
Sensitivity   Very sensitive to outliers    More robust

C parameter (regularization):

  • High C → Small margin, fewer errors on training data (risk of overfit).
  • Low C  → Wide margin, allows more errors (better generalization).

7.4 The Kernel Trick

Problem: Data is not always linearly separable in the original feature space.

Solution: The kernel trick maps data to a higher-dimensional space where it becomes separable, without explicitly computing the transformation.

Original 2D space         Higher-dimensional space
   (not separable)            (separable)
       ●●  ■■                   ●  ●
     ■     ●   ■       ────►   ─────────────
       ●●  ■■                   ■  ■

Common Kernels:

Kernel           Formula                      Use Case
──────────────────────────────────────────────────────────────────────
Linear           K(x,y) = xᵀy                 Linearly separable data
Polynomial       K(x,y) = (xᵀy + c)ᵈ          Polynomial boundaries
RBF / Gaussian   K(x,y) = exp(-γ||x-y||²)     Non-linear, complex boundaries
Sigmoid          K(x,y) = tanh(αxᵀy + c)      Neural network-like
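
A small scikit-learn sketch comparing a linear and an RBF kernel on a toy dataset that is not linearly separable (make_moons and the C/gamma values are arbitrary example choices):

  from sklearn.datasets import make_moons
  from sklearn.model_selection import train_test_split
  from sklearn.preprocessing import StandardScaler
  from sklearn.svm import SVC

  # Two interleaving half-moons: no straight line separates them
  X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # SVMs are sensitive to feature scale, so standardize first
  scaler = StandardScaler().fit(X_train)
  X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

  for kernel in ["linear", "rbf"]:
      clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X_train, y_train)
      print(kernel, "test accuracy:", round(clf.score(X_test, y_test), 3))
  # The RBF kernel usually scores noticeably higher here than the linear one.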

7.5 SVM for Multi-Class

SVM is inherently binary. Two strategies to extend it:

  • One-vs-One (OvO): Train one SVM per pair of classes.
  • One-vs-All (OvA): Train one SVM per class vs all others.

7.6 Advantages and Disadvantages

Advantages:

✅ Works well with high-dimensional data.
✅ Effective when number of features > number of samples.
✅ Memory efficient (only support vectors matter).
✅ Kernel trick handles non-linear problems.
✅ Robust to overfitting (especially with soft margin).

Disadvantages:

❌ Slow on large datasets (quadratic programming).
❌ Sensitive to feature scaling (needs normalization).
❌ Choosing the right kernel and C is tricky.
❌ Hard to interpret compared to Decision Trees.

Applications:

  • Image classification
  • Text and document categorization
  • Bioinformatics (protein classification)
  • Face detection

Quick Revision Points

Algorithms at a Glance:

Algorithm             Type             Key Idea                                    Key Formula
────────────────────────────────────────────────────────────────────────────────────────────────────────────
k-NN                  Classification   Majority vote of k nearest neighbours       Euclidean distance
Naïve Bayes           Classification   Bayes' theorem + independence assumption    P(C|X) ∝ P(X|C)P(C)
Decision Tree         Both             Split on highest information gain           IG = H(S) - Σ (|Sᵥ|/|S|) H(Sᵥ)
Linear Regression     Regression       Fit a line to minimize error                Ŷ = w₀ + w₁X
Logistic Regression   Classification   Sigmoid maps to probability                 σ(z) = 1/(1+e^(-z))
SVM                   Both             Maximize margin between classes             Maximize 2/||w||

Key Formulas:

Euclidean Distance:   d = √[Σ(aᡒ-bᡒ)²]
Entropy:              H = -Σ pᵢ log₂(pᵢ)
Information Gain:     IG = H(S) - Σ |Sᵥ|/|S| H(Sᵥ)
Sigmoid:              σ(z) = 1 / (1 + e^(-z))
SVM Margin:           M = 2 / ||w||
Linear Regression:    Ŷ = w₀ + w₁X
MSE:                  (1/m) Σ(yᵢ - ŷᵢ)²

Classification vs Regression:

Algorithm             Classification   Regression
──────────────────────────────────────────────────
k-NN                  ✅               ✅
Naïve Bayes           ✅               ❌
Decision Tree         ✅               ✅
Linear Regression     ❌               ✅
Logistic Regression   ✅               ❌
SVM                   ✅               ✅ (SVR)

Expected Exam Questions

PYQs will be added after analysis; check back soon.


These notes were compiled by Deepak Modi
Last updated: May 2026

Found an error or want to contribute?

This content is open-source and maintained by the community. Help us improve it!