Welcome to the AI Training Exercise
This tool walks you through building a real machine learning model, a spam classifier, from scratch. No programming knowledge is needed. Each step mirrors what a data scientist actually does.
In classic programming, a developer writes explicit rules, for example: "if the message contains 'winner' and a URL, mark it as spam." These rules are fast and predictable, but they must be written and maintained by hand, and attackers can evade them simply by rewording their messages.
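To make the quoted rule concrete, here is a minimal sketch of it in Python. The function name `is_spam_by_rule` and the URL pattern are assumptions made for illustration; they are not part of this tool.

```python
import re

# Hypothetical pattern: anything starting with http://, https:// or www. counts as a URL.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)

def is_spam_by_rule(message: str) -> bool:
    """Hand-written rule: spam if the message mentions 'winner' and contains a URL."""
    has_winner = "winner" in message.lower()
    has_url = URL_PATTERN.search(message) is not None
    return has_winner and has_url

# The rule catches the obvious wording...
print(is_spam_by_rule("You are a WINNER! Claim at http://example.com"))
# ...but a one-character rewording slips past it.
print(is_spam_by_rule("You are a w1nner! Claim at http://example.com"))
```

The second call shows the weakness described above: the rule only knows the exact spelling it was given.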
In machine learning, instead of writing rules, you collect labelled examples and let the model find the patterns itself. Classic code still measures properties of each message (that is what Step 4, Select Features, does), but the model decides how to weigh them based on the training data.
Step 2: Inspect Data
Before training any model, you need to understand your data. Review the dataset below. Some rows have problems. Can you spot them? The summary panel shows what was found automatically.
Every message in the dataset needs a label, and labels can be assigned in two ways:
- By hand. A human reviews each message and assigns a label. This is called data annotation. It is reliable but slow: it can take hours or weeks of work depending on dataset size.
- By a program. A developer writes rules such as keyword lists, regular expressions, and sender blocklists that automatically assign labels. This is faster, but the labels are only as good as the rules, and edge cases are easy to miss.
Either way, the quality of the labels directly determines how well the model learns. A wrong label teaches the model the wrong thing, which is exactly why the next step exists.
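A program that assigns labels might look roughly like the sketch below. The keyword list, the regular expression, the blocked sender, and the label names `spam` and `not_spam` are all made up for illustration; they are not the rules this tool uses.

```python
import re
from typing import Optional

# All three rule sets are invented examples.
SPAM_KEYWORDS = {"winner", "free prize", "act now"}
SUSPICIOUS_PATTERN = re.compile(r"\$\d+|https?://\S+", re.IGNORECASE)
BLOCKED_SENDERS = {"promo@example.com"}

def auto_label(sender: str, message: str) -> Optional[str]:
    """Return 'spam', 'not_spam', or None when the rules cannot decide."""
    text = message.lower()
    if sender.lower() in BLOCKED_SENDERS:
        return "spam"
    if any(keyword in text for keyword in SPAM_KEYWORDS):
        return "spam"
    if SUSPICIOUS_PATTERN.search(message):
        return None  # ambiguous: leave this one for a human annotator
    return "not_spam"

print(auto_label("friend@example.org", "Lunch at noon?"))
# An edge case: a harmless message that happens to contain a keyword.
print(auto_label("friend@example.org", "I was the winner of the office quiz"))
```

The last call labels a harmless message as spam, which is the kind of wrong label that rule-based labelling can introduce.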
| # | Message | Label | Issues |
|---|---|---|---|
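An automatic check of the kind the summary panel reports could be sketched as follows. The rows, the field names `message` and `label`, and the helper names are assumptions for illustration. Only duplicates and missing labels are counted here, because spotting a wrong label usually needs human judgement.

```python
from collections import Counter

# Made-up rows; None marks a missing label.
rows = [
    {"message": "Win a free prize now", "label": "spam"},
    {"message": "Lunch at noon?", "label": "not_spam"},
    {"message": "win a free prize now ", "label": "spam"},
    {"message": "Your invoice is attached", "label": None},
]

def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies compare equal."""
    return " ".join(text.lower().split())

def summarise_issues(rows):
    """Count rows, extra copies of repeated messages, and rows without a label."""
    counts = Counter(normalise(row["message"]) for row in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    missing = sum(1 for row in rows if not row["label"])
    return {"rows": len(rows), "duplicates": duplicates, "missing_labels": missing}

print(summarise_issues(rows))
```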
Step 3: Clean Data
Fix all data quality issues before training. Use the controls to correct mislabels, remove duplicates, and assign missing labels. All issues must be resolved before you can proceed.
| # | Message | Label | Action |
|---|---|---|---|
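In code, the three cleaning actions could look something like this sketch. The `corrections` mapping (row index to the label a reviewer chose) is a hypothetical format, and raising an error for unresolved rows is one possible way to mirror the rule that all issues must be fixed before you proceed.

```python
def clean(rows, corrections=None):
    """Return a cleaned copy: apply label fixes, drop duplicates, reject unlabelled rows."""
    corrections = corrections or {}
    seen = set()
    cleaned = []
    for index, row in enumerate(rows):
        # Treat messages that differ only in case or spacing as duplicates.
        key = " ".join(row["message"].lower().split())
        if key in seen:
            continue
        label = corrections.get(index, row["label"])
        if label is None:
            raise ValueError(f"row {index} still has no label")
        seen.add(key)
        cleaned.append({"message": row["message"], "label": label})
    return cleaned

# Made-up rows: one mislabel, one duplicate, one missing label.
rows = [
    {"message": "Win a free prize now", "label": "not_spam"},
    {"message": "Lunch at noon?", "label": "not_spam"},
    {"message": "lunch at  noon?", "label": "not_spam"},
    {"message": "Your invoice is attached", "label": None},
]
cleaned = clean(rows, corrections={0: "spam", 3: "not_spam"})
for row in cleaned:
    print(row)
```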
Step 4: Select Features
Features are the measurable properties of a message that the model will learn from. Select at least two features. The preview panel shows the computed values for a sample of your data.
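As a rough illustration of what "measurable properties" means, the sketch below turns a message into a handful of numbers. The four features chosen here are assumptions; the features this tool offers may differ.

```python
import re

def extract_features(message: str) -> dict:
    """Compute a few numeric properties of a message (hypothetical feature set)."""
    letters = [c for c in message if c.isalpha()]
    uppercase = sum(1 for c in letters if c.isupper())
    return {
        "length": len(message),
        "exclamation_marks": message.count("!"),
        "has_url": int(re.search(r"https?://\S+", message) is not None),
        # Share of letters that are uppercase; 0.0 when there are no letters.
        "uppercase_ratio": uppercase / len(letters) if letters else 0.0,
    }

print(extract_features("WIN now!! http://example.com"))
print(extract_features("See you at lunch"))
```

Each message becomes a row of numbers, and those numbers, not the raw text, are what the model sees during training.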
Step 5: Train the Model
Configure and train a neural network on your cleaned data. Watch the loss and accuracy curves update in real time as the model learns.
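To show what "training" involves, here is a minimal sketch of the smallest possible neural network: a single neuron (logistic regression) trained with gradient descent. It is not the network this tool trains. The data, the two features, the learning rate, and the number of epochs are all made-up assumptions.

```python
import math

# Made-up training set: ([exclamation_marks, has_url], label), where label 1 means spam.
DATA = [
    ([3.0, 1.0], 1), ([2.0, 1.0], 1), ([4.0, 0.0], 1), ([1.0, 1.0], 1),
    ([0.0, 0.0], 0), ([1.0, 0.0], 0), ([0.0, 0.0], 0), ([0.0, 1.0], 0),
]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, learning_rate=0.5):
    """Full-batch gradient descent; returns weights, bias, and (loss, accuracy) per epoch."""
    n = len(data)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    history = []
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        correct = 0
        for features, label in data:
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() away from zero
            loss -= label * math.log(p) + (1 - label) * math.log(1 - p)
            correct += int((p >= 0.5) == bool(label))
            error = p - label  # gradient of the loss with respect to the neuron's input
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        # Loss and accuracy are measured before this epoch's weight update.
        history.append((loss / n, correct / n))
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias, history

weights, bias, history = train(DATA)
for epoch in (0, 49, 199):
    loss, accuracy = history[epoch]
    print(f"epoch {epoch + 1}: loss={loss:.3f} accuracy={accuracy:.2f}")
```

The `history` list holds the same two quantities that the loss and accuracy curves plot: one pair of values per pass over the data.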
Step 6: Test the Model
Try your trained model on new messages. Type or paste a message below, then click Classify to see the prediction.
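Behind a Classify button, the prediction step could be as simple as the sketch below. The weights and bias are hypothetical stand-ins for a trained model, and the two features and the 0.5 threshold are assumptions for illustration.

```python
import math

# Hypothetical values standing in for a trained model; real ones would come from training.
WEIGHTS = {"exclamation_marks": 0.8, "has_url": 1.5}
BIAS = -2.0

def classify(message: str, threshold: float = 0.5):
    """Return a (label, probability) pair for a new message."""
    features = {
        "exclamation_marks": message.count("!"),
        "has_url": int("http://" in message or "https://" in message),
    }
    score = sum(WEIGHTS[name] * value for name, value in features.items()) + BIAS
    probability = 1.0 / (1.0 + math.exp(-score))  # squash the score into the range 0..1
    label = "spam" if probability >= threshold else "not_spam"
    return label, probability

for text in ("Claim your prize!!! http://example.com", "See you at lunch"):
    label, probability = classify(text)
    print(f"{text!r}: {label} (p={probability:.2f})")
```

Note that a new message goes through the same feature computation as the training data before the model sees it.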