This project trains a binary classification neural network to distinguish between spam and ham (non-spam) SMS messages using TensorFlow and Keras.
The data used is the SMS Spam Collection dataset from FreeCodeCamp, consisting of two .tsv files:
train-data.tsv— training setvalid-data.tsv— validation/test set
Each line is formatted as:
<label> <TAB> <message>
Where <label> is either "ham" or "spam".
bash pip install tensorflow pip install tensorflow-datasets pip install pandas numpy matplotlib
!wget https://cdn.freecodecamp.org/project-data/sms/train-data.tsv
!wget https://cdn.freecodecamp.org/project-data/sms/valid-data.tsv
- Input: Raw SMS text string
- TextVectorization Layer: tokenizes and pads the input
- Embedding Layer: converts tokens to dense vectors (dim=16)
- GlobalAveragePooling1D: reduces sequence
- Dense Layer (24 units, ReLU): learns non-linear combinations
- Dropout (0.5): for regularization
- Output Layer (sigmoid): predicts probability of spam
- Loss: BinaryCrossentropy
- Optimizer: Adam
- Epochs: 20
- Metrics: Accuracy
- Validation Set:
valid-data.tsv
This function accepts a raw string message and returns:
[probability (0–1), "ham" or "spam"]predict_message("how are you doing today?")
# Output: [0.0123, "ham"]def predict_message(pred_text):
input_data = np.array([pred_text], dtype=object)
prediction = model.predict(input_data)
spam_prob = float(prediction[0][0])
label = 'spam' if spam_prob > 0.5 else 'ham'
return [spam_prob, label]Your model must correctly classify the following test messages:
[
"how are you doing today",
"sale today! to stop texts call 98912460324",
"i dont want to go. can we try it a different day? available sat",
"our new mobile video service is live. just install on your phone to start watching.",
"you have won £1000 cash! call to claim your prize.",
"i'll bring it tomorrow. don't forget the milk.",
"wow, is your arm alright. that happened to me one time too"
]All predictions must match expected labels ("ham" or "spam") for success.
https://colab.research.google.com/drive/1s7oLDkVU_kBd1biHSb7nmPi6Coq1gkze?authuser=2#scrollTo=lMHwYXHXCar3
Upon passing the test suite, FreeCodeCamp awards a certificate for this project as part of their Machine Learning curriculum.