🎯 Project Overview
Sentiment analysis powers product reviews, social media monitoring, customer feedback analysis, and brand reputation management. In this project, you'll build an LSTM-based sentiment classifier that reaches roughly 90% accuracy on IMDB movie reviews.
Real-World Applications
- Customer Feedback: Analyze millions of reviews automatically
- Social Media Monitoring: Track brand sentiment on Twitter, Reddit
- Financial Markets: Predict stock movements from news sentiment
- Product Development: Identify pain points from user feedback
- Political Analysis: Gauge public opinion on policies
What You'll Build
- Text Preprocessing Pipeline: Tokenization, padding, vocabulary building
- Word Embeddings: Learn dense vector representations
- LSTM Model: Capture sequential dependencies in text
- Bidirectional RNN: Process text forward and backward
- Attention Mechanism: Focus on important words
- Model Comparison: Simple RNN, LSTM, GRU, Bi-LSTM
🔥 High Demand: Sentiment analysis is one of the most widely deployed NLP tasks in industry. This project demonstrates your ability to build production-ready text classifiers!
📦 Dataset & Setup
1. Install Dependencies
pip install tensorflow numpy pandas matplotlib seaborn scikit-learn wordcloud
2. Load IMDB Dataset
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
from wordcloud import WordCloud
# Load IMDB dataset (50,000 movie reviews)
(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)
print("Dataset Info:")
print(f"Training samples: {len(X_train)}") # 25,000
print(f"Test samples: {len(X_test)}") # 25,000
print(f"Classes: Binary (0=negative, 1=positive)")
print(f"\nClass distribution:")
print(f"Train - Positive: {sum(y_train)}, Negative: {len(y_train) - sum(y_train)}")
print(f"Test - Positive: {sum(y_test)}, Negative: {len(y_test) - sum(y_test)}")
# Example review (encoded as integers)
print(f"\nExample review (first 10 words): {X_train[0][:10]}")
print(f"Label: {y_train[0]} ({'Positive' if y_train[0] == 1 else 'Negative'})")
💡 IMDB Dataset: 50,000 movie reviews (25k train, 25k test), perfectly balanced between positive and negative. Reviews are already tokenized as integer sequences. Vocabulary limited to the 10,000 most frequent words.
📝 Part 1: Text Preprocessing
Decode and Analyze Reviews
# Get word index (word → integer mapping)
word_index = keras.datasets.imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
def decode_review(encoded_review):
    """Convert integer sequence back to text"""
    # Note: indices are offset by 3 (0=padding, 1=start, 2=unknown)
    return ' '.join(reverse_word_index.get(i - 3, '?') for i in encoded_review)
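To make the +3 offset concrete, here's a self-contained toy example with a hypothetical five-word vocabulary (not the real IMDB index) showing how a word index maps to the dataset's shifted encoding and back:

```python
# Hypothetical miniature word index (rank 1 = most frequent), standing in
# for keras.datasets.imdb.get_word_index()
toy_word_index = {'the': 1, 'movie': 2, 'was': 3, 'great': 4, 'bad': 5}
toy_reverse = {v: k for k, v in toy_word_index.items()}

# The dataset shifts every index by +3 to reserve 0=padding, 1=start, 2=unknown,
# and prepends the start token 1 to each review
encoded = [1] + [toy_word_index[w] + 3 for w in "the movie was great".split()]
print(encoded)  # [1, 4, 5, 6, 7]

# Decoding subtracts the offset; reserved ids fall back to '?'
decoded = ' '.join(toy_reverse.get(i - 3, '?') for i in encoded)
print(decoded)  # ? the movie was great
```

This is why `decode_review` subtracts 3: index 4 in the data is rank-1 word "the", not rank-4.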
# Display sample reviews
print("SAMPLE REVIEWS:")
print("="*80)
for i in range(3):
    sentiment = "POSITIVE" if y_train[i] == 1 else "NEGATIVE"
    print(f"\n{sentiment} Review {i+1}:")
    print(decode_review(X_train[i])[:300] + "...")
# Review length statistics
review_lengths = [len(review) for review in X_train]
print(f"\nReview Length Statistics:")
print(f"Mean: {np.mean(review_lengths):.0f} words")
print(f"Median: {np.median(review_lengths):.0f} words")
print(f"Max: {max(review_lengths)} words")
print(f"Min: {min(review_lengths)} words")
# Visualize length distribution
plt.figure(figsize=(12, 5))
plt.hist(review_lengths, bins=50, edgecolor='black', alpha=0.7, color='#06b6d4')
plt.axvline(np.mean(review_lengths), color='red', linestyle='--', label=f'Mean: {np.mean(review_lengths):.0f}')
plt.axvline(250, color='green', linestyle='--', label='Max length: 250')
plt.xlabel('Review Length (words)')
plt.ylabel('Frequency')
plt.title('Review Length Distribution')
plt.legend()
plt.tight_layout()
plt.show()
Pad Sequences
# Pad sequences to uniform length
max_length = 250 # Truncate/pad to 250 words
X_train_padded = pad_sequences(X_train, maxlen=max_length, padding='post', truncating='post')
X_test_padded = pad_sequences(X_test, maxlen=max_length, padding='post', truncating='post')
print(f"Shape after padding:")
print(f"X_train: {X_train_padded.shape}") # (25000, 250)
print(f"X_test: {X_test_padded.shape}") # (25000, 250)
print(f"\nExample padded review:")
print(X_train_padded[0][:20]) # First 20 tokens
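For intuition about what `padding='post', truncating='post'` does, here is a minimal plain-Python sketch of the same logic on toy sequences (a re-implementation for illustration, not the Keras source):

```python
def pad_post(sequences, maxlen, value=0):
    """Minimal re-implementation of post-padding / post-truncation."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]                                   # truncate the tail
        padded.append(seq + [value] * (maxlen - len(seq)))   # pad the tail with 0s
    return padded

print(pad_post([[5, 8, 2], [7, 1, 9, 4, 3, 6]], maxlen=4))
# [[5, 8, 2, 0], [7, 1, 9, 4]]
```

Short reviews get zeros appended (token 0 is reserved for padding), and long reviews lose their tail — one reason 250 was chosen above the median review length.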
Word Cloud Visualization
# Create word clouds for positive and negative reviews
positive_reviews = [decode_review(X_train[i]) for i in range(len(X_train)) if y_train[i] == 1]
negative_reviews = [decode_review(X_train[i]) for i in range(len(X_train)) if y_train[i] == 0]
positive_text = ' '.join(positive_reviews[:1000]) # Sample 1000 reviews
negative_text = ' '.join(negative_reviews[:1000])
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
# Positive word cloud
wc_pos = WordCloud(width=800, height=400, background_color='white', colormap='Greens').generate(positive_text)
axes[0].imshow(wc_pos, interpolation='bilinear')
axes[0].set_title('Positive Reviews Word Cloud', fontsize=16, fontweight='bold')
axes[0].axis('off')
# Negative word cloud
wc_neg = WordCloud(width=800, height=400, background_color='white', colormap='Reds').generate(negative_text)
axes[1].imshow(wc_neg, interpolation='bilinear')
axes[1].set_title('Negative Reviews Word Cloud', fontsize=16, fontweight='bold')
axes[1].axis('off')
plt.tight_layout()
plt.show()
✅ Checkpoint 1: Text Preprocessing Complete
Data preparation done:
- 25,000 training reviews, 25,000 test reviews
- Balanced dataset (50% positive, 50% negative)
- Sequences padded to 250 words
- Vocabulary: 10,000 most frequent words
🏗️ Part 2: Build LSTM Model
Simple LSTM Architecture
# Build LSTM model
vocab_size = 10000
embedding_dim = 128
lstm_units = 64
lstm_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.LSTM(lstm_units, return_sequences=False),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')  # Binary classification
])
lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
lstm_model.summary()
# Train model
print("\n🚀 Training LSTM model...")
history_lstm = lstm_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)
# Evaluate
test_loss, test_accuracy = lstm_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\n📊 LSTM Test Accuracy: {test_accuracy:.2%}")
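The parameter counts reported by `model.summary()` can be verified by hand, which is a good sanity check on the architecture. An Embedding layer stores one vector per vocabulary word, and a Keras LSTM layer holds four gate blocks (input, forget, cell, output), each of shape (input_dim + units + 1) × units:

```python
# Hyperparameters as defined above
vocab_size, embedding_dim, lstm_units = 10000, 128, 64

embedding_params = vocab_size * embedding_dim                     # one vector per word
lstm_params = 4 * (embedding_dim + lstm_units + 1) * lstm_units   # 4 gates, weights + biases
dense1_params = lstm_units * 32 + 32                              # Dense(32): weights + biases
dense2_params = 32 * 1 + 1                                        # Dense(1)

print(embedding_params)  # 1280000
print(lstm_params)       # 49408
```

Note that the embedding table dominates: over 1.2M of the model's parameters are word vectors, versus ~50k in the LSTM itself.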
Bidirectional LSTM
# Bidirectional LSTM (processes text forward and backward)
bi_lstm_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=False)),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
bi_lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
print("\n🚀 Training Bidirectional LSTM...")
history_bilstm = bi_lstm_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)
# Evaluate
bilstm_test_loss, bilstm_test_accuracy = bi_lstm_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\n📊 Bi-LSTM Test Accuracy: {bilstm_test_accuracy:.2%}")
print(f"Improvement over LSTM: {(bilstm_test_accuracy - test_accuracy) * 100:+.1f}%")
GRU Model (Faster Alternative)
# GRU model (fewer parameters than LSTM)
gru_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.GRU(lstm_units, return_sequences=False),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
gru_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
print("\n🚀 Training GRU model...")
history_gru = gru_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)
# Evaluate
gru_test_loss, gru_test_accuracy = gru_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\n📊 GRU Test Accuracy: {gru_test_accuracy:.2%}")
Model Comparison
# Compare all models
import pandas as pd
comparison_df = pd.DataFrame({
    'Model': ['LSTM', 'Bidirectional LSTM', 'GRU'],
    'Test Accuracy': [test_accuracy, bilstm_test_accuracy, gru_test_accuracy],
    'Parameters': [
        lstm_model.count_params(),
        bi_lstm_model.count_params(),
        gru_model.count_params()
    ]
})
print("\n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(comparison_df.to_string(index=False))
# Visualize
plt.figure(figsize=(10, 6))
x = np.arange(len(comparison_df))
bars = plt.bar(x, comparison_df['Test Accuracy'], color=['#06b6d4', '#3b82f6', '#8b5cf6'])
plt.xlabel('Model')
plt.ylabel('Test Accuracy')
plt.title('RNN Model Performance Comparison')
plt.xticks(x, comparison_df['Model'])
plt.ylim([0.8, 1.0])
plt.grid(axis='y', alpha=0.3)
# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, height + 0.005,
             f'{height:.2%}', ha='center', va='bottom', fontweight='bold')
plt.tight_layout()
plt.show()
# Best model
best_idx = comparison_df['Test Accuracy'].idxmax()
print(f"\n🏆 Best Model: {comparison_df.loc[best_idx, 'Model']}")
print("Typically Bi-LSTM achieves 88-91% accuracy")
✅ Checkpoint 2: RNN Models Trained
Model training complete:
- LSTM: ~87-89% accuracy
- Bidirectional LSTM: ~88-91% accuracy (best)
- GRU: ~87-89% accuracy (faster training)
- All models outperform baseline (50% random)
📊 Part 3: Evaluation & Analysis
Confusion Matrix
# Use best model (Bi-LSTM)
y_pred_probs = bi_lstm_model.predict(X_test_padded, verbose=0)
y_pred = (y_pred_probs > 0.5).astype(int).flatten()
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Bi-LSTM')
plt.show()
# Classification report
print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
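For intuition about what the report's precision and recall actually measure, here is a tiny hand-computed example on made-up labels (not the model's real predictions):

```python
# Toy ground truth and predictions to illustrate the confusion-matrix cells
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat  = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)           # of predicted positives, how many were right
recall = tp / (tp + fn)              # of actual positives, how many were found
accuracy = (tp + tn) / len(y_true)

print(tp, tn, fp, fn)                      # 3 3 1 1
print(precision, recall, accuracy)         # 0.75 0.75 0.75
```

On a balanced dataset like IMDB these metrics track each other closely, which is why the report above shows similar numbers for both classes.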
Sample Predictions
# Test on new reviews
test_reviews = [
    "This movie was absolutely fantastic! Best film I've seen all year. The acting was superb.",
    "Terrible waste of time. The plot made no sense and the acting was awful.",
    "It was okay, nothing special but not terrible either.",
    "I loved every minute of it! Highly recommend to everyone.",
    "Boring and predictable. Couldn't wait for it to end."
]
def predict_sentiment(review_text, model=bi_lstm_model):
    """Predict sentiment for a new review"""
    # Tokenize with the dataset's +3 index offset (0=padding, 1=start, 2=unknown);
    # out-of-vocabulary words map to the unknown token 2
    tokens = [word_index.get(word.strip('.,!?"'), -1) + 3
              for word in review_text.lower().split()]
    sequence = [[t if t < vocab_size else 2 for t in tokens]]
    # Pad
    padded = pad_sequences(sequence, maxlen=max_length, padding='post', truncating='post')
    # Predict
    prob = model.predict(padded, verbose=0)[0][0]
    sentiment = "POSITIVE" if prob > 0.5 else "NEGATIVE"
    confidence = prob if prob > 0.5 else 1 - prob
    return sentiment, confidence
print("\n" + "="*60)
print("SAMPLE PREDICTIONS")
print("="*60)
for i, review in enumerate(test_reviews, 1):
    sentiment, confidence = predict_sentiment(review)
    print(f'\n{i}. Review: "{review[:60]}..."')
    print(f"   Prediction: {sentiment} ({confidence:.1%} confidence)")
Error Analysis
# Find misclassified examples
misclassified_idx = np.where(y_pred != y_test)[0]
print("\n" + "="*60)
print(f"MISCLASSIFIED EXAMPLES ({len(misclassified_idx)} total)")
print("="*60)
# Show 5 examples
for idx in misclassified_idx[:5]:
    review_text = decode_review(X_test[idx])
    true_sentiment = "POSITIVE" if y_test[idx] == 1 else "NEGATIVE"
    pred_sentiment = "POSITIVE" if y_pred[idx] == 1 else "NEGATIVE"
    confidence = y_pred_probs[idx][0]
    print(f"\nTrue: {true_sentiment} | Predicted: {pred_sentiment} ({confidence:.2f})")
    print(f"Review: {review_text[:200]}...")
    print("-" * 60)
✅ Checkpoint 3: Evaluation Complete
Model performance analyzed:
- 88-91% accuracy on test set
- Balanced precision and recall
- Model works on new unseen reviews
- Misclassifications often on ambiguous reviews
🎯 Part 4: Attention Mechanism (Advanced)
# Custom attention layer
class AttentionLayer(layers.Layer):
    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        self.W = self.add_weight(name='attention_weight',
                                 shape=(input_shape[-1], 1),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(name='attention_bias',
                                 shape=(input_shape[1], 1),
                                 initializer='zeros',
                                 trainable=True)
        super().build(input_shape)

    def call(self, x):
        # Score each timestep, then softmax over the time axis
        e = keras.backend.tanh(keras.backend.dot(x, self.W) + self.b)
        a = keras.backend.softmax(e, axis=1)
        # Weighted sum of timestep outputs -> one fixed-size vector
        return keras.backend.sum(x * a, axis=1)
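To see what the attention layer computes without TensorFlow in the loop, the same math can be run in plain NumPy on random toy tensors (illustrative shapes and untrained weights, not the model's actual parameters): score each timestep, softmax over the time axis so the weights sum to 1, then take the weighted sum of timestep features.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features = 2, 5, 8

x = rng.normal(size=(batch, timesteps, features))   # e.g. Bi-LSTM outputs
W = rng.normal(size=(features, 1))                  # attention weight vector
b = np.zeros((timesteps, 1))                        # attention bias

e = np.tanh(x @ W + b)                                # score per timestep
a = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)  # softmax over time axis
context = (x * a).sum(axis=1)                         # weighted sum of timesteps

print(a.sum(axis=1).ravel())  # each batch row's weights sum to ~1.0
print(context.shape)          # (2, 8): one fixed-size vector per review
```

The softmax weights `a` are what make attention interpretable: the largest entries mark the timesteps (words) the model leaned on most.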
# LSTM with Attention
attention_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True)),
    AttentionLayer(),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
attention_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
print("\n🚀 Training LSTM with Attention...")
history_attention = attention_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)
# Evaluate
attention_test_loss, attention_test_accuracy = attention_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\n🎯 Attention Model Test Accuracy: {attention_test_accuracy:.2%}")
print(f"Improvement over Bi-LSTM: {(attention_test_accuracy - bilstm_test_accuracy) * 100:+.1f}%")
💾 Part 5: Model Deployment
# Save best model
bi_lstm_model.save('sentiment_bilstm_model.h5')
print("✅ Model saved as: sentiment_bilstm_model.h5")
# Complete prediction pipeline
def analyze_sentiment(review_text, model_path='sentiment_bilstm_model.h5'):
    """
    Full sentiment analysis pipeline

    Parameters
    ----------
    review_text : str
        Text to analyze
    model_path : str
        Path to saved model

    Returns
    -------
    dict with sentiment, emoji, probability, and confidence
    """
    # Load model (in production, load once at startup rather than per call)
    model = keras.models.load_model(model_path)
    # Preprocess with the dataset's +3 index offset (0=padding, 1=start, 2=unknown)
    tokens = [word_index.get(word.strip('.,!?"'), -1) + 3
              for word in review_text.lower().split()]
    sequence = [[t if t < vocab_size else 2 for t in tokens]]
    padded = pad_sequences(sequence, maxlen=max_length, padding='post', truncating='post')
    # Predict
    prob = model.predict(padded, verbose=0)[0][0]
    # Interpret
    if prob > 0.8:
        sentiment, emoji = "Very Positive", "😄"
    elif prob > 0.6:
        sentiment, emoji = "Positive", "🙂"
    elif prob > 0.4:
        sentiment, emoji = "Neutral", "😐"
    elif prob > 0.2:
        sentiment, emoji = "Negative", "😞"
    else:
        sentiment, emoji = "Very Negative", "😠"
    confidence = max(prob, 1 - prob)
    return {
        'sentiment': sentiment,
        'emoji': emoji,
        'probability': float(prob),
        'confidence': float(confidence),
        'review': review_text
    }
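The probability-to-label bucketing above can be pulled out into a small pure function (a refactoring sketch, not part of the original pipeline), which makes the thresholds easy to unit-test without loading the model:

```python
def bucket_sentiment(prob):
    """Map a positive-class probability to a coarse sentiment label."""
    if prob > 0.8:
        return "Very Positive"
    elif prob > 0.6:
        return "Positive"
    elif prob > 0.4:
        return "Neutral"
    elif prob > 0.2:
        return "Negative"
    return "Very Negative"

print(bucket_sentiment(0.93))  # Very Positive
print(bucket_sentiment(0.50))  # Neutral
print(bucket_sentiment(0.07))  # Very Negative
```

Keeping interpretation logic separate from model inference also makes it trivial to tune the cutoffs later without touching the prediction code.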
# Example analysis
sample_review = "This movie exceeded all my expectations! The storyline was compelling and the cinematography was breathtaking."
result = analyze_sentiment(sample_review)
print("\n" + "="*60)
print("SENTIMENT ANALYSIS RESULT")
print("="*60)
print(f"Review: {result['review']}")
print(f"\nSentiment: {result['emoji']} {result['sentiment']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Positivity Score: {result['probability']:.2f}")
🎯 Project Summary
🎉 Incredible Work!
You've built a production-ready sentiment analysis system using state-of-the-art RNN architectures!
🏆 Key Accomplishments
- ✅ Processed 50,000 reviews: Text tokenization, padding, and vocabulary building
- ✅ Trained 4 models: LSTM, Bi-LSTM, GRU, and attention-enhanced Bi-LSTM
- ✅ Achieved 88-91% accuracy: Bi-LSTM outperforms the baseline models
- ✅ Added attention mechanism: Further 1-2% accuracy boost
- ✅ Built a prediction pipeline: Ready for real-world deployment
- ✅ Error analysis: Identified edge cases and limitations
🚀 Next Level Enhancements
- Use Pre-trained Embeddings: GloVe or Word2Vec for better representations
- Try BERT/Transformers: Achieve 93-95% accuracy with modern NLP
- Multi-class Sentiment: 1-5 star rating prediction
- Deploy as API: Flask/FastAPI for real-time analysis
- Aspect-Based Sentiment: Identify sentiment for specific product features
💼 Interview Talking Points:
- "Built a sentiment analysis system achieving ~90% accuracy using a Bidirectional LSTM"
- "Processed 50,000 IMDB reviews with text tokenization and sequence padding"
- "Implemented an attention mechanism that added interpretability and a small accuracy boost"
- "Deployed a production-ready prediction pipeline for real-time sentiment analysis"