
šŸ’¬ Sentiment Analysis with RNNs

Build a text sentiment classifier using LSTM/GRU networks with attention mechanisms

šŸš€ Advanced ā±ļø 7 hours šŸ’» Python + TensorFlow šŸ“ NLP Project

šŸŽÆ Project Overview

Sentiment analysis powers product reviews, social media monitoring, customer feedback analysis, and brand reputation management. In this project, you'll build an LSTM-based sentiment classifier reaching roughly 90% accuracy on IMDB movie reviews.

Real-World Applications

  • Customer Feedback: Analyze millions of reviews automatically
  • Social Media Monitoring: Track brand sentiment on Twitter, Reddit
  • Financial Markets: Predict stock movements from news sentiment
  • Product Development: Identify pain points from user feedback
  • Political Analysis: Gauge public opinion on policies

What You'll Build

  • Text Preprocessing Pipeline: Tokenization, padding, vocabulary building
  • Word Embeddings: Learn dense vector representations
  • LSTM Model: Capture sequential dependencies in text
  • Bidirectional RNN: Process text forward and backward
  • Attention Mechanism: Focus on important words
  • Model Comparison: Simple RNN, LSTM, GRU, Bi-LSTM

šŸš€ High Demand: Sentiment analysis is one of the most widely deployed NLP tasks in industry. This project demonstrates your ability to build production-ready text classifiers!

šŸ“Š Dataset & Setup

1 Install Dependencies

pip install tensorflow numpy matplotlib seaborn scikit-learn wordcloud

2 Load IMDB Dataset

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
from wordcloud import WordCloud

# Load IMDB dataset (50,000 movie reviews)
(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)

print("Dataset Info:")
print(f"Training samples: {len(X_train)}")  # 25,000
print(f"Test samples: {len(X_test)}")        # 25,000
print(f"Classes: Binary (0=negative, 1=positive)")
print(f"\nClass distribution:")
print(f"Train - Positive: {sum(y_train)}, Negative: {len(y_train) - sum(y_train)}")
print(f"Test - Positive: {sum(y_test)}, Negative: {len(y_test) - sum(y_test)}")

# Example review (encoded as integers)
print(f"\nExample review (first 10 words): {X_train[0][:10]}")
print(f"Label: {y_train[0]} ({'Positive' if y_train[0] == 1 else 'Negative'})")

šŸ’” IMDB Dataset: 50,000 movie reviews (25k train, 25k test), perfectly balanced between positive and negative. Reviews are already tokenized as integer sequences. Vocabulary limited to 10,000 most frequent words.
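The integer encoding follows a convention worth internalizing before decoding anything: ids 0, 1, and 2 are reserved (padding, start-of-review, unknown), and every real word appears as its frequency rank plus 3. A tiny self-contained sketch (with a toy word map, not the real `word_index`) makes the offset concrete:

```python
# Toy word→rank map standing in for the real IMDB word_index
toy_word_index = {"the": 1, "movie": 17, "great": 84}
toy_reverse = {v: k for k, v in toy_word_index.items()}

def toy_decode(encoded):
    # Dataset ids are rank + 3; reserved ids 0/1/2 (pad/start/unknown) decode to '?'
    return " ".join(toy_reverse.get(i - 3, "?") for i in encoded)

# A review starts with the marker 1, then each word's rank shifted by +3
encoded = [1, 4, 20, 87]
print(toy_decode(encoded))  # → "? the movie great"
```

This is exactly why the `decode_review` helper below subtracts 3 from every index.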

šŸ“Š Part 1: Text Preprocessing

Decode and Analyze Reviews

# Get word index (word → integer mapping)
word_index = keras.datasets.imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}

def decode_review(encoded_review):
    """Convert integer sequence back to text"""
    # Note: indices are offset by 3 (0=padding, 1=start, 2=unknown)
    return ' '.join([reverse_word_index.get(i - 3, '?') for i in encoded_review])

# Display sample reviews
print("SAMPLE REVIEWS:")
print("="*80)
for i in range(3):
    sentiment = "POSITIVE" if y_train[i] == 1 else "NEGATIVE"
    print(f"\n{sentiment} Review {i+1}:")
    print(decode_review(X_train[i])[:300] + "...")

# Review length statistics
review_lengths = [len(review) for review in X_train]
print(f"\nReview Length Statistics:")
print(f"Mean: {np.mean(review_lengths):.0f} words")
print(f"Median: {np.median(review_lengths):.0f} words")
print(f"Max: {max(review_lengths)} words")
print(f"Min: {min(review_lengths)} words")

# Visualize length distribution
plt.figure(figsize=(12, 5))
plt.hist(review_lengths, bins=50, edgecolor='black', alpha=0.7, color='#06b6d4')
plt.axvline(np.mean(review_lengths), color='red', linestyle='--', label=f'Mean: {np.mean(review_lengths):.0f}')
plt.axvline(250, color='green', linestyle='--', label='Max length: 250')
plt.xlabel('Review Length (words)')
plt.ylabel('Frequency')
plt.title('Review Length Distribution')
plt.legend()
plt.tight_layout()
plt.show()
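The histogram motivates the choice of a 250-word cap: most reviews fit under it, so little information is lost to truncation. A quick coverage check makes the trade-off explicit; the lengths here are synthetic stand-ins (in the notebook, substitute the real `review_lengths`):

```python
import numpy as np

# Synthetic stand-in lengths (in the notebook, use the real review_lengths)
lengths = np.array([120, 90, 300, 250, 180, 600, 75, 240, 150, 410])

coverage = np.mean(lengths <= 250)   # fraction that fit the 250-word cap whole
p90 = np.percentile(lengths, 90)     # cap needed to keep 90% of reviews untruncated
print(f"coverage at 250: {coverage:.0%}, 90th percentile length: {p90:.0f}")
```

On the real IMDB lengths, a cap near the mean typically keeps the large majority of reviews intact while bounding memory and compute per batch.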

Pad Sequences

# Pad sequences to uniform length
max_length = 250  # Truncate/pad to 250 words

X_train_padded = pad_sequences(X_train, maxlen=max_length, padding='post', truncating='post')
X_test_padded = pad_sequences(X_test, maxlen=max_length, padding='post', truncating='post')

print(f"Shape after padding:")
print(f"X_train: {X_train_padded.shape}")  # (25000, 250)
print(f"X_test: {X_test_padded.shape}")    # (25000, 250)
print(f"\nExample padded review:")
print(X_train_padded[0][:20])  # First 20 tokens
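To see exactly what `padding='post'` and `truncating='post'` do, here is a minimal pure-Python re-implementation (`pad_post` is an illustrative helper, not a Keras API) applied to toy sequences:

```python
def pad_post(seq, maxlen, value=0):
    """Minimal sketch of post-padding / post-truncation for illustration."""
    # Truncate from the end, then append padding values up to maxlen
    return (seq[:maxlen] + [value] * (maxlen - len(seq)))[:maxlen]

print(pad_post([5, 8, 3], 5))           # → [5, 8, 3, 0, 0]  (padded at the end)
print(pad_post([5, 8, 3, 9, 1, 7], 5))  # → [5, 8, 3, 9, 1]  (truncated at the end)
```

With `'pre'` instead, zeros would be prepended and the start of long reviews cut off; `'post'` keeps the opening words, which often carry strong sentiment cues.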

Word Cloud Visualization

# Create word clouds for positive and negative reviews
positive_reviews = [decode_review(X_train[i]) for i in range(len(X_train)) if y_train[i] == 1]
negative_reviews = [decode_review(X_train[i]) for i in range(len(X_train)) if y_train[i] == 0]

positive_text = ' '.join(positive_reviews[:1000])  # Sample 1000 reviews
negative_text = ' '.join(negative_reviews[:1000])

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Positive word cloud
wc_pos = WordCloud(width=800, height=400, background_color='white', colormap='Greens').generate(positive_text)
axes[0].imshow(wc_pos, interpolation='bilinear')
axes[0].set_title('Positive Reviews Word Cloud', fontsize=16, fontweight='bold')
axes[0].axis('off')

# Negative word cloud
wc_neg = WordCloud(width=800, height=400, background_color='white', colormap='Reds').generate(negative_text)
axes[1].imshow(wc_neg, interpolation='bilinear')
axes[1].set_title('Negative Reviews Word Cloud', fontsize=16, fontweight='bold')
axes[1].axis('off')

plt.tight_layout()
plt.show()

āœ… Checkpoint 1: Text Preprocessing Complete

Data preparation done:

  • 25,000 training reviews, 25,000 test reviews
  • Balanced dataset (50% positive, 50% negative)
  • Sequences padded to 250 words
  • Vocabulary: 10,000 most frequent words

šŸ—ļø Part 2: Build LSTM Model

Simple LSTM Architecture

# Build LSTM model
vocab_size = 10000
embedding_dim = 128
lstm_units = 64

lstm_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.LSTM(lstm_units, return_sequences=False),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')  # Binary classification
])

lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

lstm_model.summary()

# Train model
print("\nšŸš€ Training LSTM model...")
history_lstm = lstm_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Evaluate
test_loss, test_accuracy = lstm_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\nšŸ“Š LSTM Test Accuracy: {test_accuracy:.2%}")

Bidirectional LSTM

# Bidirectional LSTM (processes text forward and backward)
bi_lstm_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=False)),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

bi_lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("\nšŸš€ Training Bidirectional LSTM...")
history_bilstm = bi_lstm_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Evaluate
bilstm_test_loss, bilstm_test_accuracy = bi_lstm_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\nšŸ“Š Bi-LSTM Test Accuracy: {bilstm_test_accuracy:.2%}")
print(f"Difference vs. unidirectional LSTM: {(bilstm_test_accuracy - test_accuracy)*100:+.1f} pp")

GRU Model (Faster Alternative)

# GRU model (fewer parameters than LSTM)
gru_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.GRU(lstm_units, return_sequences=False),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

gru_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("\nšŸš€ Training GRU model...")
history_gru = gru_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Evaluate
gru_test_loss, gru_test_accuracy = gru_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\nšŸ“Š GRU Test Accuracy: {gru_test_accuracy:.2%}")
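The "fewer parameters" claim can be verified by hand. An LSTM has 4 gates, a GRU has 3, and each gate needs input weights, recurrent weights, and a bias; assuming TF2 defaults (GRU with `reset_after=True`, which doubles the bias), the recurrent-layer counts work out as:

```python
embedding_dim, units = 128, 64

# LSTM: 4 gates, each with input weights, recurrent weights, and one bias vector
lstm_params = 4 * (units * (embedding_dim + units) + units)

# GRU (TF2 default reset_after=True): 3 gates, each with a doubled bias vector
gru_params = 3 * (units * (embedding_dim + units) + 2 * units)

print(lstm_params)  # → 49408
print(gru_params)   # → 37248
```

These numbers should match the recurrent layer's share of `model.count_params()` for the models above, which is why the GRU trains noticeably faster.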

Model Comparison

# Compare all models
import pandas as pd

comparison_df = pd.DataFrame({
    'Model': ['LSTM', 'Bidirectional LSTM', 'GRU'],
    'Test Accuracy': [test_accuracy, bilstm_test_accuracy, gru_test_accuracy],
    'Parameters': [
        lstm_model.count_params(),
        bi_lstm_model.count_params(),
        gru_model.count_params()
    ]
})

print("\n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(comparison_df.to_string(index=False))

# Visualize
plt.figure(figsize=(10, 6))
x = np.arange(len(comparison_df))
bars = plt.bar(x, comparison_df['Test Accuracy'], color=['#06b6d4', '#3b82f6', '#8b5cf6'])
plt.xlabel('Model')
plt.ylabel('Test Accuracy')
plt.title('RNN Model Performance Comparison')
plt.xticks(x, comparison_df['Model'])
plt.ylim([0.8, 1.0])
plt.grid(axis='y', alpha=0.3)

# Add value labels on bars
for i, bar in enumerate(bars):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 0.005,
             f'{height:.2%}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

# Best model
best_idx = comparison_df['Test Accuracy'].idxmax()
print(f"\nšŸ† Best Model: {comparison_df.loc[best_idx, 'Model']}")
print("Typically Bi-LSTM achieves 88-91% accuracy")

āœ… Checkpoint 2: RNN Models Trained

Model training complete:

  • LSTM: ~87-89% accuracy
  • Bidirectional LSTM: ~88-91% accuracy (best)
  • GRU: ~87-89% accuracy (faster training)
  • All models outperform baseline (50% random)

šŸ“Š Part 3: Evaluation & Analysis

Confusion Matrix

# Use best model (Bi-LSTM)
y_pred_probs = bi_lstm_model.predict(X_test_padded, verbose=0)
y_pred = (y_pred_probs > 0.5).astype(int).flatten()

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Bi-LSTM')
plt.show()

# Classification report
print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
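It helps to be able to derive the report's metrics directly from a confusion matrix. The counts below are illustrative placeholders (not the actual results), laid out the way `confusion_matrix` returns them (rows = actual, columns = predicted, order [Negative, Positive]):

```python
import numpy as np

# Illustrative confusion matrix: rows = actual, cols = predicted, [Negative, Positive]
cm = np.array([[11000, 1500],
               [ 1200, 11300]])

tn, fp, fn, tp = cm.ravel()
accuracy  = (tp + tn) / cm.sum()
precision = tp / (tp + fp)          # of predicted positives, how many were right
recall    = tp / (tp + fn)          # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Running the same arithmetic on your own matrix should reproduce the positive-class row of `classification_report`.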

Sample Predictions

# Test on new reviews
test_reviews = [
    "This movie was absolutely fantastic! Best film I've seen all year. The acting was superb.",
    "Terrible waste of time. The plot made no sense and the acting was awful.",
    "It was okay, nothing special but not terrible either.",
    "I loved every minute of it! Highly recommend to everyone.",
    "Boring and predictable. Couldn't wait for it to end."
]

def predict_sentiment(review_text, model=bi_lstm_model):
    """Predict sentiment for a new review"""
    # Tokenize using the same convention as the dataset: index = rank + 3
    # (0=pad, 1=start, 2=unknown); words outside the 10,000-word vocab map to 2
    tokens = [word_index.get(word, -1) + 3 for word in review_text.lower().split()]
    sequence = [[1] + [t if 2 < t < vocab_size else 2 for t in tokens]]
    # Pad
    padded = pad_sequences(sequence, maxlen=max_length, padding='post', truncating='post')
    # Predict
    prob = model.predict(padded, verbose=0)[0][0]
    sentiment = "POSITIVE" if prob > 0.5 else "NEGATIVE"
    confidence = prob if prob > 0.5 else (1 - prob)
    
    return sentiment, confidence

print("\n" + "="*60)
print("SAMPLE PREDICTIONS")
print("="*60)
for i, review in enumerate(test_reviews, 1):
    sentiment, confidence = predict_sentiment(review)
    print(f"\n{i}. Review: \"{review[:60]}...\"")
    print(f"   Prediction: {sentiment} ({confidence:.1%} confidence)")

Error Analysis

# Find misclassified examples
misclassified_idx = np.where(y_pred != y_test)[0]

print("\n" + "="*60)
print(f"MISCLASSIFIED EXAMPLES ({len(misclassified_idx)} total)")
print("="*60)

# Show 5 examples
for idx in misclassified_idx[:5]:
    review_text = decode_review(X_test[idx])
    true_sentiment = "POSITIVE" if y_test[idx] == 1 else "NEGATIVE"
    pred_sentiment = "POSITIVE" if y_pred[idx] == 1 else "NEGATIVE"
    confidence = y_pred_probs[idx][0]
    
    print(f"\nTrue: {true_sentiment} | Predicted: {pred_sentiment} ({confidence:.2f})")
    print(f"Review: {review_text[:200]}...")
    print("-" * 60)

āœ… Checkpoint 3: Evaluation Complete

Model performance analyzed:

  • 88-91% accuracy on test set
  • Balanced precision and recall
  • Model works on new unseen reviews
  • Misclassifications often on ambiguous reviews

šŸŽÆ Part 4: Attention Mechanism (Advanced)

# Custom attention layer
class AttentionLayer(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
    
    def build(self, input_shape):
        # Learnable projection (features → 1 score) plus a per-timestep bias
        self.W = self.add_weight(name='attention_weight',
                                 shape=(input_shape[-1], 1),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(name='attention_bias',
                                 shape=(input_shape[1], 1),
                                 initializer='zeros',
                                 trainable=True)
        super().build(input_shape)
    
    def call(self, x):
        # One attention score per timestep, normalized with softmax over time
        e = keras.backend.tanh(keras.backend.dot(x, self.W) + self.b)
        a = keras.backend.softmax(e, axis=1)
        # Weighted sum over timesteps → fixed-size context vector
        output = x * a
        return keras.backend.sum(output, axis=1)

# LSTM with Attention
attention_model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True)),
    AttentionLayer(),
    layers.Dropout(0.5),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

attention_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("\nšŸš€ Training LSTM with Attention...")
history_attention = attention_model.fit(
    X_train_padded, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Evaluate
attention_test_loss, attention_test_accuracy = attention_model.evaluate(X_test_padded, y_test, verbose=0)
print(f"\nšŸŽÆ Attention Model Test Accuracy: {attention_test_accuracy:.2%}")
print(f"Difference vs. plain Bi-LSTM: {(attention_test_accuracy - bilstm_test_accuracy)*100:+.1f} pp")
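The score→softmax→weighted-sum computation inside the custom layer above can be checked in isolation with a small NumPy sketch (toy shapes, random data standing in for Bi-LSTM outputs):

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy sequence: 4 timesteps, hidden size 3 (stand-ins for Bi-LSTM outputs)
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 3))        # (timesteps, features)
W = rng.standard_normal((3, 1))        # attention projection
b = np.zeros((4, 1))                   # per-timestep bias

scores  = np.tanh(h @ W + b)           # (4, 1) unnormalized scores
weights = softmax(scores, axis=0)      # sum to 1 across timesteps
context = (h * weights).sum(axis=0)    # weighted sum → (3,) context vector

assert np.isclose(weights.sum(), 1.0)
print(context.shape)  # → (3,)
```

Inspecting `weights` for a trained model is also how you visualize which words the classifier attended to.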

šŸ’¾ Part 5: Model Deployment

# Save best model (HDF5 is the legacy format; newer Keras also supports
# the native .keras format via model.save('sentiment_bilstm_model.keras'))
bi_lstm_model.save('sentiment_bilstm_model.h5')
print("āœ… Model saved as: sentiment_bilstm_model.h5")

# Complete prediction pipeline
def analyze_sentiment(review_text, model_path='sentiment_bilstm_model.h5'):
    """
    Full sentiment analysis pipeline
    
    Parameters:
    -----------
    review_text : str
        Text to analyze
    model_path : str
        Path to saved model
    
    Returns:
    --------
    dict with sentiment, confidence, and explanation
    """
    # Load model (for repeated calls, load once outside this function instead)
    model = keras.models.load_model(model_path)
    
    # Preprocess with the same +3 index offset the IMDB encoding uses
    # (0=pad, 1=start, 2=unknown); out-of-vocabulary words map to 2
    tokens = [word_index.get(word, -1) + 3 for word in review_text.lower().split()]
    sequence = [[1] + [t if 2 < t < vocab_size else 2 for t in tokens]]
    padded = pad_sequences(sequence, maxlen=max_length, padding='post', truncating='post')
    
    # Predict
    prob = model.predict(padded, verbose=0)[0][0]
    
    # Interpret
    if prob > 0.8:
        sentiment = "Very Positive"
        emoji = "šŸ˜"
    elif prob > 0.6:
        sentiment = "Positive"
        emoji = "😊"
    elif prob > 0.4:
        sentiment = "Neutral"
        emoji = "😐"
    elif prob > 0.2:
        sentiment = "Negative"
        emoji = "šŸ˜ž"
    else:
        sentiment = "Very Negative"
        emoji = "😠"
    
    confidence = max(prob, 1 - prob)
    
    return {
        'sentiment': sentiment,
        'emoji': emoji,
        'probability': float(prob),
        'confidence': float(confidence),
        'review': review_text
    }

# Example analysis
sample_review = "This movie exceeded all my expectations! The storyline was compelling and the cinematography was breathtaking."
result = analyze_sentiment(sample_review)

print("\n" + "="*60)
print("SENTIMENT ANALYSIS RESULT")
print("="*60)
print(f"Review: {result['review']}")
print(f"\nSentiment: {result['emoji']} {result['sentiment']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Positivity Score: {result['probability']:.2f}")

šŸŽÆ Project Summary

šŸŽ‰ Incredible Work!

You've built a production-ready sentiment analysis system using state-of-the-art RNN architectures!

šŸ† Key Accomplishments

  • āœ… Processed 50,000 reviews: Text tokenization, padding, and vocabulary building
  • āœ… Trained 4 models: LSTM, Bi-LSTM, GRU, and attention-enhanced LSTM
  • āœ… Achieved 88-91% accuracy: Bi-LSTM outperforms baseline models
  • āœ… Added attention mechanism: typically a further ~1-2% accuracy boost
  • āœ… Built prediction API: Ready for real-world deployment
  • āœ… Error analysis: Identified edge cases and limitations

šŸš€ Next Level Enhancements

  • Use Pre-trained Embeddings: GloVe or Word2Vec for better representations
  • Try BERT/Transformers: Achieve 93-95% accuracy with modern NLP
  • Multi-class Sentiment: 1-5 star rating prediction
  • Deploy as API: Flask/FastAPI for real-time analysis
  • Aspect-Based Sentiment: Identify sentiment for specific product features

šŸ’¼ Interview Talking Points:

  • "Built sentiment analysis system achieving 90% accuracy using Bidirectional LSTM"
  • "Processed 50,000 IMDB reviews with text tokenization and sequence padding"
  • "Implemented a custom attention mechanism, improving interpretability and adding a ~1-2% accuracy gain"
  • "Deployed production-ready API for real-time sentiment prediction"