Dataset & Prep

Dataset: IMDb Movie Reviews.

Why: A widely used benchmark for binary sentiment classification.

Prep: Lowercasing, punctuation removal, tokenization.
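
The three prep steps can be sketched as follows. This is a minimal illustration, not necessarily what assignment.py does: the function name `preprocess` is hypothetical, only ASCII punctuation is removed, and whitespace splitting stands in for a real tokenizer.

```python
import string


def preprocess(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    lowered = text.lower()
    # A deletion table removes every character listed in string.punctuation.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Assumption: whitespace splitting is a stand-in for a real tokenizer.
    return no_punct.split()


print(preprocess("Great movie, loved it!"))  # ['great', 'movie', 'loved', 'it']
```

Note that stripping punctuation also turns "don't" into "dont", which may matter for the negation cases discussed under Troubleshooting.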

Prompt Engineering

Prompt 1: “Classify the sentiment of this review: [text]” → direct label.

Prompt 2: “Does the reviewer sound happy or upset? Review: [text]” → natural response.

Prompt 3: “Return JSON with {label, confidence} for review: [text]” → structured output.
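
One way to keep the three prompts side by side is a small template table. The sketch below is an assumption about how this could be organized: the names `PROMPTS`, `build_prompt`, and `parse_structured_reply` are hypothetical, and the parser assumes the model replies with bare JSON, which real models do not always do.

```python
import json

# The three prompt styles from this section; {text} is filled in per review.
# Doubled braces render as literal braces after str.format.
PROMPTS = {
    "direct": "Classify the sentiment of this review: {text}",
    "natural": "Does the reviewer sound happy or upset? Review: {text}",
    "structured": "Return JSON with {{label, confidence}} for review: {text}",
}


def build_prompt(style: str, text: str) -> str:
    """Fill the chosen template with the review text."""
    return PROMPTS[style].format(text=text)


def parse_structured_reply(reply: str) -> tuple[str, float]:
    """Parse a reply to the structured prompt.

    Assumption: the reply is a bare JSON object with "label" and
    "confidence" keys. Malformed replies raise an exception.
    """
    data = json.loads(reply)
    return str(data["label"]).lower(), float(data["confidence"])


print(build_prompt("structured", "Loved every minute."))
```

Keeping the templates in one place makes it easy to run the same reviews through each style and compare the replies.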

Evaluation

Model: distilbert-base-uncased-finetuned-sst-2-english.

Metrics (tiny dataset): Accuracy 1.00, Precision 1.00, Recall 1.00, F1 1.00.
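
For reference, the four metrics can be computed from true and predicted labels as in the dependency-free sketch below. It mirrors the standard binary definitions (the same quantities scikit-learn reports) but is not taken from assignment.py. The function name `binary_metrics` and the example labels are hypothetical; the `POSITIVE`/`NEGATIVE` strings follow the label names this model emits.

```python
def binary_metrics(y_true, y_pred, positive="POSITIVE"):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard every division so empty or one-sided inputs yield 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only, not the README's data.
y_true = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
print(binary_metrics(y_true, y_pred))
```

With only a handful of reviews, a single changed prediction moves every score by a large step, as the example shows: one missed positive out of four drops recall to 0.50.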

Troubleshooting

Issue: Sarcasm and negation confuse models.

Fix: Add sarcastic examples, or prompt the model to “consider sarcasm.”
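
Both fixes can be combined in a few-shot prompt, sketched below under stated assumptions: the example reviews are invented for illustration, and `build_sarcasm_aware_prompt` is a hypothetical helper, not part of the repository. This applies to a prompt-driven model; whether it actually improves results would need to be measured.

```python
# Invented examples for illustration: one sarcastic, one negated.
SARCASM_EXAMPLES = [
    ("Oh great, another two hours of my life I will never get back.", "negative"),
    ("I did not hate it, which is more than I expected.", "positive"),
]


def build_sarcasm_aware_prompt(text, examples=SARCASM_EXAMPLES):
    """Build a few-shot prompt that asks the model to consider sarcasm."""
    lines = [
        "Classify the sentiment of each review as positive or negative.",
        "Consider sarcasm and negation before answering.",
        "",
    ]
    # Each worked example shows the model the expected answer format.
    for review, label in examples:
        lines += [f"Review: {review}", f"Sentiment: {label}", ""]
    # The final entry is left open for the model to complete.
    lines += [f"Review: {text}", "Sentiment:"]
    return "\n".join(lines)


print(build_sarcasm_aware_prompt("Sure, because what cinema needed was a fourth sequel."))
```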

Run

pip install transformers scikit-learn
python assignment.py