This project automates the process of scraping, analyzing, and summarizing customer reviews of the top-selling laptops on Amazon.nl, with a focus on extracting product-specific insights using GPT-4o.


🚀 Pipeline Overview

1. Scrape Bestselling Laptops

2. Collect Reviews

  • For each ASIN, fetch customer reviews using Selenium and BeautifulSoup.
  • Output: data/laptop_reviews/{ASIN}.json

3. Summarize Reviews with GPT-4o

  • GPT-4o is used to extract product-specific praises and complaints, excluding unrelated content (e.g. shipping, seller).
  • Input: JSON reviews per product.
  • Output: results/laptop_review_summaries/{ASIN}.json

4. Merge Summaries with Metadata

  • Combines review summaries with price, rating, and ASIN from original CSV.
  • Adds brand and model parsed from title.
  • Final output: results/final_laptop_dataset.csv

💡 Highlight: GPT-4o Summarization

We use GPT-4o to:

  • Parse dozens of user reviews.
  • Extract only meaningful, product-specific comments.
  • Output structured, interpretable summaries for business insight.

This approach avoids noisy data from non-product-related content and makes the dataset highly usable for ML or analysis.

You can use it for downstream analysis, dashboards, or ML pipelines.