Python Web Automation: Automated Product Review Aggregation for Retail

Client Background

A home‑goods brand wanted to consolidate customer reviews from Amazon, eBay, and specialty marketplaces into its BI tool for sentiment analysis.

The Challenge

Each platform used a different review widget, and some loaded reviews via JavaScript only after user interaction. Review pagination and “load more” buttons also varied by site.

Objectives

✦ Scrape review text, rating, date, and reviewer metadata
✦ Handle pagination and dynamic loading
✦ Output unified JSON for NLP pipelines
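
As a rough illustration of the "unified JSON" objective, the sketch below maps site-specific raw records onto one common shape. The field names, per-site mappings, and date formats are all hypothetical placeholders, not the selectors or schema used in the actual project.

```python
import json
from datetime import datetime

# Hypothetical mappings: unified field name -> key in that site's raw record.
FIELD_MAPS = {
    "amazon": {"text": "reviewBody", "rating": "stars", "date": "reviewDate", "reviewer": "profileName"},
    "ebay": {"text": "comment", "rating": "score", "date": "posted", "reviewer": "user"},
}

# Hypothetical date formats; real sites would need their own parsing rules.
DATE_FORMATS = {
    "amazon": "%Y-%m-%d",
    "ebay": "%d/%m/%Y",
}


def normalize_review(site, raw):
    """Map one site-specific raw record onto the unified review shape."""
    mapping = FIELD_MAPS[site]
    parsed_date = datetime.strptime(raw[mapping["date"]], DATE_FORMATS[site])
    return {
        "source": site,
        "text": raw[mapping["text"]].strip(),
        "rating": float(raw[mapping["rating"]]),
        "date": parsed_date.date().isoformat(),
        "reviewer": raw[mapping["reviewer"]].strip(),
    }


# Made-up sample records standing in for scraped data.
amazon_raw = {"reviewBody": " Sturdy shelf. ", "stars": "4", "reviewDate": "2024-03-05", "profileName": "J. Doe"}
ebay_raw = {"comment": "Arrived late", "score": 2, "posted": "17/02/2024", "user": "buyer_91"}

unified = [normalize_review("amazon", amazon_raw), normalize_review("ebay", ebay_raw)]
print(json.dumps(unified, indent=2))
```

Keeping the per-site differences in lookup tables means a new marketplace can, in principle, be added by extending the tables rather than the normalization logic.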

Our Approach

𝐏𝐥𝐚𝐧𝐧𝐢𝐧𝐠: Defined common review schema; mapped DOM selectors per site
𝐒𝐜𝐫𝐚𝐩𝐢𝐧𝐠: Combined Scrapy and Selenium spiders to click through “load more” buttons and scroll infinite-scroll lists
𝐑𝐚𝐭𝐞 𝐋𝐢𝐦𝐢𝐭𝐢𝐧𝐠: Implemented per‑site request throttling and proxy pools
𝐕𝐚𝐥𝐢𝐝𝐚𝐭𝐢𝐨𝐧: Applied schema checks; flagged missing or malformed entries
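
The validation step could look something like the sketch below. The project lists JSON Schema among its tools; this example uses plain Python checks as a dependency-free stand-in, and the field names, the 1 to 5 rating range, and the ISO date requirement are assumptions made for illustration.

```python
from datetime import date

# Assumed unified fields and their expected types (hypothetical, for illustration).
REQUIRED_FIELDS = {
    "source": str,
    "text": str,
    "rating": (int, float),
    "date": str,
    "reviewer": str,
}


def validate_review(review):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in review:
            problems.append(f"missing field: {field}")
        elif isinstance(review[field], bool) or not isinstance(review[field], expected_type):
            problems.append(f"wrong type for {field}")
    if not problems:
        # Value-level checks only run once presence and types are confirmed.
        if not 1 <= review["rating"] <= 5:
            problems.append("rating outside 1-5")
        if not review["text"].strip():
            problems.append("empty review text")
        try:
            date.fromisoformat(review["date"])
        except ValueError:
            problems.append("date is not ISO 8601 (YYYY-MM-DD)")
    return problems


def partition(reviews):
    """Split records into those that pass and those flagged for follow-up."""
    valid, flagged = [], []
    for review in reviews:
        problems = validate_review(review)
        if problems:
            flagged.append({"review": review, "problems": problems})
        else:
            valid.append(review)
    return valid, flagged


# Made-up sample records.
good = {"source": "amazon", "text": "Sturdy shelf.", "rating": 4.0, "date": "2024-03-05", "reviewer": "J. Doe"}
missing = {"source": "ebay", "text": "Arrived late", "rating": 2.0, "date": "2024-02-17"}
malformed = {"source": "ebay", "text": "ok", "rating": 7, "date": "17/02/2024", "reviewer": "buyer_91"}

valid, flagged = partition([good, missing, malformed])
for item in flagged:
    print(item["problems"])
```

Flagged records are kept alongside their reasons rather than discarded, so a broken selector on one site shows up as a cluster of similar problems instead of silently missing data.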

Results & Impact

✦ Collected 50K+ reviews weekly with 100% schema compliance
✦ Provided real‑time review feeds into the client’s sentiment dashboard
✦ Cut review-gathering time from 40+ hours to under 2 hours per month

Tools & Technologies

Python, Scrapy, Selenium, JSON Schema, AWS Lambda

Client Testimonial

“Good understanding and great coding skills! Job was done perfectly.”
