An end-to-end data analytics project focused on understanding user interaction behavior, quantifying funnel performance, segmenting customer purchase patterns, and providing actionable strategic recommendations to optimize sales and conversion for an e-commerce platform.
In modern e-commerce, driving traffic to a site is only half the battle. Maximizing revenue requires a deep, quantitative understanding of the customer journeyβfrom the initial product discovery (views) to cart interactions and, ultimately, purchase completion.
This project explores user activity on an e-commerce platform using event-level data. By parsing, cleaning, and aggregating raw interaction events, the project reveals exactly where users drop off in the conversion funnel, identifies high-performing products, brands, and categories, segments users based on purchase intent, and exposes critical catalog data gaps.
The primary goal of this case study is to diagnose conversion friction points, identify revenue drivers, and recommend management interventions to improve overall conversion and customer lifetime value (LTV).
The analysis is structured to answer seven critical business questions:
- Funnel Performance: How many unique users view products, add items to carts, and complete purchases?
- Merchandising Drivers: Which specific products, brands, and product categories drive the highest volume of purchases and revenue?
- Temporal Patterns: Which months, days, and hours exhibit the highest platform activity and transaction volumes?
- Funnel Drop-offs: Where is the largest percentage drop-off in the user purchasing journey?
- Cart Abandonment: Which users show high intent (adding to cart) but fail to complete a purchase, and what brands do they abandon?
- Catalog Incompleteness: To what extent are missing attributes (e.g., brand, category) impacting reporting quality and business decisions?
- Strategic Interventions: What concrete, high-priority actions should management implement to increase sales?
The primary data source is events.csv, an event-level dataset where each row captures a single user action on the platform.
| Column | Data Type | Description |
|---|---|---|
event_time |
Datetime (UTC) | The precise timestamp when the event occurred. |
event_type |
Categorical | The action taken by the user (view, cart, purchase). |
product_id |
String (ID) | Unique identifier for the product being interacted with. |
category_id |
String (ID) | Unique identifier for the product category. |
category_code |
String (Path) | Human-readable category breadcrumbs (e.g., computers.components.videocards). |
brand |
String | Brand name of the product (e.g., msi, gigabyte, asus). |
price |
Float64 | Price of the product in USD. |
user_id |
String (ID) | Unique identifier representing an individual customer. |
user_session |
String (UUID) | Unique identifier grouping events in a single browsing session. |
- Raw Volume: Over 100+ MB of raw event logs.
- Starting Point: 1,000,000+ raw interaction records before deduplication.
- Temporal Window: Late 2020 through early 2021.
To translate raw transactional logs into business indicators, the following standardized Key Performance Indicators (KPIs) are calculated:
-
Total Users (Reach):
$\text{nunique}(user_id)$ -
Total Sessions (Visits):
$\text{nunique}(user_session)$ -
Total Views, Carts, & Purchases: Event counts grouped by
event_type. -
Total Revenue:
$\sum(price \text{ where } event_type == 'purchase')$ . -
Average Order Value (AOV):
$\frac{\text{Total Revenue}}{\text{Total Purchase Events}}$ . -
User Conversion Rate:
$\frac{\text{Unique Purchasing Users}}{\text{Total Unique Users}}$ . -
View-to-Cart Conversion:
$\frac{\text{Unique Users who Added to Cart}}{\text{Unique Users who Viewed a Product}}$ . -
Cart-to-Purchase Conversion:
$\frac{\text{Unique Users who Purchased}}{\text{Unique Users who Added to Cart}}$ . -
Overall Funnel Conversion:
$\frac{\text{Unique Users who Purchased}}{\text{Unique Users who Viewed a Product}}$ .
To ensure the integrity of the downstream analysis, a strict cleaning pipeline is executed:
- Deduplication: Exact duplicates are removed to avoid artificial inflation of KPIs and funnel metrics.
- Price Validation: Non-positive and zero prices are filtered out (all pricing is confirmed positive).
-
Timestamp Standardization:
event_timeis parsed as UTC-aware datetimes and localized to naive timezone formats for clean aggregation. -
Feature Engineering: Derived fields are generated for chronological grouping:
-
event_date(normalized date) -
event_month(e.g.,2021-01) -
event_day(ordered categorical factor, e.g.,Monday$\rightarrow$ Sunday) -
event_hour(integer 0β23)
-
-
Null Imputation: Missing attributes in categorical columns are assigned descriptive strings:
- Missing
brand$\rightarrow$ "Unknown" - Missing
category_code$\rightarrow$ "Unknown"
- Missing
-
Pipeline Quality Control: The data volume progression is formally audited:
- Raw starting points: Full dataset ingestion.
- Deduplication/price clean: Excludes duplicates and bad prices.
- Clean data shape: Ready for analytical aggregation.
Based on the finalized data run, the platform's core commercial and behavioral KPIs are:
- Total Unique Users: 407,283
- Total Unique Sessions: 490,398
- Total Revenue: $5,125,113.92
- Total Purchases: 37,343
- Average Order Value (AOV): $137.24
- Overall User Conversion Rate: 5.23%
- View-to-Cart User Rate: 9.08%
- Cart-to-Purchase User Rate: 57.65%
- Cart-to-Purchase Event Rate: 69.11%
A user-level funnel reveals the exact point where customer drop-off is most severe.
graph TD
A["Viewed Users<br>406,863 (100%)"] -->|90.92% Drop-off| B["Cart Users<br>36,952 (9.08%)"]
B -->|42.35% Drop-off| C["Purchasing Users<br>21,304 (5.24% of total)"]
| Funnel Stage | Unique Users | Conversion from Previous Stage | Total Drop-off from Previous | Overall Funnel Share |
|---|---|---|---|---|
| 1. Product Views | 406,863 | 100.00% | 0.00% | 100.00% |
| 2. Cart Additions | 36,952 | 9.08% | 90.92% | 9.08% |
| 3. Purchase Completion | 21,304 | 57.65% | 42.35% | 5.24% |
Important
The Critical Friction Point: The largest funnel drop-off occurs between Product View and Cart Addition. Over 90.92% of active users browse products but never add a single item to their cart. Conversely, once a user adds an item to their cart, they are highly motivated to buy: 57.65% of cart-using customers complete a purchase.
Behavioral segmentation groups users into actionable marketing and merchandising cohorts:
- View-Only Users (Browsers): 369,083 users (90.62%)
- Behavior: Browse products but never add to cart or purchase.
- Action: Retarget with personalized recommendations or product interest campaigns.
- Cart Abandoners (High Intent, Low Conversion): 16,896 users (4.15%)
- Behavior: Added products to cart but exited before checkout.
- Action: Trigger email/SMS recovery workflows, offer small incentive discounts, or resolve checkout issues.
- Single-Purchase Buyers: 13,598 users (3.34%)
- Behavior: Completed exactly one transaction.
- Action: Onboard into post-purchase loyalty programs, gather reviews, and cross-sell related categories.
- Loyal Multi-Buyers: 7,706 users (1.89%)
- Behavior: Completed 2 or more separate transactions.
- Action: VIP offers, early product access, and refer-a-friend programs.
- High-Value Users: 2,131 users (0.52%)
- Behavior: Top 10% of all purchasing customers based on cumulative revenue.
- Action: High-priority customer support, premium VIP tier membership.
The platform's sales are heavily concentrated in computer-centric equipment, specifically DIY hardware.
- Computers, Components, Video Cards: $2,604,764.80 (6,888 purchases)
- Category Code Missing (Unknown): $493,896.91 (7,566 purchases)
- Computers, Peripherals, Printers: $364,566.80 (2,557 purchases)
- Computers, Components, CPUs: $219,168.98 (1,213 purchases)
- Electronics, Telephones: $150,915.69 (4,119 purchases)
- Stationery, Cartridges: $103,595.99 (2,739 purchases)
- Computers, Components, Motherboards: $102,871.16 (1,266 purchases)
- Electronics, Audio, Acoustics: $84,724.80 (848 purchases)
- Auto Accessories, Media Players: $81,558.66 (685 purchases)
- Computers, Notebooks: $63,604.79 (1,297 purchases)
- MSI: $643,492.34 (1,788 purchases)
- Brand Name Missing (Unknown): $594,720.15 (9,500 purchases)
- Gigabyte: $556,183.04 (1,741 purchases)
- Palit: $484,210.58 (1,077 purchases)
- Asus: $330,147.11 (1,261 purchases)
- Sapphire: $306,192.94 (917 purchases)
- AMD: $191,987.40 (1,020 purchases)
- Canon: $137,964.79 (1,045 purchases)
- PowerColor: $98,804.38 (272 purchases)
- Epson: $98,262.95 (462 purchases)
Warning
Data Quality Risk: Missing brands account for $594,720.15 (11.60% of total revenue) and missing categories account for $493,896.91 (9.64% of total revenue). This represents a significant reporting gap. Management cannot accurately assess the market share of brand-specific campaigns or execute inventory planning when ~10% of revenue flows through unattributed fields.
The analysis identified a highly critical subset of products: items that attract thousands of views but achieve 0 sales.
| Product ID | Views | Purchases | Purchase Rate | View-Purchase Gap |
|---|---|---|---|---|
| 229036 | 1,150 | 0 | 0.00% | 1,150 |
| 775032 | 1,086 | 0 | 0.00% | 1,086 |
| 83737 | 995 | 0 | 0.00% | 995 |
| 1674260 | 946 | 0 | 0.00% | 946 |
| 589968 | 891 | 0 | 0.00% | 891 |
| 1255165 | 879 | 0 | 0.00% | 879 |
| 698803 | 724 | 0 | 0.00% | 724 |
| 261923 | 615 | 0 | 0.00% | 615 |
| 3830671 | 610 | 0 | 0.00% | 610 |
| 145379 | 513 | 0 | 0.00% | 513 |
Tip
Page-Level Optimization Opportunity: These products represent massive pre-qualified traffic that is failing to convert. The 0% purchase rate strongly points to catalog-specific errors: broken checkout buttons, out-of-stock listings, uncompetitive pricing compared to competitors, or highly negative user reviews displayed on the product detail page.
Based on the quantitative outputs, the following four strategic initiatives should be prioritized:
- The Issue: 90.92% drop-off from view to cart.
- Action Plan:
- Incentivize Add-to-Cart Action: Add trust elements (free shipping indicators, warranty badges) next to the "Add to Cart" button.
- Dynamic Social Proof: Implement live interest indicators (e.g., "14 users viewed this in the last hour") to create urgency.
- Enhance Product Persuasion: Streamline product image galleries, integrate high-quality user reviews, and clarify technical specifications.
- The Issue: 16,896 users (4.15% of all users) abandoned their carts.
- Action Plan:
- Behavioral Email Triggers: Set up an automated three-part cart abandonment email sequence:
- Email 1 (Hour 1): Customer service angle ("Did you experience a checkout error?") and item reminder.
- Email 2 (Day 1): Social proof focusing on the specific product's benefits.
- Email 3 (Day 3): Urgency-focused, offering a one-time 5% discount code valid for 24 hours.
- Targeted Retargeting Ads: Use social media pixels to retarget these users with dynamic carousel ads featuring the specific brand they abandoned (e.g.,
MSI,Gigabyte,Palitare highly represented in abandoned carts).
- Behavioral Email Triggers: Set up an automated three-part cart abandonment email sequence:
- The Issue: Products like
229036and775032have 1,000+ views but 0 purchases. - Action Plan:
- Automated Alert Systems: Build an analyst alert that flags any product exceeding 200 views in 30 days with a 0% conversion rate.
- Catalog Audit Checklist: Review flagged pages for:
- Uncompetitive pricing.
- Incorrect or low-quality product descriptions.
- Inventory out-of-stock statuses.
- Display or checkout rendering errors on specific browsers.
- The Issue: Missing brand or category attributes represent over $1.08M in unattributed revenue (~21% combined).
- Action Plan:
- Upstream Catalog Constraints: Make brand and category code mandatory fields in the product information management (PIM) system before a product can go live on the platform.
- Backfill Cleaning: Run regular data backfills mapping raw
category_idvalues to their missingcategory_codeandbrandnames through a lookup dictionary.
βββ .gitignore
βββ README.md # Extensive project documentation & scope overview (this file)
βββ requirements.txt # Required Python library dependencies
βββ notebooks/
β βββ ecommerce_case_study_analysis.ipynb # Main Jupyter notebook containing calculations, EDA, and plots
βββ scripts/
β βββ generate_case_study_notebook.py # Developer script used to compile and generate the notebook cells
βββ outputs/
βββ events_cleaned.csv # Final cleaned and formatted dataset
βββ kpi_summary.csv # Core calculated KPIs
βββ funnel_summary.csv # User counts and drop-offs per funnel stage
βββ user_segments.csv # Users counts and share per behavior cohort
βββ brand_revenue.csv # Revenue and purchases by brand
βββ category_revenue.csv # Revenue and purchases by category
βββ product_purchases.csv # Revenue and purchases by individual product
βββ low_conversion_products.csv # High-view, zero-purchase product analysis
To replicate the dataset cleaning, visualizations, and CSV exports, follow these steps:
Ensure you have Python 3.9+ installed. Clone or copy the workspace files into your local directory.
Create a virtual environment and install the required analysis libraries:
# Create environment
python -m venv .venv
# Activate environment (Windows PowerShell)
.venv\Scripts\Activate.ps1
# Activate environment (Mac/Linux Bash)
source .venv/bin/activate
# Install requirements
pip install -r requirements.txtStart Jupyter notebook or launch it within your preferred IDE (e.g., VS Code):
jupyter notebook notebooks/ecommerce_case_study_analysis.ipynbRun all cells. The notebook will:
- Ingest
events.csv - Process the cleaning logic
- Plot hourly activity, monthly revenue, daily purchases, top brands/categories, and funnel bar charts
- Export the raw data and aggregations into the
/outputsdirectory as analytical CSVs.