[Kaggle Competition] Predicting the next product page a user will visit based on their previous actions during the same browsing session.
eric15342335/rental-product-recommendation-system-code
Rental product recommendation system
Overview
Predicting the next product page a user will visit based on their previous actions during the same browsing session.
This is a session-based next-item prediction task built on real user behavior from an online rental marketplace. More precisely, the platform focuses on children’s products, including strollers, car seats, toys, and other baby and toddler items that families rent for short- or long-term use. The business goal is to improve our performance metrics, primarily the number of daily orders.
Description
Your task is to build a model that, given the metadata of a browsing session and the sequence of visited pages, predicts which product pages the user will visit next.
Evaluation
Submissions are evaluated using Recall@6.
For each session in the test dataset, your task is to predict which product_id values will appear in the user’s browsing history after the last recorded event in the test session fragment.
Your model must output exactly 6 different product IDs per session.
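For local validation, the metric can be approximated as follows. This is a minimal sketch, not the official scorer: the competition text does not spell out the exact formula, so the code assumes the common definition (the share of products actually visited later that appear among the 6 predictions, averaged over sessions). Some Recall@k variants divide by min(k, number of true items) instead. The function names are illustrative.

```python
from typing import Dict, List, Set


def recall_at_k(predicted: List[str], actual: Set[str], k: int = 6) -> float:
    """Share of the truly visited products found in the first k predictions.

    Assumption: the denominator is the number of true products, which the
    competition description does not confirm.
    """
    if not actual:
        return 0.0
    hits = len(set(predicted[:k]) & actual)
    return hits / len(actual)


def mean_recall_at_k(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 6,
) -> float:
    """Average recall_at_k over every session present in ground_truth."""
    scores = [
        recall_at_k(predictions.get(visit_id, []), truth, k)
        for visit_id, truth in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Sessions missing from `predictions` score 0, which mirrors what would happen if a test session were left out of a submission under this assumed definition.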
Submission Format
Your submission must contain a header:
visit_id,product_ids
Each row must contain:
visit_id — session identifier
product_ids — exactly 6 different product IDs, space-delimited
Example:
visit_id,product_ids
6970478551166091000,463480210 463480211 463480999 463480998 463480997 463480996
Only visit_id values from the test set should appear in your file.
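The format rules above can be enforced when the file is written. The sketch below uses only the Python standard library; `write_submission` is a hypothetical helper name, and IDs are kept as strings because values such as 6970478551166091000 exceed what a 64-bit float can represent exactly.

```python
import csv
from typing import Dict, List, TextIO


def write_submission(
    predictions: Dict[str, List[str]], out: TextIO, k: int = 6
) -> None:
    """Write `visit_id,product_ids` rows with exactly k distinct IDs each."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["visit_id", "product_ids"])
    for visit_id, product_ids in predictions.items():
        # Drop repeated IDs while keeping the ranking order.
        unique = list(dict.fromkeys(str(p) for p in product_ids))
        if len(unique) < k:
            raise ValueError(
                f"visit {visit_id}: need {k} distinct product IDs, got {len(unique)}"
            )
        writer.writerow([str(visit_id), " ".join(unique[:k])])
```

A typical call would open the output with `open("submission.csv", "w", newline="")` and pass the file object as `out`. Padding short prediction lists (for example with globally popular products) is left to the caller.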
Dataset Description
At the beginning of 2025, the service underwent a full rebranding: the name, domain, and website were changed.
We refer to the website used before the rebranding as the old site, and the website used after the rebranding as the new site.
The new site was built from scratch using a different technology stack. Data from the two sites is not linked.
However, we transformed and aligned the datasets to a unified structure so that participants can work with them conveniently without diving into implementation-specific details.
User interactions on both websites were tracked using Yandex Metrica, which is an analogue of Google Analytics:
https://yandex.ru/support/metrica/en/
Product and order information for the old site was exported from the backend connected to the old CRM.
For the new site, this data was exported directly from the new CRM.
On the old site, e-commerce events were not configured in Yandex Metrica. This significantly complicates the mapping between Metrica logs and backend or CRM data. The details of this issue are discussed in the description of the file metrika_hits.csv.
If you have questions or suggestions while exploring the data, feel free to reach out in our Discord channel.
Files
metrika_visits.csv
Contains user session data collected from Yandex Metrica using the Logs API:
https://yandex.ru/dev/metrika/en/logs/
This file includes a subset of fields that are likely to be useful for training and production usage.
Field descriptions can be found here:
https://yandex.ru/dev/metrika/en/logs/fields/visits
Additional field:
project_id — site identifier:
0 for the new site
1 for the old site
metrika_hits.csv
Contains per-event user activity logs from Yandex Metrica, also collected via the Logs API:
https://yandex.ru/dev/metrika/en/logs/fields/hits/
Events include:
Page view events (is_page_view = 1)
E-commerce events (ecommerce contains JSON).
JSON format documentation:
https://yandex.ru/support/metrica/en/ecommerce/data
These events exist only for the new site.
Non-bounce events (not_bounce = 1).
According to Yandex Metrica, this event is automatically triggered when a user views a second page or stays on the site for more than 15 seconds:
https://yandex.ru/support/metrica/en/general/glossary#bounce
Other system and technical events
The goals_id field may contain identifiers of goals achieved during the event.
We do not provide a description of these goals because they either explicitly correspond to page views or e-commerce events, or are not relevant for the purposes of this competition.
We replaced the original url field with two fields:
page_type
slug (a human-readable URL identifier)
This was done to unify the old and new site structures, which originally used different URL patterns.
Yandex Metrica does not receive data from backend or CRM systems.
A reliable way to link these datasets is by using the order confirmation page (page_type = 'ORDER'):
On the old site, this page included a cartId GET parameter representing the cart ID in the database.
On the new site, the page included an order number in the browser tab title.
Since Metrica logs do not contain the URL or title fields, we extracted this information into:
cart_id
order_number
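The linking described above could look roughly like this. It is a sketch under assumptions: rows are plain dicts (for example from `csv.DictReader`), the session key in metrika_hits.csv is assumed to be called `visit_id`, and the two lookup tables are assumed to be built by the caller from new_site_orders.csv (`number` to `id`) and old_site_carts.csv (`id` to `order_id`).

```python
from typing import Dict, Iterable, Mapping


def link_visits_to_orders(
    hits: Iterable[Mapping[str, str]],
    order_id_by_number: Mapping[str, str],
    order_id_by_cart: Mapping[str, str],
) -> Dict[str, str]:
    """Return {visit_id: order_id} using order confirmation page hits only."""
    linked: Dict[str, str] = {}
    for hit in hits:
        if hit.get("page_type") != "ORDER":
            continue
        order_id = None
        if hit.get("order_number"):
            # New site: order number taken from the browser tab title.
            order_id = order_id_by_number.get(hit["order_number"])
        elif hit.get("cart_id"):
            # Old site: cart ID taken from the cartId GET parameter.
            order_id = order_id_by_cart.get(hit["cart_id"])
        if order_id:
            linked[hit["visit_id"]] = order_id
    return linked
```

If a session contains several confirmation pages, this version keeps the last one; whether that is the right choice depends on the analysis.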
Note: Until March 12, 2022, due to an implementation bug, the first page view event in a session could be duplicated.
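One possible way to neutralize this duplication is sketched below. It assumes the hits of a single session are already sorted by time, that the duplicate shows up as a second page view with the same `page_type` and `slug` as the first one, and that the caller applies it only to sessions recorded before March 12, 2022. A genuine page reload looks identical, so this heuristic can also remove real events.

```python
from typing import Any, List, Mapping


def drop_duplicated_first_page_view(
    session_hits: List[Mapping[str, Any]]
) -> List[Mapping[str, Any]]:
    """Remove the second page view if it repeats the first one in a session."""
    page_views = [
        i for i, hit in enumerate(session_hits)
        if str(hit.get("is_page_view")) == "1"
    ]
    if len(page_views) < 2:
        return list(session_hits)
    first, second = page_views[0], page_views[1]
    key_first = (session_hits[first].get("page_type"), session_hits[first].get("slug"))
    key_second = (session_hits[second].get("page_type"), session_hits[second].get("slug"))
    if key_first != key_second:
        return list(session_hits)
    # Keep everything except the repeated page view.
    return [hit for i, hit in enumerate(session_hits) if i != second]
```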
old_site_products.csv and new_site_products.csv
Contain product catalogs for the old and new sites.
Fields include:
id
Old site: internal database ID
New site: CRM product ID
name
brand
main_category
Old site: explicitly stored
New site: inferred from site_category_paths as the deepest category
categories — list of categories associated with the product
site_category_paths (new site only)
Raw category paths from the CRM
Delimiter: " ## "
May include hidden categories
color (new site only)
price_per_period_<period> — rental prices
description
description_additional (new site only)
weight
size
slug
age_<code> — boolean flags indicating age suitability
param: <name> — product parameters and filter attributes (arrays of strings)
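The "deepest category" rule for main_category on the new site can be illustrated as follows. The description does not say how paths are stored, so this sketch assumes a product has a list of path strings and that " ## " separates the levels inside one path; if the delimiter instead separates whole paths, the logic would need to change. The category names in the usage note are invented for illustration.

```python
from typing import Iterable, List, Optional


def deepest_category(paths: Iterable[str], delimiter: str = " ## ") -> Optional[str]:
    """Return the last level of the longest category path, or None if empty.

    Assumption: `delimiter` separates levels within a single path.
    Ties are resolved in favor of the first path encountered.
    """
    best: List[str] = []
    for path in paths:
        levels = [level.strip() for level in path.split(delimiter) if level.strip()]
        if len(levels) > len(best):
            best = levels
    return best[-1] if best else None
```

For example, `deepest_category(["Toys", "Strollers ## Lightweight strollers"])` returns `"Lightweight strollers"`. Hidden categories are not filtered out here, since the data description gives no flag for them.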
old_site_new_site_products.csv
Contains product matching between the two sites.
Fields:
old_site_id
new_site_id
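Because the two sites use different ID spaces, old-site products usually need to be translated before sessions from both sites are combined. A minimal sketch, assuming rows are dicts with the two fields above, that `project_id` is 1 for the old site as described for metrika_visits.csv, and that each old product matches at most one new product (if not, the last match wins here):

```python
from typing import Dict, Iterable, Mapping, Optional


def build_id_map(matches: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map old_site_id to new_site_id, skipping incomplete rows."""
    return {
        row["old_site_id"]: row["new_site_id"]
        for row in matches
        if row.get("old_site_id") and row.get("new_site_id")
    }


def to_new_site_id(
    product_id: str, project_id: object, id_map: Mapping[str, str]
) -> Optional[str]:
    """Translate an old-site product ID; new-site IDs pass through unchanged.

    Returns None for old-site products without a match, so that unmatched
    old IDs cannot collide with unrelated new-site IDs.
    """
    if str(project_id) == "1":
        return id_map.get(product_id)
    return product_id
```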
old_site_orders.csv and new_site_orders.csv
Contain detailed order information.
Each row corresponds to one product within an order.
Notes:
Orders were manually processed via the CRM.
Product prices at the time of export may differ from the actual historical prices.
Some inconsistencies with discounts and totals may occur.
Fields include:
id
number (new site only) — CRM order number
user_id
create_date
start_date
end_date
rental_days
status_code
total_amount (new site only)
rental_amount
promo_discount_percent
discount_amount (new site only)
shipping_type
shipping_amount
return_amount
source_type — SITE or OTHER
product_id
product_price (new site only)
product_discount_amount (new site only)
color (new site only)
modified_on
old_site_carts.csv
Contains cart data from the old site.
If a cart contained N different products, the CSV contains N rows for that cart.
Important notes:
Both the backend and CRM modified cart data, which caused inconsistencies.
After February 2024, the CRM stopped updating cart data.
Some fields may contradict each other because of complex real-world operational logic.
Fields include:
id — cart ID
user_id
order_id
rental_days
promo_discount_percent
delivery_date
modified_on (CRM only)
cart_item_id
cart_item_deleted
product_id
cart_item_price
cart_item_order_id
cart_item_modified_on
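Since each cart is spread over several rows, a first step is usually to collapse the file into one product list per cart. The sketch below assumes rows are dicts (for example from `csv.DictReader`) and that `cart_item_deleted` marks removed items with "1" or "true"; the actual encoding is not documented here, so check it against the data first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def products_per_cart(cart_rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group product IDs by cart ID, skipping items flagged as deleted."""
    carts: Dict[str, List[str]] = defaultdict(list)
    for row in cart_rows:
        # Assumed encoding of the deletion flag: "1" or "true".
        deleted = str(row.get("cart_item_deleted", "")).strip().lower()
        if deleted in {"1", "true"}:
            continue
        products = carts[row["id"]]
        if row["product_id"] not in products:
            products.append(row["product_id"])
    return dict(carts)
```

Carts whose items were all deleted do not appear in the result. Given the inconsistencies noted above, the output is better treated as a noisy signal than as ground truth.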