
eric15342335/rental-product-recommendation-system-code


Rental product recommendation system

Overview

Predicting the next product page a user will visit based on their previous actions during the same browsing session.

This is a session-based next-item prediction task built on real user behavior from an online rental marketplace. The platform focuses on children’s products, including strollers, car seats, toys, and other baby and toddler items that families rent for short- or long-term use. Better recommendations should improve our performance metrics, primarily the number of daily orders.

Description

Your task is to build a model that, given the metadata of a browsing session and the sequence of visited pages, predicts which product pages the user will visit next.

Evaluation

Submissions are evaluated using Recall@6.

For each session in the test dataset, your task is to predict which product_id values will appear in the user’s browsing history after the last recorded event in the test session fragment.

Your model must output exactly 6 different product IDs per session.
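
For local validation, the metric can be approximated as follows. This is a minimal sketch and not the official scorer: it assumes that per-session recall is the share of actually visited products found among the 6 predictions, that the final score is the mean over sessions, and that sessions with no future product views score 0. The function names are illustrative.

```python
from typing import Dict, List, Set

K = 6  # the competition asks for exactly 6 product IDs per session


def recall_at_k(predicted: List[int], actual: Set[int], k: int = K) -> float:
    """Share of the actually visited products that appear in the top-k predictions."""
    if not actual:
        return 0.0  # assumption: sessions without future product views score 0
    hits = len(set(predicted[:k]) & actual)
    return hits / len(actual)


def mean_recall_at_k(
    predictions: Dict[int, List[int]],
    truth: Dict[int, Set[int]],
    k: int = K,
) -> float:
    """Average the per-session recall over every session in the ground truth."""
    if not truth:
        return 0.0
    total = sum(
        recall_at_k(predictions.get(visit_id, []), actual, k)
        for visit_id, actual in truth.items()
    )
    return total / len(truth)
```

A session missing from `predictions` contributes a recall of 0 here, so the sketch penalizes incomplete prediction sets.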

Submission Format

Your submission must contain a header:

visit_id,product_ids

Each row must contain:

    visit_id — session identifier
    product_ids — exactly 6 different product IDs, space-delimited

Example:

visit_id,product_ids
6970478551166091000,463480210 463480211 463480999 463480998 463480997 463480996

Only visit_id values from the test set should appear in your file.
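
The format above can be produced with the standard csv module. The sketch below is one possible approach, not a required one: `pad_to_six`, `write_submission`, and the `fallback` list (for example, globally popular products) are hypothetical names introduced here to guarantee exactly 6 different IDs per row.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

N_PREDICTIONS = 6  # required number of distinct product IDs per session


def pad_to_six(ranked: Iterable[int], fallback: Iterable[int], n: int = N_PREDICTIONS) -> List[int]:
    """Deduplicate ranked IDs, keep their order, and top up from a fallback list."""
    result: List[int] = []
    for product_id in list(ranked) + list(fallback):
        if product_id not in result:
            result.append(product_id)
        if len(result) == n:
            return result
    raise ValueError(f"need at least {n} distinct product IDs, got {len(result)}")


def write_submission(predictions: Dict[int, List[int]], fallback: List[int], out: TextIO) -> None:
    """Write the header and one space-delimited row of 6 IDs per visit_id."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["visit_id", "product_ids"])
    for visit_id, ranked in predictions.items():
        ids = pad_to_six(ranked, fallback)
        writer.writerow([visit_id, " ".join(str(i) for i in ids)])


# Usage with an in-memory buffer; pass an open file object to write to disk.
buffer = io.StringIO()
write_submission({6970478551166091000: [463480210, 463480210, 463480211]}, [1, 2, 3, 4, 5, 6], buffer)
```

Keeping `visit_id` as an integer or string, never a float, avoids losing precision on long identifiers.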

Dataset Description

At the beginning of 2025, the service underwent a full rebranding: the name, domain, and website were changed.
We refer to the website used before the rebranding as the old site, and the website used after the rebranding as the new site.

The new site was built from scratch using a different technology stack. Data from the two sites is not linked.
However, we transformed and aligned the datasets to a unified structure so that participants can work with them conveniently without diving into implementation-specific details.

User interactions on both websites were tracked using Yandex Metrica, which is an analogue of Google Analytics:
https://yandex.ru/support/metrica/en/

Product and order information for the old site was exported from the backend connected to the old CRM.
For the new site, this data was exported directly from the new CRM.

On the old site, e-commerce events were not configured in Yandex Metrica. This significantly complicates the mapping between Metrica logs and backend or CRM data. The details of this issue are discussed in the description of the file metrika_hits.csv.

If you have questions or suggestions while exploring the data, feel free to reach out in our Discord channel.

Files
metrika_visits.csv

Contains user session data collected from Yandex Metrica using the Logs API:
https://yandex.ru/dev/metrika/en/logs/

This file includes a subset of fields that are likely to be useful for training and production usage.
Field descriptions can be found here:
https://yandex.ru/dev/metrika/en/logs/fields/visits

Additional field:

    project_id — site identifier:
        0 for the new site
        1 for the old site

metrika_hits.csv

Contains per-event user activity logs from Yandex Metrica, also collected via Logs API:
https://yandex.ru/dev/metrika/en/logs/fields/hits/

Events include:

    Page view events (is_page_view = 1)
    E-commerce events (ecommerce contains JSON)
        JSON format documentation:
        https://yandex.ru/support/metrica/en/ecommerce/data
        These events exist only for the new site.
    Non-bounce events (not_bounce = 1)
        According to Yandex Metrica, this event is automatically triggered when a user views a second page or stays on the site for more than 15 seconds:
        https://yandex.ru/support/metrica/en/general/glossary#bounce
    Other system and technical events
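
The event types above can be separated with a small helper. This is a sketch under assumptions: flags are taken to be stored as 0/1 values (strings when read with the csv module), and the order of the checks is an arbitrary choice for rows that might match more than one type. `event_kind` and `parse_ecommerce` are illustrative names.

```python
import json
from typing import Any, Dict, Optional


def event_kind(hit: Dict[str, Any]) -> str:
    """Label one row of metrika_hits.csv by its event type."""
    if str(hit.get("is_page_view", "0")) == "1":
        return "page_view"
    if hit.get("ecommerce"):  # non-empty JSON string, new site only
        return "ecommerce"
    if str(hit.get("not_bounce", "0")) == "1":
        return "not_bounce"
    return "other"  # system and technical events


def parse_ecommerce(raw: Optional[str]) -> Optional[Any]:
    """Decode the ecommerce field, returning None for empty or malformed values."""
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

The structure of the decoded JSON is described in the Yandex Metrica e-commerce documentation linked above and is not interpreted here.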

The goals_id field may contain identifiers of goals achieved during the event.
We do not provide a description of these goals because they either explicitly correspond to page views or e-commerce events, or are not relevant for the purposes of this competition.

We replaced the original url field with two fields:

    page_type
    slug (a human-readable URL identifier)

This was done to unify the old and new site structures, which originally used different URL patterns.

Yandex Metrica does not receive data from backend or CRM systems.
A reliable way to link these datasets is by using the order confirmation page (page_type = 'ORDER'):

    On the old site, this page included a cartId GET parameter representing the cart ID in the database.
    On the new site, the page included an order number in the browser tab title.

Since Metrica logs do not contain the URL or title fields, we extracted this information into:

    cart_id
    order_number
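
The linking described above could look roughly like this. The sketch assumes that hits carry a visit_id column, that cart_id matches id in old_site_carts.csv, and that order_number matches number in new_site_orders.csv; all three should be verified against the data. `order_key` and `link_hits_to_orders` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def order_key(hit: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return ("old", cart_id) or ("new", order_number) for order confirmation hits."""
    if hit.get("page_type") != "ORDER":
        return None
    if hit.get("cart_id"):
        return ("old", str(hit["cart_id"]))
    if hit.get("order_number"):
        return ("new", str(hit["order_number"]))
    return None


def link_hits_to_orders(
    hits: Iterable[Dict[str, Any]],
    old_carts: Iterable[Dict[str, Any]],
    new_orders: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Attach an order ID to every ORDER hit that can be matched."""
    # Assumption: carts expose id and order_id, new-site orders expose number and id.
    cart_to_order = {str(row["id"]): row["order_id"] for row in old_carts}
    number_to_order = {str(row["number"]): row["id"] for row in new_orders}
    linked = []
    for hit in hits:
        key = order_key(hit)
        if key is None:
            continue
        site, value = key
        lookup = cart_to_order if site == "old" else number_to_order
        linked.append({"visit_id": hit.get("visit_id"), "site": site, "order_id": lookup.get(value)})
    return linked
```

Unmatched hits keep `order_id` set to None so that gaps in the mapping stay visible.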

Note: Until March 12, 2022, due to an implementation bug, the first page view event in a session could be duplicated.
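
One way to neutralize this bug is to drop the first page view when the second one repeats it. The sketch below rests on several assumptions: events are already grouped by session and sorted by time, `date_time` is a hypothetical column holding a parsed datetime, a duplicate is defined as the same page_type and slug, and March 12, 2022 itself is treated as affected.

```python
from datetime import datetime
from typing import Any, Dict, List

# Assumption: the bug affected sessions up to and including March 12, 2022.
FIRST_UNAFFECTED_DAY = datetime(2022, 3, 13)


def drop_duplicated_first_view(page_views: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove a repeated first page view from one session's chronological events."""
    if len(page_views) < 2:
        return page_views
    first, second = page_views[0], page_views[1]
    same_page = (first["page_type"], first["slug"]) == (second["page_type"], second["slug"])
    if same_page and first["date_time"] < FIRST_UNAFFECTED_DAY:
        return page_views[1:]
    return page_views
```

A genuine page reload looks identical to the bug under this definition, so the rule may remove a few legitimate events.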

old_site_products.csv and new_site_products.csv

Contain product catalogs for the old and new sites.

Fields include:

    id
        Old site: internal database ID
        New site: CRM product ID
    name
    brand
    main_category
        Old site: explicitly stored
        New site: inferred from site_category_paths as the deepest category
    categories — list of categories associated with the product
    site_category_paths (new site only)
        Raw category paths from the CRM
        Delimiter: " ## "
        May include hidden categories
    color (new site only)
    price_per_period_<period> — rental prices
    description
    description_additional (new site only)
    weight
    size
    slug
    age_<code> — boolean flags indicating age suitability
    param: <name> — product parameters and filter attributes (arrays of strings)
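
The column families above (price_per_period_<period>, age_<code>, param: <name>) and the " ## " delimiter lend themselves to two small helpers. This is a sketch only: it splits site_category_paths on the documented delimiter without deciding whether each part is a full path or a single level, which should be checked against the data. The function names are illustrative.

```python
from typing import Iterable, List

PATH_DELIMITER = " ## "  # delimiter documented for site_category_paths


def split_category_paths(raw: str) -> List[str]:
    """Split a raw site_category_paths value into its non-empty parts."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(PATH_DELIMITER) if part.strip()]


def columns_by_prefix(columns: Iterable[str], prefix: str) -> List[str]:
    """Select a family of columns, e.g. all age flags or all product parameters."""
    return [name for name in columns if name.startswith(prefix)]
```

For example, `columns_by_prefix(header, "param: ")` collects every product parameter column from a CSV header.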

old_site_new_site_products.csv

Contains product matching between the two sites.

Fields:

    old_site_id
    new_site_id
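
With this file, old-site product IDs can be translated into new-site IDs so that both histories share one product vocabulary. The sketch assumes each old ID maps to at most one new ID (the first match is kept if not) and that unmatched products should be reported as None; `build_id_map` and `to_new_site_id` are hypothetical names.

```python
from typing import Any, Dict, Iterable, Optional


def build_id_map(rows: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Build an old_site_id -> new_site_id lookup from old_site_new_site_products.csv rows."""
    id_map: Dict[str, str] = {}
    for row in rows:
        # setdefault keeps the first match if an old ID appears more than once
        id_map.setdefault(str(row["old_site_id"]), str(row["new_site_id"]))
    return id_map


def to_new_site_id(old_id: Any, id_map: Dict[str, str]) -> Optional[str]:
    """Translate an old-site product ID, or return None when no match exists."""
    return id_map.get(str(old_id))
```

IDs are compared as strings so that values read from CSV and integer IDs from other sources match consistently.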

old_site_orders.csv and new_site_orders.csv

Contain detailed order information.
Each row corresponds to one product within an order.

Notes:

    Orders were manually processed via CRM.
    Product prices at the time of export may differ from the actual historical prices.
    Some inconsistencies with discounts and totals may occur.

Fields include:

    id
    number (new site only) — CRM order number
    user_id
    create_date
    start_date
    end_date
    rental_days
    status_code
    total_amount (new site only)
    rental_amount
    promo_discount_percent
    discount_amount (new site only)
    shipping_type
    shipping_amount
    return_amount
    source_type — SITE or OTHER
    product_id
    product_price (new site only)
    product_discount_amount (new site only)
    color (new site only)
    modified_on
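
Because each row is one product within an order, order-level analysis needs a grouping step. The following sketch assumes that the id field identifies the order and repeats across its rows, which should be confirmed on the actual files; `products_per_order` is an illustrative name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def products_per_order(rows: Iterable[Dict[str, Any]], site_only: bool = False) -> Dict[Any, List[Any]]:
    """Group order rows into {order id: [product_id, ...]}."""
    orders: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        # Optionally keep only orders placed through the website (source_type = SITE)
        if site_only and row.get("source_type") != "SITE":
            continue
        orders[row["id"]].append(row["product_id"])
    return dict(orders)
```

Setting `site_only=True` restricts the result to orders that can in principle be linked to Metrica sessions.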

old_site_carts.csv

Contains cart data from the old site.
If a cart contained N different products, the CSV contains N rows for that cart.

Important notes:

    Both the backend and CRM modified cart data, which caused inconsistencies.
    After February 2024, the CRM stopped updating cart data.
    Some fields may contradict each other because of complex real-world operational logic.

Fields include:

    id — cart ID
    user_id
    order_id
    rental_days
    promo_discount_percent
    delivery_date
    modified_on (CRM only)
    cart_item_id
    cart_item_deleted
    product_id
    cart_item_price
    cart_item_order_id
    cart_item_modified_on
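
To recover what a cart finally contained, the rows can be grouped by cart ID while skipping deleted items. This sketch assumes that cart_item_deleted is a boolean-like flag; the accepted truthy spellings below are guesses and should be adjusted to the values actually present in the file. `active_cart_items` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set

# Assumption: these spellings mark a deleted cart item; verify against the data.
TRUTHY = {"1", "true", "t", "yes"}


def active_cart_items(rows: Iterable[Dict[str, Any]]) -> Dict[Any, Set[Any]]:
    """Group old_site_carts.csv rows into {cart id: set of non-deleted product IDs}."""
    carts: Dict[Any, Set[Any]] = defaultdict(set)
    for row in rows:
        items = carts[row["id"]]  # register the cart even if all its items were deleted
        deleted = str(row.get("cart_item_deleted", "")).strip().lower() in TRUTHY
        if not deleted:
            items.add(row["product_id"])
    return dict(carts)
```

Given the inconsistencies noted above, the result is best treated as an approximation of the cart contents.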

About

[Kaggle Competition] Predicting the next product page a user will visit based on their previous actions during the same browsing session.
