{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "1c01b419", "metadata": {}, "outputs": [], "source": [ "import sys\n", "# Make the local ML_mapping module importable; adjust this path to your own copy of Mapping_ML\n", "sys.path.append('C://Users/11max/PycharmProjects/Mapping_ML/')\n", "import ML_mapping" ] }, { "cell_type": "markdown", "id": "888336a2", "metadata": {}, "source": [ "Arguments to initialize the `Mapping` object:\n", "- `reference_classification`: a string naming the reference classification to map to. Choices:\n", "    - openIO-Canada\n", "    - exiobase\n", "    - USEEIO 2.0\n", "    - GTAP 10\n", "    - IOCC\n", "    - NACE Rev.1.1\n", "    - NACE Rev.2\n", "    - CPA 2008\n", "    - CPA 2.1\n", "    - NAPCS 2017\n", "    - NAPCS 2022\n", "    - NAICS 2017\n", "    - NAICS 2022\n", "    - ISIC Rev.4\n", "    - CPC 2.1\n", "    - COICOP 2018\n", "    - ecoinvent 3.8 technosphere\n", "    - ecoinvent 3.9 technosphere\n", "    - ecoinvent 3.8 elementary flows\n", "    - ecoinvent 3.9 elementary flows\n", "    - IMPACT World+ 2.0\n", "    - USEtox 2\n", "    - EF 3.0\n", "    - EF 3.1\n", "- `transformer_model`: a string naming the sentence-transformers (SBERT) model used to compute the semantic similarity between the inputs and the classification labels. Available models: https://www.sbert.net/docs/pretrained_models.html\n", "- `number_of_guesses`: an integer giving how many of the model's best guesses are displayed for each input in the final dataframe" ] }, { "cell_type": "code", "execution_count": 2, "id": "391cde42", "metadata": {}, "outputs": [], "source": [ "self = ML_mapping.Mapping(reference_classification='exiobase',\n", "                          transformer_model='all-MiniLM-L6-v2',\n", "                          number_of_guesses=5)" ] }, { "cell_type": "markdown", "id": "1ba505c4", "metadata": {}, "source": [ "Pass a list of words to match against the reference classification as the argument of `self.match_inputs()`. Then calculate the similarity scores and format and display the results." ] }, { "cell_type": "code", "execution_count": 9, "id": "5a8702d1", "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
| product | order | sector | similarity |
|---|---|---|---|
| ADPE System Configuration | 1 | Computer and related services (72) | 0.229096 |
|  | 2 | Post and telecommunication services (64) | 0.221951 |
|  | 3 | White Spirit & SBP | 0.182888 |
|  | 4 | Research and development services (73) | 0.165665 |
|  | 5 | Other services (93) | 0.160532 |
|  | 6 | Education services (80) | 0.158538 |
|  | 7 | Air transport services (62) | 0.148643 |
|  | 8 | Health and social work services (85) | 0.140428 |
|  | 9 | Plastics, basic | 0.140398 |
|  | 10 | Real estate services (70) | 0.133257 |
| Chocolate | 1 | Sugar | 0.575643 |
|  | 2 | Dairy products | 0.491422 |
|  | 3 | Beverages | 0.483462 |
|  | 4 | Gas Coke | 0.455178 |
|  | 5 | Raw milk | 0.449589 |
|  | 6 | Coke Oven Coke | 0.448786 |
|  | 7 | Charcoal | 0.43323 |
|  | 8 | Vegetables, fruit, nuts | 0.425549 |
|  | 9 | Wheat | 0.406313 |
|  | 10 | Sugar cane, sugar beet | 0.403211 |
| Renting a film | 1 | Renting services of machinery and equipment wi... | 0.311251 |
|  | 2 | Motor vehicles, trailers and semi-trailers (34) | 0.259165 |
|  | 3 | Hotel and restaurant services (55) | 0.202232 |
|  | 4 | Real estate services (70) | 0.199813 |
|  | 5 | Private households with employed persons (95) | 0.170975 |
|  | 6 | Construction work (45) | 0.169991 |
|  | 7 | Printed matter and recorded media (22) | 0.154569 |
|  | 8 | Services auxiliary to financial intermediation... | 0.14115 |
|  | 9 | Cement, lime and plaster | 0.136944 |
|  | 10 | Electricity by solar photovoltaic | 0.134443 |
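The ranked output above, with a similarity score and an order for each guessed sector, can be illustrated with a minimal sketch. This notebook does not confirm the mechanism, but it is assumed here that `ML_mapping` embeds each input and each sector label with the chosen sentence-transformers model and then ranks sectors by cosine similarity. The helper names `cosine_similarity` and `top_k_matches` are hypothetical. The toy 3-dimensional vectors are made up and stand in for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, sector_vecs, k):
    # Hypothetical helper: sector_vecs maps each sector label to its embedding
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in sector_vecs.items()]
    # Sort by similarity, highest first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the k best and attach a 1-based rank, like the order column
    return [(rank, label, score)
            for rank, (label, score) in enumerate(scored[:k], start=1)]


# Toy 3-dimensional vectors standing in for real sentence embeddings
sectors = {
    'Sugar': [0.9, 0.1, 0.0],
    'Dairy products': [0.7, 0.3, 0.1],
    'Construction work (45)': [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # made-up embedding for the input 'Chocolate'
for rank, label, score in top_k_matches(query, sectors, k=2):
    print(rank, label, round(score, 3))
```

In this sketch `k` plays the role that `number_of_guesses` is assumed to play in the real library. Real embeddings from a model such as `all-MiniLM-L6-v2` have hundreds of dimensions rather than three.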