# `Intel® Neural Compressor` Sample for TensorFlow*

## Background
Low-precision inference can significantly speed up inference by converting the FP32 model to an INT8 or BF16 model. Intel provides Intel® Deep Learning Boost technology in the Second Generation Intel® Xeon® Scalable processors and newer Xeon® processors, which accelerates INT8 and BF16 models in hardware.

Intel® Neural Compressor helps the user simplify the process of converting the FP32 model to INT8/BF16.

At the same time, Intel® Neural Compressor tunes the quantization method to reduce the accuracy loss, which is a major blocker for low-precision inference.

Intel® Neural Compressor is released in the Intel® AI Analytics Toolkit and works with Intel® Optimization of TensorFlow*.

Please refer to the official website for detailed info and news: [https://github.com/intel/neural-compressor](https://github.com/intel/neural-compressor)
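To make the conversion concrete, the core idea behind INT8 quantization is to map FP32 values onto 8-bit integers with a scale and zero point, then map them back at inference time. The sketch below is plain Python for illustration only; it is not the Intel® Neural Compressor API, and the function names are our own:

```python
def quantize_int8(values, qmin=-128, qmax=127):
    """Affine-quantize a list of floats to int8 using a scale and zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(qi - zero_point) * scale for qi in q]

q, scale, zp = quantize_int8([-1.0, 0.0, 0.5, 1.0])
restored = dequantize(q, scale, zp)  # close to the inputs, small rounding error
```

The accuracy loss mentioned above comes from exactly this rounding and clamping; the tuning step searches for quantization settings that keep that loss within an acceptable bound.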

## License

Code samples are licensed under the MIT license.
Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)

## Purpose
This sample shows the whole process of building a CNN model to recognize handwritten digits and speeding it up with Intel® Neural Compressor.

We will learn how to train a CNN model based on Keras with TensorFlow, use Intel® Neural Compressor to quantize the model, and compare the performance to understand the benefit of quantization.

## Key Implementation Details

- Use Keras on TensorFlow to build and train the CNN model.


- Define the function and class for Intel® Neural Compressor to quantize the CNN model.

  Intel® Neural Compressor can run on any Intel® CPU to quantize the AI model.

  The quantized AI model has better inference performance than the FP32 model on Intel CPUs.

| OS | Linux* Ubuntu* 18.04
| Hardware | The Second Generation Intel® Xeon® Scalable processor family or newer
| Software | Intel® oneAPI AI Analytics Toolkit 2021.1 or newer
| What you will learn | How to use Intel® Neural Compressor to quantize the AI model based on TensorFlow and speed up inference on Intel® Xeon® CPUs
| Time to complete | 10 minutes

## Running Environment
```
conda activate tensorflow
```

### Install Intel® Neural Compressor by Local Channel

```
conda install -c ${ONEAPI_ROOT}/conda_channel neural-compressor -y --offline
```
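After installation, a quick sanity check can confirm which package name is importable. The fallback names reflect the tool's earlier releases (**lpot** and **ilit**); this helper is our own illustrative snippet, not part of the product:

```python
import importlib.util

def find_quantizer_package():
    """Return the first installed package among the tool's new and old names, or None."""
    for name in ("neural_compressor", "lpot", "ilit"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None

pkg = find_quantizer_package()
print(pkg or "no quantization package found - check the conda install step above")
```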

### Install Jupyter Notebook
```
python -m pip install notebook
```


## Running the Sample <a name="running-the-sample"></a>

### Startup Jupyter Notebook
```
conda activate /opt/intel/oneapi/intelpython/latest/envs/tensorflow
```

### Open Sample Code File

In a web browser, open the link: **http://yyy:8888/?token=146761d9317552c43e0d6b8b6b9e1108053d465f6ca32fca**. Click 'inc_sample_tensorflow.ipynb' to start the sample.

### Run

**alexnet.yaml** (excerpt):

```yaml
version: 1.0

model:
  name: hello_world
  framework: tensorflow    # possible values are tensorflow, mxnet and pytorch
```
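For readers unfamiliar with the YAML schema, the settings above are just a small nested mapping. The sketch below mirrors the keys shown as a plain Python dict, for illustration only; it is not an Intel® Neural Compressor API object:

```python
# Mirror of the YAML configuration above as a plain dict (illustration only).
config = {
    "version": 1.0,
    "model": {
        "name": "hello_world",
        "framework": "tensorflow",  # possible values: tensorflow, mxnet, pytorch
    },
}

assert config["model"]["framework"] in ("tensorflow", "mxnet", "pytorch")
```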
**inc_quantize_model.py** (excerpt):

```python
import sys

try:
    import neural_compressor as inc
    print("neural_compressor version {}".format(inc.__version__))
except:
    try:
        import lpot as inc
        print("LPOT version {}".format(inc.__version__))
    except:
        import ilit as inc
        print("iLiT version {}".format(inc.__version__))

if inc.__version__ == '1.2':
    print("This script doesn't support LPOT 1.2, please install LPOT 1.1, 1.2.1 or newer")
    sys.exit(1)
```

```python
def auto_tune(input_graph_path, yaml_config, batch_size):
    fp32_graph = alexnet.load_pb(input_graph_path)
    quan = inc.Quantization(yaml_config)
    dataloader = Dataloader(batch_size)

    q_model = quan(
```
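The `Dataloader` passed to `auto_tune` above must be iterable, yielding `(inputs, labels)` batches for calibration and evaluation. The real sample iterates over the MNIST test set; the class below is a minimal self-contained sketch of that iterator contract, with placeholder data and batching details that are our own assumptions:

```python
class Dataloader:
    """Minimal iterable yielding (inputs, labels) batches (illustrative sketch)."""

    def __init__(self, batch_size, samples=None):
        self.batch_size = batch_size
        # Placeholder data; the real sample feeds MNIST images and digit labels.
        self.samples = samples if samples is not None else [([0.0], 0)] * 8

    def __iter__(self):
        # Yield fixed-size batches of (inputs, labels) tuples.
        for i in range(0, len(self.samples), self.batch_size):
            batch = self.samples[i:i + self.batch_size]
            inputs = [x for x, _ in batch]
            labels = [y for _, y in batch]
            yield inputs, labels

# Example: 8 placeholder samples with batch size 4 yield 2 batches.
batches = list(Dataloader(4))
```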
**inc_sample_tensorflow.ipynb** (excerpt):
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intel® Neural Compressor Sample for Tensorflow"
]
},
{
"source": [
"## Agenda\n",
"- Train a CNN Model Based on Keras\n",
"- Quantize Keras Model by Intel® Neural Compressor\n",
"- Compare Quantized Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Intel® Neural Compressor Release and Sample \n",
"\n",
"This sample code is always updated for the Intel® Neural Compressor release in latest oneAPI release.\n",
"\n",
"If you want to get the sample code for old oneAPI release, please checkout the old sample code release by git tag.\n",
"\n",
"source": [
"Import python packages and check version.\n",
"\n",
"Make sure TensorFlow is **2.2** or newer, Intel® Neural Compressor is **not 1.2**, and matplotlib is installed.\n",
"\n",
"Note: Intel® Neural Compressor was previously released under the names **lpot** and **ilit**. The following script also supports these old package names."
]
},
{
"source": [
"import tensorflow as tf\n",
"print(\"Tensorflow version {}\".format(tf.__version__))\n",
"tf.compat.v1.enable_eager_execution()\n",
"\n",
"try:\n",
" import neural_compressor as inc\n",
" print(\"neural_compressor version {}\".format(inc.__version__)) \n",
"except:\n",
" try:\n",
" import lpot as inc\n",
" print(\"LPOT version {}\".format(inc.__version__)) \n",
" except:\n",
" import ilit as inc\n",
" print(\"iLiT version {}\".format(inc.__version__)) \n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
"cell_type": "markdown",
"metadata": {},
"source": [
"Intel Optimized TensorFlow 2.5.0 and later require the environment variable **TF_ENABLE_MKL_NATIVE_FORMAT=0** to be set before running Intel® Neural Compressor to quantize the FP32 model or deploying the quantized model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Quantize FP32 Model by Intel® Neural Compressor\n",
"\n",
"Intel® Neural Compressor can quantize the model with a validation dataset for tuning.\n",
"Finally, it returns a frozen quantized INT8 model.\n",
"\n",
"We prepare a python script \"**inc_quantize_model.py**\" to call Intel® Neural Compressor to perform the whole quantization job.\n",
"Following code sample is used to explain the code.\n",
"\n",
"### Define Dataloader\n",
"source": [
"### Define Yaml File\n",
"\n",
"We define alexnet.yaml to save the necessary parameters for Intel® Neural Compressor.\n",
"In this case, we only need to change the input/output according to the fp32 model.\n",
"\n",
"In this case, the input node name is '**x**'.\n",
"\n",
"def auto_tune(input_graph_path, yaml_config, batch_size): \n",
" fp32_graph = alexnet.load_pb(input_graph_path)\n",
" quan = inc.Quantization(yaml_config)\n",
" dataloader = Dataloader(batch_size)\n",
" assert(dataloader)\n",
" q_model = quan(\n",
"source": [
"### Call Function to Quantize the Model\n",
"\n",
"Show the code in \"**inc_quantize_model.py**\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!cat inc_quantize_model.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will execute the \"**inc_quantize_model.py**\" to show the whole process of quantizing a model."
]
},
{
},
"outputs": [],
"source": [
"!python inc_quantize_model.py"
]
},
{
"source": [
"## Compare Quantized Model\n",
"\n",
"We prepare a script **profiling_inc.py** to test the performance of the PB model.\n",
"\n",
"There is no correct performance data if the code is run inside the Jupyter notebook, so we run the script as a separate process.\n",
"\n",
"Let's review **profiling_inc.py**."
]
},
{
"metadata": {},
"outputs": [],
"source": [
"!cat profiling_inc.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute the **profiling_inc.py** with FP32 model file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!python profiling_inc.py --input-graph=./fp32_frezon.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute the **profiling_inc.py** with int8 model file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!python profiling_inc.py --input-graph=./alexnet_int8_model.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!cat 32.json\n",
"- FP32 to INT8.\n",
"- Intel® Deep Learning Boost speeds up INT8 inference if your CPU is a Second Generation Intel® Xeon® Scalable processor or newer that supports it."
]
}
],
"metadata": {
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
New sample configuration file:
{
"guid": "82e7612f-2810-4d12-9c75-c17fcbb946fa",
"name": "Intel® Neural Compressor Tensorflow Getting Started",
"categories": ["Toolkit/oneAPI AI And Analytics/AI Getting Started Samples"],
  "description": "This sample illustrates how to run Intel® Neural Compressor to quantize the FP32 model trained with Keras on TensorFlow to an INT8 model, speeding up inference.",
"languages": [{"python":{}}],
"dependencies": ["tensorflow","neural-compressor"],
"os": ["linux"],
"builder": ["cli"],
"targetDevice": ["CPU"],
"ciTests": {
"linux": [
{
"env": ["source ${ONEAPI_ROOT}/setvars.sh --force",
"conda env remove -n user_tensorflow",
"conda create -n user_tensorflow -c ${ONEAPI_ROOT}/conda_channel python=`python -V| awk '{print $2}'` -y",
"conda activate user_tensorflow",
"conda install -n user_tensorflow -c ${ONEAPI_ROOT}/conda_channel tensorflow python-flatbuffers -y",
"conda install -n user_tensorflow -c ${ONEAPI_ROOT}/conda_channel neural-compressor -y --offline",
"conda install -n user_tensorflow -c ${ONEAPI_ROOT}/conda_channel lpot -y --offline",
"conda install -n user_tensorflow runipy notebook -y"
],
"id": "neural-compressor tensorflow",
"steps": [
"runipy inc_sample_tensorflow.ipynb"
]
}
]
}
}