diff --git a/notebooks/3_KernelMethods.ipynb b/notebooks/3_KernelMethods.ipynb index fa773d3..1c4a2fc 100644 --- a/notebooks/3_KernelMethods.ipynb +++ b/notebooks/3_KernelMethods.ipynb @@ -8,7 +8,7 @@ "\n", "In this notebook, we introduce kernel-based versions of the models covered in the [Linear Methods](1_LinearMethods.ipynb) and [PCovR](2_PrincipalCovariatesRegression.ipynb) notebooks.\n", "\n", - "As always, for each model, we first go step-by-step through the derivation, with equations, embedded links, and citations supplied where useful. Second, we will employ a \"Utility Class\" for the model, which can be found in the utilities folder and contains all necessary functions.\n", + "As always, for each model, we first go step-by-step through the derivation, with equations, embedded links, and citations supplied where useful. Second, we will employ a \"Sklearn/Skcosmo Class\" for the models, which can be found in the skcosmo/sklearn mudule and contains all necessary functions.\n", "\n", "**Note**: To display values of variables in the markdown cells, you should [enable the jupyter contrib nbextension](https://stackoverflow.com/questions/52812231/print-variable-in-jupyter-notebook-markdown-cell-python)." ] @@ -35,9 +35,13 @@ "from utilities.plotting import (\n", " plot_projection, plot_regression, check_mirrors, get_cmaps, table_from_dict\n", ")\n", - "from utilities.kernels import linear_kernel, gaussian_kernel, center_kernel\n", - "from utilities.classes import PCA, LR, KPCA, KRR, KPCovR\n", - "\n", + "from utilities.kernels import center_kernel # to be replaced the function from skcosmo\n", + "from sklearn.metrics.pairwise import linear_kernel, rbf_kernel \n", + "from utilities.classes import KPCovR # to be replaced with the class from skcosmo\n", + "from sklearn.decomposition import PCA, KernelPCA\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.kernel_ridge import KernelRidge\n", + "from functools import partial\n", "cmaps = get_cmaps()\n", "plt.style.use('../utilities/kernel_pcovr.mplstyle')\n", "dbl_fig=(2*plt.rcParams['figure.figsize'][0], plt.rcParams['figure.figsize'][1])" @@ -83,7 +87,7 @@ " k(\\mathbf{x}, \\mathbf{x}^{\\prime}) = \\exp{\\left(-\\gamma\\lVert \\mathbf{x} - \\mathbf{x}^{\\prime}\\rVert^2\\right)}.\n", "\\end{equation}\n", "\n", - "In this notebook we use the gaussian kernel (```gaussian_kernel(XA, XB, gamma=1.0)```), however, we have also included the linear kernel function (```linear_kernel(XA, XB)```) in ```utilities/kernels.py```. You can check rather easily that if you use a linear kernel all of the kernel methods reduce explicitly to linear regression or principal component analysis in the original feature space." + "In this notebook we use the gaussian kernel (```rbf_kernel(XA, XB, gamma=1.0)```), however, we have also included the linear kernel function (```linear_kernel(XA, XB)```) in ```sklearn```. You can check rather easily that if you use a linear kernel all of the kernel methods reduce explicitly to linear regression or principal component analysis in the original feature space." ] }, { @@ -95,7 +99,8 @@ "# Changing the kernels in this cell will change them throughout the notebook tutorial.\n", "# The function variable designates the kernel function to use throughout the notebook.\n", "# The string variable is used to pass to the utility classes. \n", - "kernel_func = gaussian_kernel\n", + "kernel_params = {\"kernel\": \"precomputed\"}\n", + "kernel_func = partial(rbf_kernel, gamma=1.0)\n", "kernel_type = 'gaussian'" ] }, @@ -590,10 +595,10 @@ "metadata": {}, "outputs": [], "source": [ - "lr = LR(regularization=regularization)\n", + "lr = Ridge(alpha=regularization)\n", "\n", "lr.fit(X_train, Y_train)\n", - "Y_lr = lr.transform(X_test)\n", + "Y_lr = lr.predict(X_test)\n", "\n", "table_from_dict([get_stats(x=X_test, \n", " yp=Y_lr,\n", @@ -623,11 +628,11 @@ "outputs": [], "source": [ "regularizations = np.linspace(-10, -2, 24)\n", - "krrs = [KRR(regularization=10**i, kernel_type=kernel_type) \n", + "krrs = [KernelRidge(alpha=10**i, **kernel_params) \n", " for i in regularizations]\n", "\n", "for k in krrs:\n", - " k.fit(X=X_train, K=K_train, Y=Y_train)" + " k.fit(X=K_train, y = Y_train)" ] }, { @@ -636,10 +641,8 @@ "metadata": {}, "outputs": [], "source": [ - "coeff_deter = np.array([k.statistics(X=X_test, \n", - " Y=Y_test, \n", - " K=K_test)['Coefficient of Determination
($R^2$)'] for k in krrs])\n", - "best_regularization = 10**regularizations[coeff_deter.argmax()]" + "ell = np.array([np.linalg.norm(Y_test - k.predict(K_test)) ** 2.0 / np.linalg.norm(Y_test) **2.0 for k in krrs])\n", + "best_regularization = 10**regularizations[ell.argmin()]" ] }, { @@ -655,15 +658,15 @@ "metadata": {}, "outputs": [], "source": [ - "plt.semilogx(10**regularizations, coeff_deter, marker='o', zorder=-1)\n", + "plt.semilogx(10**regularizations, ell, marker='o', zorder=-1)\n", "plt.scatter(best_regularization,\n", - " max(coeff_deter), marker='o', color='r',\n", + " min(ell), marker='o', color='r',\n", " )\n", "\n", "plt.xlabel(r\"$\\lambda$\")\n", - "plt.ylabel(r\"$R^2$\")\n", + "plt.ylabel(r\"$\\ell$\")\n", "\n", - "plt.title(r\"Effect of $\\lambda$ on $R^2$ for Kernel Ridge Regression\")\n", + "plt.title(r\"Effect of $\\lambda$ on $\\ell$ for Kernel Ridge Regression\")\n", "\n", "plt.show()" ] @@ -717,11 +720,10 @@ "\n", "regularization = 1e-12\n", "\n", - "krr = KRR(regularization=regularization, kernel_type=kernel_type)\n", - "krr.fit(X_train, Y_train)\n", - "Yhat_train = krr.transform(X_train).reshape(-1,Y.shape[-1])\n", - "PKY = krr.PKY\n", - "\n", + "krr = KernelRidge(alpha=regularization, **kernel_params)\n", + "krr.fit(K_train, Y_train)\n", + "Yhat_train = krr.predict(K_train).reshape(-1,Y.shape[-1])\n", + "PKY = krr.dual_coef_\n", "K_pca = K_train/(np.trace(K_train)/n_train)\n", "K_lr = np.matmul(Yhat_train, Yhat_train.T)\n", " \n", @@ -829,13 +831,13 @@ "source": [ "fig, axes = plt.subplots(1,2, figsize=dbl_fig)\n", "\n", - "ref = KPCA(n_PC=2, kernel_type=kernel_type)\n", - "ref.fit(X_train)\n", - "xref = ref.transform(X_test)\n", + "ref = KernelPCA(n_components=2, **kernel_params)\n", + "ref.fit(K_train)\n", + "T_kpca = ref.transform(K_test)\n", "\n", "plot_projection(Y_test, T_kpcovr_test, fig=fig, ax=axes[0], \n", " title = r\"KPCovR ($\\alpha={}$)\".format(alpha), **cmaps)\n", - "plot_projection(Y_test, xref, fig=fig, ax=axes[1], title = \"KPCA\", **cmaps)\n", + "plot_projection(Y_test, T_kpca, fig=fig, ax=axes[1], title = \"KPCA\", **cmaps)\n", "\n", "fig.suptitle(r\"These will become increasingly different as $\\alpha \\to 0.0.$\", \n", " y=0.0, fontsize=plt.rcParams['font.size']+6)\n", @@ -889,9 +891,9 @@ "source": [ "fig, axes = plt.subplots(1,2, figsize=dbl_fig)\n", "\n", - "ref = KRR(regularization=regularization, kernel_type=kernel_type)\n", - "ref.fit(X_train, Y_train)\n", - "yref = ref.transform(X_test)\n", + "ref = KernelRidge(alpha=regularization, **kernel_params)\n", + "ref.fit(K_train, Y_train)\n", + "yref = ref.predict(K_test)\n", "\n", "errors = np.concatenate((np.abs(Ypred-Y_test),np.abs(yref-Y_test)))\n", "\n", @@ -1121,9 +1123,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# The Utility Classes\n", + "# Implementation in `scikit-learn`\n", "\n", - "Classes from the utility module enable computing KPCA, KRR, and KPCovR with a scikit.learn-like syntax. " + "Classes from `sklearn` enable computing KPCA and KRR. Here we'll review the basics of their usage; readers are encouraged to refer to sklearn documentation for further detail." ] }, { @@ -1132,16 +1134,8 @@ "metadata": {}, "outputs": [], "source": [ - "from utilities.classes import KPCA, KRR, KPCovR" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Important Note**: In all kernel classes, the functions `fit`, `transform`, and `statistics` will either compute the designated kernel for the supplied $\\mathbf{X}$ data or use the provided precomputed kernel. The kernel is **always** a keyword argument (e.g. `model.fit(K=K)`), and the first argument, positionally, is $\\mathbf{X}$.\n", - "\n", - "In each demonstration, we will show the function signature using X, but we will also supply our precomputed kernel for computational efficiency." + "from sklearn.decomposition import KernelPCA\n", + "from sklearn.kernel_ridge import KernelRidge" ] }, { @@ -1157,14 +1151,15 @@ "metadata": {}, "outputs": [], "source": [ - "kpca = KPCA(n_PC=2, kernel_type=kernel_type)" + "# We set kernel = 'precomputed' and supply kpca.fit(X) with X = K_train to improve computational efficiency.\n", + "kpca = KernelPCA(n_components=2, **kernel_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `kpca.fit(X)` will generate the kernel for $\\mathbf{X}$ and compute/internally store the eigenvectors/values." + "Calling `kpca.fit(K_train)` will generate the kernel for $\\mathbf{X}$ and compute/internally store the eigenvectors/values." ] }, { @@ -1173,14 +1168,14 @@ "metadata": {}, "outputs": [], "source": [ - "kpca.fit(X_train, K=K_train)" + "kpca.fit(K_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `kpca.transform(X)` will compute and return the KPCA projection $\\mathbf{T}$." + "Calling `kpca.transform(K)` will compute and return the KPCA projection $\\mathbf{T}$." ] }, { @@ -1189,8 +1184,8 @@ "metadata": {}, "outputs": [], "source": [ - "T_KPCA_train = kpca.transform(X_train, K=K_train)\n", - "T_KPCA_test= kpca.transform(X_test, K=K_test)" + "T_KPCA_train = kpca.transform(K_train)\n", + "T_KPCA_test= kpca.transform(K_test)" ] }, { @@ -1208,21 +1203,35 @@ "plt.show()" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calling `kpca.statistics(X, Y)` will output the statistics of the projection of the kernel created from $\\mathbf{X}$ and the regression onto $\\mathbf{Y}$." - ] - }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "table_from_dict([kpca.statistics(X_train, Y_train, K=K_train), \n", - " kpca.statistics(X_test, Y_test, K=K_test)], \n", + "PTX = np.matmul(np.diagflat(1/((v_K[:n_PC]))) , \n", + " np.matmul(T_KPCA_train.T, X_train))\n", + "\n", + "Xr_train = np.matmul(T_KPCA_train, PTX)\n", + "Xr_test = np.matmul(T_KPCA_test, PTX)\n", + "\n", + "K_approx_train = np.matmul(T_KPCA_train,T_KPCA_train.T)\n", + "K_test_test = kernel_func(X_test, X_test)\n", + "K_test_test = center_kernel(K_test_test)\n", + "K_approx_test = np.matmul(T_KPCA_test,T_KPCA_test.T)\n", + "\n", + "table_from_dict([get_stats(x=X_train, \n", + " xr=Xr_train,\n", + " y=Y_train, \n", + " t=T_KPCA_train, \n", + " k=K_train, \n", + " kapprox=K_approx_train), \n", + " get_stats(x=X_test, \n", + " xr=Xr_test,\n", + " y=Y_test, \n", + " t=T_KPCA_test, \n", + " k=K_test_test, \n", + " kapprox=K_approx_test)], \n", " headers = [\"Training\", \"Testing\"], \n", " title=\"KPCA\")" ] @@ -1240,7 +1249,8 @@ "metadata": {}, "outputs": [], "source": [ - "krr = KRR(regularization=regularization, kernel_type=kernel_type)" + "# We set kernel = 'precomputed' and supply krr.fit(X,y) with X = K_train to improve computational efficiency.\n", + "krr = KernelRidge(alpha=regularization, **kernel_params)" ] }, { @@ -1256,14 +1266,14 @@ "metadata": {}, "outputs": [], "source": [ - "krr.fit(X_train, Y_train, K=K_train)" + "krr.fit(K_train, Y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `krr.transform(X)` will generate the desired kernel from input $\\mathbf{X}$ and compute and return the predicted $\\mathbf{Y}$ values from $\\hat{\\mathbf{Y}}_{KRR} = \\mathbf{K}\\mathbf{P}_{KY}$." + "Calling `krr.predict(X)` will generate the desired kernel from input $\\mathbf{X}$ and compute and return the predicted $\\mathbf{Y}$ values from $\\hat{\\mathbf{Y}}_{KRR} = \\mathbf{K}\\mathbf{P}_{KY}$." ] }, { @@ -1272,8 +1282,8 @@ "metadata": {}, "outputs": [], "source": [ - "Y_krr_train = krr.transform(X_train, K=K_train)\n", - "Y_krr_test = krr.transform(X_test, K=K_test)" + "Y_krr_train = krr.predict(K_train)\n", + "Y_krr_test = krr.predict(K_test)" ] }, { @@ -1291,11 +1301,25 @@ "plt.show()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_from_dict([get_stats(x=X_train, yp=Y_krr_train,y=Y_train, k=K_train), \n", + " get_stats(x=X_test, yp=Y_krr_test, y=Y_test, k=K_test)], \n", + " headers = [\"Training\", \"Testing\"], \n", + " title=\"Kernel Ridge Regression\")" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `krr.statistics(X,Y)` will output the statistics of the regression of $\\mathbf{X}$ and $\\mathbf{Y}$." + "# Implementation in `skcosmo`\n", + "\n", + "Classes from `skcosmo` enable computing KPCovR. Here we'll review the basics of their usage; readers are encouraged to refer to skcosmo documentation for further detail." ] }, { @@ -1304,10 +1328,16 @@ "metadata": {}, "outputs": [], "source": [ - "table_from_dict([krr.statistics(X=X_train, Y=Y_train, K=K_train), \n", - " krr.statistics(X=X_test, Y=Y_test, K=K_test)], \n", - " headers = [\"Training\", \"Testing\"], \n", - " title=\"Kernel Ridge Regression\")" + "from utilities.classes import KPCovR # to be replaced with the class from skcosmo" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Important Note**: In the class, the functions `fit`, `transform`, and `statistics` will either compute the designated kernel for the supplied $\\mathbf{X}$ data or use the provided precomputed kernel. The kernel is **always** a keyword argument (e.g. `model.fit(K=K)`), and the first argument, positionally, is $\\mathbf{X}$.\n", + "\n", + "We will show the function signature using X, but we will also supply our precomputed kernel for computational efficiency." ] }, { @@ -1330,7 +1360,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `kpcovr.fit(X,Y)` will compute the weights $\\mathbf{P}_{KY}$ and internally store them." + "Calling `kpcovr.fit(X, Y, K)` will compute the weights $\\mathbf{P}_{KY}$ and internally store them." ] }, { @@ -1346,7 +1376,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Calling `kpcovr.transform(X)` will return the latent space transformation $\\mathbf{T}$, the predicted properties, and the reconstructed input data." + "Calling `kpcovr.transform(X, K)` will return the latent space transformation $\\mathbf{T}$, the predicted properties, and the reconstructed input data." ] }, { @@ -1394,13 +1424,6 @@ " ], \\\n", " headers = [r\"KPCovR ($\\alpha={}$)\".format(kp.alpha), \"KRR\", \"KPCA\"])" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": {