Commit b4f8dfb

Merge pull request MicrosoftDocs#2 from violetasdev/violetasdev-patch-2
Typos: Data exploration
2 parents e9e3bd7 + 522168d commit b4f8dfb

1 file changed (+6 -6 lines)

01 - Data Exploration.ipynb

Lines changed: 6 additions & 6 deletions
@@ -15,7 +15,7 @@
 "> **Note**: If you've never used the Jupyter Notebooks environment before, there are a few things you should be aware of:\n",
 "> \n",
 "> - Notebooks are made up of *cells*. Some cells (like this one) contain *markdown* text, while others (like the one beneath this one) contain code.\n",
-"> - The notebook is connected to a Python *kernel* (you can see which one at the top right of the page - if you're running this noptebook in an Azure Machine Learning compute instance it should be connected to the **Python 3.6 - AzureML** kernel). If you stop the kernel or disconnect from the server (for example, by closing and reopening the notebook, or ending and resuming your session), the output from cells that have been run will still be displayed; but any variables or functions defined in those cells will have been lost - you must rerun the cells before running any subsequent cells that depend on them.\n",
+"> - The notebook is connected to a Python *kernel* (you can see which one at the top right of the page - if you're running this notebook in an Azure Machine Learning compute instance it should be connected to the **Python 3.6 - AzureML** kernel). If you stop the kernel or disconnect from the server (for example, by closing and reopening the notebook, or ending and resuming your session), the output from cells that have been run will still be displayed; but any variables or functions defined in those cells will have been lost - you must rerun the cells before running any subsequent cells that depend on them.\n",
 "> - You can run each code cell by using the **► Run** button. The **◯** symbol next to the kernel name at the top right will briefly turn to **⚫** while the cell runs before turning back to **◯**.\n",
 "> - The output from each code cell will be displayed immediately below the cell.\n",
 "> - Even though the code cells can be run individually, some variables used in the code are global to the notebook. That means that you should run all of the code cells <u>**in order**</u>. There may be dependencies between code cells, so if you skip a cell, subsequent cells might not run correctly.\n",
@@ -89,7 +89,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Note that multiplying a list by 2 creates a new list of twice the length with the original sequence of list elements repeated. Multiplying a NumPy array on the other hand performs an element-wise calculation in which the array behaves like a *vector*, so we end up with an array of the same size in which each element has been multipled by 2.\n",
+"Note that multiplying a list by 2 creates a new list of twice the length with the original sequence of list elements repeated. Multiplying a NumPy array on the other hand performs an element-wise calculation in which the array behaves like a *vector*, so we end up with an array of the same size in which each element has been multiplied by 2.\n",
 "\n",
 "The key takeaway from this is that NumPy arrays are specifically designed to support mathematical operations on numeric data - which makes them more useful for data analysis than a generic list.\n",
 "\n",
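This hunk only corrects the spelling of "multiplied", but the sentence it touches describes a behaviour that is easy to verify. A minimal sketch, not taken from the notebook (the sample values are hypothetical), contrasting `list * 2` with `ndarray * 2`:

```python
import numpy as np

# Hypothetical sample values; any short list of numbers would do.
grades_list = [50, 50, 47, 97]
grades_array = np.array(grades_list)

# list * 2 repeats the sequence, so the length doubles.
doubled_list = grades_list * 2

# ndarray * 2 multiplies element-wise, so the length is unchanged.
doubled_array = grades_array * 2

print(doubled_list)             # [50, 50, 47, 97, 50, 50, 47, 97]
print(doubled_array.tolist())   # [100, 100, 94, 194]
```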
@@ -433,7 +433,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The DataFrame's **read_csv** method is used to load data from text files. As you can see in the example code, you can specify options such as the column delimiter and which row (if any) contains column headers (in this case, the delimter is a comma and the first row contains the column names - these are the default settings, so the parameters could have been omitted).\n",
+"The DataFrame's **read_csv** method is used to load data from text files. As you can see in the example code, you can specify options such as the column delimiter and which row (if any) contains column headers (in this case, the delimiter is a comma and the first row contains the column names - these are the default settings, so the parameters could have been omitted).\n",
 "\n",
 "\n",
 "### Handling missing values\n",
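The corrected sentence says a comma delimiter and a first-row header are the defaults, so the parameters can be omitted. Strictly, `read_csv` is a top-level pandas function (`pd.read_csv`) rather than a DataFrame method. A minimal sketch, assuming an in-memory CSV with hypothetical rows instead of the notebook's data file, showing that passing `delimiter=','` and `header=0` yields the same DataFrame as relying on the defaults (`sep=','`, `header='infer'`):

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the notebook's data file.
csv_text = "Name,StudyHours,Grade\nDan,10.0,50\nJoann,11.5,50\nPedro,9.0,47\n"

# Explicit options: comma delimiter, first row (index 0) holds the column names.
df_explicit = pd.read_csv(io.StringIO(csv_text), delimiter=",", header=0)

# The same load relying on the defaults.
df_default = pd.read_csv(io.StringIO(csv_text))

print(df_explicit.equals(df_default))  # True
print(list(df_default.columns))        # ['Name', 'StudyHours', 'Grade']
```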
@@ -857,7 +857,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The histogram for grades is a symmetric shape, where the most frequently occuring grades tend to be in the middle of the range (around 50), with fewer grades at the extreme ends of the scale.\n",
+"The histogram for grades is a symmetric shape, where the most frequently occurring grades tend to be in the middle of the range (around 50), with fewer grades at the extreme ends of the scale.\n",
 "\n",
 "#### Measures of central tendency\n",
 "\n",
@@ -1398,11 +1398,11 @@
 "\n",
 "> **Warning - Math Ahead!**\n",
 ">\n",
-"> Cast your mind back to when you were learning how to solve linear equations in school, and recall that the *slope-intercept* form of a linear equation lookes like this:\n",
+"> Cast your mind back to when you were learning how to solve linear equations in school, and recall that the *slope-intercept* form of a linear equation looks like this:\n",
 ">\n",
 "> \\begin{equation}y = mx + b\\end{equation}\n",
 ">\n",
-"> In this equation, *y* and *x* are the coordinate variables, *m* is the slope of the line, and *b* is the y-intercept (where the line goes through the Y axis).\n",
+"> In this equation, *y* and *x* are the coordinate variables, *m* is the slope of the line, and *b* is the y-intercept (where the line goes through the Y-axis).\n",
 ">\n",
 "> In the case of our scatter plot for our student data, we already have our values for *x* (*StudyHours*) and *y* (*Grade*), so we just need to calculate the intercept and slope of the straight line that lies closest to those points. Then we can form a linear equation that calculates a new *y* value on that line for each of our *x* (*StudyHours*) values - to avoid confusion, we'll call this new *y* value *f(x)* (because it's the output from a linear equation ***f***unction based on *x*). The difference between the original *y* (*Grade*) value and the *f(x)* value is the *error* between our regression line and the actual *Grade* achieved by the student. Our goal is to calculate the slope and intercept for a line with the lowest overall error.\n",
 ">\n",
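The last context line describes finding the slope *m* and intercept *b* of y = mx + b with the lowest overall error, where the error is y - f(x). A minimal sketch, assuming "lowest overall error" means the ordinary least-squares criterion (minimising the sum of squared errors) and using hypothetical *StudyHours*/*Grade* pairs rather than the notebook's dataset; `np.polyfit` with degree 1 is used only as a cross-check, not as the notebook's method:

```python
import numpy as np

# Hypothetical (StudyHours, Grade) pairs; not the notebook's dataset.
x = np.array([8.0, 9.0, 10.0, 11.5, 12.0, 13.5])    # StudyHours
y = np.array([35.0, 42.0, 50.0, 58.0, 60.0, 70.0])  # Grade

# Closed-form ordinary least squares for y = m*x + b.
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

# f(x) is the value on the fitted line for each x; the error is y - f(x).
f_x = m * x + b
error = y - f_x

print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
print("sum of squared errors:", round(float(np.sum(error ** 2)), 4))

# Cross-check: a degree-1 polynomial fit returns [slope, intercept].
m_np, b_np = np.polyfit(x, y, 1)
print(np.isclose(m, m_np), np.isclose(b, b_np))  # True True
```

With an intercept in the model, the least-squares errors always sum to (numerically) zero, which is a quick sanity check on the fit.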