 ## Statement vs Row Based Replication
 
 MySQL supports two primary modes of replication in its binary logs: statement or
-row based.
-
-**Statement Based Replication**:
-
-* The statements executed on the master are copied almost as-is in the master
-  logs.
-* The slaves replay these statements as is.
-* If the statements are expensive (especially an update with a complicated WHERE
-  clause), they will be expensive on the slaves too.
-* For current timestamp and auto-increment values, the master also puts
-  additional SET statements in the logs to make the statement have the same
-  effect, so the slaves end up with the same values.
-
-**Row Based Replication**:
-
-* The statements executed on the master result in updated rows. The new full
-  values for these rows are copied to the master logs.
-* The slaves change their records for the rows they receive. The update is by
-  primary key, and contains the new values for each column, so usually it’s very
-  fast.
-* Each updated row contains the entire row, not just the columns that were
-  updated (unless the flag --binlog\_row\_image=minimal is used).
-* The replication stream is harder to read, as it contains almost binary data,
-  that don’t easily map to the original statements.
-* There is a configurable limit on how many rows can be affected by one
-  binlog event, so the master logs are not flooded.
-* The format of the logs depends on the master schema: each row has a list of
-  values, one value for each column. So if the master schema is different from
-  the slave schema, updates will misbehave (exception being if slave has extra
-  columns at the end).
-* It is possible to revert to statement based replication for some commands to
-  avoid these drawbacks (for instance for DELETE statements that affect a large
-  number of rows).
-* Schema changes always use statement based replication.
-* If comments are added to a statement, they are stripped from the
-  replication stream (as only rows are transmitted). There is a flag
-  --binlog\_rows\_query\_log\_events to add the original statement to each row
-  update, but it is costly in terms of binlog size.
-
-For the longest time, MySQL replication has been single-threaded: only one
-statement is applied by the slaves at a time. Since the master applies more
-statements in parallel, replication can fall behind on the slaves fairly easily,
-under higher load. Even though the situation has improved (parallel slave
-apply), the slave replication speed is still a limiting factor for a lot of
-applications. Since row based replication achieves higher update rates on the
-slaves in most cases, it has been the only viable option for most performance
-sensitive applications.
-
-Schema changes however are not easy to achieve with row based
-replication. Adding columns can be done offline, but removing or changing
-columns cannot easily be done (there are multiple ways to achieve this, but they
-all have limitations or performance implications, and are not that easy to
-setup).
-
-Vitess helps by using statement based replication (therefore allowing complex
-schema changes), while at the same time simplifying the replication stream (so
-slaves can be fast), by rewriting Update statements.
-
-Then, with statement based replication, it becomes easier to perform offline
-advanced schema changes, or large data updates. Vitess’s solution is called
-schema swap.
+row based. Vitess supports both modes.
+
+For schema changes, if the number of affected rows is greater than 100k
+(configurable), we don't allow direct application of DDLs; the recommended tool
+in such cases is gh-ost.
 
-We plan to also support row based replication in the future, and adapt our tools
-to provide the same features when possible. See Appendix for our plan.
+When using statement based replication, Vitess rewrites Update statements.
+Statement based replication keeps complex schema changes possible, and the
+rewriting simplifies the replication stream so that slaves can apply it quickly.
+This is described in detail below.
+
+Thus, with statement based replication, it becomes easier to perform advanced
+offline schema changes or large data updates. Vitess’s solution is called
+schema swap (described below).
 
 ## Rewriting Update Statements
 
|
|
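The changed text says Vitess rewrites Update statements so that statement based replication stays cheap on the slaves, in contrast with an expensive update whose complicated WHERE clause would be re-evaluated on every slave. A minimal sketch, assuming a two-step rewrite (a locking SELECT on the master resolves the matching primary keys, then an UPDATE keyed only by those primary keys is what gets replicated), could look like the Go below. The helpers `selectKeys` and `rewriteUpdate`, the `orders` table, and the single integer primary-key column are hypothetical illustrations, not Vitess's API; values are not escaped, so this is not production SQL generation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// selectKeys is a hypothetical helper for the first step: a locking SELECT,
// run on the master, that resolves which primary keys the original
// (possibly expensive) WHERE clause matches.
func selectKeys(table, pkCol, where string) string {
	return fmt.Sprintf("SELECT %s FROM %s WHERE %s FOR UPDATE", pkCol, table, where)
}

// rewriteUpdate is a hypothetical helper for the second step: it builds an
// UPDATE that targets rows only by primary key, so a slave replaying it does
// cheap key lookups instead of re-evaluating the original WHERE clause.
// Assumes a single integer primary-key column; values are not escaped.
func rewriteUpdate(table, pkCol, setClause string, pks []int64) string {
	if len(pks) == 0 {
		return "" // nothing matched: no statement needs to be replicated
	}
	ids := make([]string, len(pks))
	for i, pk := range pks {
		ids[i] = strconv.FormatInt(pk, 10)
	}
	return fmt.Sprintf("UPDATE %s SET %s WHERE %s IN (%s)",
		table, setClause, pkCol, strings.Join(ids, ", "))
}

func main() {
	fmt.Println(selectKeys("orders", "id",
		"status = 'pending' AND created < NOW() - INTERVAL 1 DAY"))
	// Suppose the SELECT above returned ids 3, 7 and 42 on the master.
	fmt.Println(rewriteUpdate("orders", "id", "status = 'expired'", []int64{3, 7, 42}))
}
```

Only the second statement needs to reach the binlog under this assumption, so the slaves see a primary-key update, which is the property the text attributes to row based replication.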
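The new rule for schema changes (more than 100k affected rows, with a configurable limit, means no direct DDL and gh-ost is recommended) amounts to a threshold check. The Go below is a minimal sketch of that decision only, assuming the limit is a single integer row count; `planSchemaChange`, `ddlPlan` and `defaultMaxDirectDDLRows` are hypothetical names, and the sketch neither estimates the row count nor invokes gh-ost.

```go
package main

import "fmt"

// defaultMaxDirectDDLRows mirrors the 100k figure from the text; the name and
// the single-integer form of the setting are assumptions for illustration.
const defaultMaxDirectDDLRows int64 = 100_000

// ddlPlan is a hypothetical result type describing how a change is applied.
type ddlPlan struct {
	Direct bool   // true: apply the DDL directly
	Reason string // human-readable explanation of the decision
}

// planSchemaChange is a hypothetical helper, not a Vitess API: above the
// configured limit, direct DDL is refused and gh-ost is recommended.
func planSchemaChange(affectedRows, maxDirectRows int64) ddlPlan {
	if affectedRows > maxDirectRows {
		return ddlPlan{
			Direct: false,
			Reason: fmt.Sprintf("%d rows > %d: use gh-ost instead of a direct DDL",
				affectedRows, maxDirectRows),
		}
	}
	return ddlPlan{Direct: true, Reason: "within limit: apply the DDL directly"}
}

func main() {
	for _, rows := range []int64{5_000, 100_000, 2_500_000} {
		p := planSchemaChange(rows, defaultMaxDirectDDLRows)
		fmt.Printf("%d rows -> direct=%v (%s)\n", rows, p.Direct, p.Reason)
	}
}
```

Because the text says "greater than", a change touching exactly 100k rows is still applied directly in this sketch; passing a different `maxDirectRows` models the configurable limit.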