Projects/tasks of 21W PS 911.002 Advanced Databases seminar held by Prof. Schäler in winter semester of academic year 2021/22 at Paris Lodron Universität Salzburg.
Because this repository reflects an already marked university module, it is archived.
Task 1: Theory on Transactions in PostgreSQL
(individual, non-project work)
Task 2: Avoid Deadlocks by Schedulung Transactions
A lot of parallel transactions come in and produce deadlocks. Those transactions are currently getting restarted and racing to finish resulting in long runtimes. We do know all transactions in advance, so we can schedule them in parallel such that as many transactions as possible can run simultaneously. We heuristically ordered transactions by number of rows they would affect and execute the big ones first. Our reasoning being that large transactions are likely to block many other ones. Then, the other transactions are compared against each other using our Interval class. Batches of four parallel workers were used and filled with compatible transactions. If none can be found, the respective slot(s) remain(s) idle, the next batch is started as soon as all workers of the previous one have terminated.
Task 3: Efficient (Range) Selections in Main Memory
This task simulated database systems in main memory to allow performance comparisons between different selection strategies. A columnar and a row-wise database were given as well as sample data and queries of (range) selections. The latter could contain one single-column (mono) or multiple columns (multi). Part of the task was to make the selections work in existing but not yet finished classes for both kinds of selections. The last part included further optimisations using additional accessing strategies, e.g., indexes. We found working with the given MyArrayList class a very hard nut to crack spending hours on understanding how it works, so it is unlikely we will get drunk only once after having finished this project. Oh, and we used indexes in form of binary search trees to accelerate access times and some estimations whether it is better to create and use an index or do a linear scan. Ultimately, we liked our results. And, hopefully, the beer.
Task 4: Transaction Management
Different transactions come in and cause change conflicts. Some are Writer arbitrarily updating data, others are Reader wanting to do two consecutive full-table scans to confirm whether there were changes in the meantime. We were to experiment with four strategies of locking the table and measuring the performances. Then, we were to implement a deadlock avoidance strategy either prioritising the older transactions or the newer ones. For this, we added LockingTable.rowTransactionLocks and influenced the conditions, in which Reader and Writer could obtain read/write locks and their (rollback) behaviour when the locks were denied. For deadlock resolution with unpredictably incoming transactions, a wait-for graph with cycle detection was implemented. In case of a deadlock, that transaction affecting the fewest rows would get aborted but not restarted -- it would be the user's/application's job to handle that case and send the request again. There is, however, the basis for avoiding starvation. Canonical locking for transactions known in advance was done similarly to our solution of 2.3 creating batches that could run in parallel. This time, we used wait-for graphs to detect deadlocks while sceduling. Details can be found in the report on Task 4.
Task 5: Final Optimisation
Bottlenecks were to be identified and mitigated in two of our existing solutions, one Deadlock Avoidance strategy and one Deadlock Resolution strategy. We chose to optimise those of the previous project. Implementation, consideration details and performance measures can be found in the report of task 5. For our wait-for-graph-based solution of challenge 4.3, we identified that transactions in a cycle may have already terminated but are still represented in the the graph. This means, they are considered as deadlock-causing even though they are not. We mitigated this by accounting for two additional booleans abort and unlock in checkCycle. A similar problem could be indentified in our parallel-batch approach of 4.4. Maybe the cycle- and deadlock-causing transaction in a batch has already terminated allowing the next batch to be started and run in parallel. Therefore, whenever a Writer finishes, we hypothetically add the next batch to the currently running one and see if there are cycles. If yes, this would produce a deadlock and we wait. If no, the subsequent batch starts executing. In a third challenge, we were to implement the Multi Version Concurrency Control Protocol (MVCC) retaining several snapsots of our table's data and presenting "a good choice" of an old snapshot to the Reader. This allowed several Reader (plural!) to work in paralllel and a much higher throughput at the cost of a longer run time.
| Task | Awarded * | Possible | Cumulative * | Cumulative Max |
|---|---|---|---|---|
| 2 | 2.25 | 2.00 | 2.25 | 2.00 |
| 3 | 2.25 | 2.00 | 4.50 | 4.00 |
| 4 | 3.50 | 3.00 | 8.00 | 7.00 |
| 5 | 2.00 | 2.00 | 10.00 | 9.00 |
| 1 | N/D | 1.00 | N/D | 10.00 |
* There are bonus points to be awarded.
var e = new CreativityOverflowException();
e.Message = "nice jingle";
throw e;