GitHub - Jackymn25/Work-performance-analysis: Linear analysis of work-performance data from Kaggle

This project analyzes how burnout, sleep, screen time, and work hours are associated with task completion rate in a work-from-home setting. For details, please view our report.

Overview

Using a synthetic work-from-home employee burnout dataset, this project builds and compares several regression models to study which factors are most strongly related to daily task completion. The main goal is to identify an interpretable model and evaluate its assumptions through diagnostic analysis.

Methods

The analysis includes:

Exploratory Data Analysis (EDA)
- summary statistics
- boxplots
- barplots
- scatterplots
Multiple Linear Regression
- baseline linear model for task completion rate
Quadratic Regression
- added a quadratic term for burnout score to capture nonlinearity
Model Selection
- nested model comparison
- partial F-tests
- removal of weak predictors
Multicollinearity Diagnosis
- correlation analysis
- variance inflation factor (VIF)
Model Diagnostics
- residuals vs fitted
- normal Q-Q plots
- standardized/studentized residual checks
Influence Analysis
- leverage
- Cook’s distance
- DFFITS
- sensitivity analysis after removing influential observations
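
The multicollinearity and influence checks listed above can be sketched in base R roughly as follows. This is a minimal illustration on simulated data, not the project's actual code: the column names (`burnout`, `sleep`, `screen`, `work`, `completion`), the simulated coefficients, and the rule-of-thumb cutoffs are all assumptions.

```r
# Simulated stand-in data; column names and coefficients are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  burnout = runif(n, 0, 100),
  sleep   = rnorm(n, 7, 1),
  screen  = rnorm(n, 8, 2),
  work    = rnorm(n, 8, 1.5)
)
dat$completion <- 90 - 0.3 * dat$burnout - 0.002 * dat$burnout^2 +
  rnorm(n, sd = 5)

fit <- lm(completion ~ burnout + sleep + screen + work, data = dat)

# VIF for each predictor: 1 / (1 - R^2) from regressing it on the others.
predictors <- c("burnout", "sleep", "screen", "work")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = dat))$r.squared
  1 / (1 - r2)
})

# Influence measures from the base stats package.
k     <- length(coef(fit))        # number of coefficients, incl. intercept
lev   <- hatvalues(fit)           # leverage
cooks <- cooks.distance(fit)      # Cook's distance
dff   <- dffits(fit)              # DFFITS

# Common rule-of-thumb cutoffs (assumed here; other conventions exist).
flagged <- which(lev > 2 * k / n | cooks > 4 / n | abs(dff) > 2 * sqrt(k / n))

# Sensitivity analysis: refit without the flagged observations.
fit_trim <- if (length(flagged) > 0) {
  lm(formula(fit), data = dat[-flagged, ])
} else {
  fit
}

print(round(vif, 2))
print(cbind(full = coef(fit), trimmed = coef(fit_trim)))
```

If the coefficients in the two columns are close, the conclusions are not being driven by a handful of influential points.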

Main Result

The final model shows that burnout score is the dominant predictor of task completion rate.
A quadratic burnout term improves fit, suggesting the relationship between burnout and productivity is nonlinear.
After accounting for burnout, sleep hours, screen time, and work hours contribute little additional explanatory power in this dataset.
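
The kind of nested comparison behind these statements can be sketched as below. The data are simulated with a deliberately curved burnout effect, so the numbers say nothing about the real dataset; the variable names and coefficients are assumptions chosen for illustration.

```r
# Simulated data with a curved burnout effect; names and values are hypothetical.
set.seed(2)
n <- 300
dat <- data.frame(
  burnout = runif(n, 0, 100),
  sleep   = rnorm(n, 7, 1),
  screen  = rnorm(n, 8, 2),
  work    = rnorm(n, 8, 1.5)
)
dat$completion <- 95 - 0.1 * dat$burnout - 0.004 * dat$burnout^2 +
  rnorm(n, sd = 4)

# Three nested models: linear burnout, quadratic burnout, quadratic plus the rest.
linear    <- lm(completion ~ burnout, data = dat)
quadratic <- lm(completion ~ burnout + I(burnout^2), data = dat)
full      <- lm(completion ~ burnout + I(burnout^2) + sleep + screen + work,
                data = dat)

# Partial F-tests on the nested pairs.
quad_test  <- anova(linear, quadratic)  # does the squared term improve fit?
extra_test <- anova(quadratic, full)    # do sleep, screen, and work add anything?

print(quad_test)
print(extra_test)
print(c(linear    = summary(linear)$adj.r.squared,
        quadratic = summary(quadratic)$adj.r.squared))
```

A small p-value in the first test together with a large one in the second would match the pattern described above: the quadratic term matters, and the remaining predictors add little once burnout is in the model.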

Requirements

Make sure the following are installed:

R
rmarkdown
knitr
LaTeX distribution with xelatex support

For example:

TinyTeX
TeX Live
MiKTeX

How to Run

Put data.csv in the same folder as reportInfo.Rmd.
Open the project in RStudio or another R environment.
Render the report with:

rmarkdown::render("reportInfo.Rmd")

This will generate the final PDF report.

Data Input

The R Markdown file searches for the dataset in these locations:

data.csv
./data.csv
data/data.csv
../data.csv
/mnt/data/data.csv

If the file is not found, rendering will stop with an error.
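
A lookup of this kind might look like the sketch below. It is not taken from the R Markdown file: `find_data` is a hypothetical helper name, and the assumption that candidates are tried in the order listed is ours.

```r
# Candidate locations as listed in this README, assumed to be checked in order.
candidates <- c("data.csv", "./data.csv", "data/data.csv",
                "../data.csv", "/mnt/data/data.csv")

# Hypothetical helper: return the first existing path, or stop with an error.
find_data <- function(paths) {
  hit <- paths[file.exists(paths)]
  if (length(hit) == 0) {
    stop("data.csv not found in: ", paste(paths, collapse = ", "))
  }
  hit[1]
}

# Usage (commented out so the sketch runs even when no data file is present):
# dat <- read.csv(find_data(candidates))
```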

Output

The main output is a PDF report containing:

introduction and motivation
data description
preliminary regression results
model selection process
final model inference
discussion and conclusion

Notes

The dataset used in this project is synthetic, so the results should be interpreted mainly as a statistical modeling exercise. The project focuses on interpretability, model adequacy, and diagnostic reasoning rather than purely predictive performance.

Authors

Jingcheng Liang
Dana Huang
Haozhe Huo

