A comprehensive Spring Boot-based backend system for data ingestion, batch processing, and analytics. This system provides secure REST APIs for data upload, analytics retrieval, and job monitoring with optimized database queries and caching.
- User Authentication & Authorization: Spring Security + JWT for role-based access control
- Data Upload API: Accept CSV/Excel files via multipart/form-data
- Batch Processing: Spring Batch for efficient data processing in chunks
- Analytics API: Aggregated statistics with optimized queries and database indexes
- Job Monitoring: Track and manage batch job execution status
- Data Export: Download processed data as CSV files
- Dual Database Architecture: MySQL for processed data, MongoDB for raw data storage
- Optimized Queries: Database indexes and batch processing for performance
- Caching: Spring Cache for frequently accessed data
- Async Processing: Background job execution for large datasets
- Comprehensive Testing: JUnit 5 + Mockito for unit and integration tests
- Java 17
- Spring Boot 3.x - REST APIs and application framework
- Spring Batch - Batch processing framework
- Spring Security - Authentication and authorization
- Spring Data JPA - Data access layer
- Spring Data MongoDB - MongoDB integration
- MySQL - Relational database for processed data
- MongoDB - Document database for raw data
- JWT - JSON Web Token authentication
- Apache POI - Excel file processing
- Maven - Build tool and dependency management
- JUnit 5 + Mockito - Testing framework
smart-data-processing/
├── src/main/java/com/example/smartdata/
│   ├── config/              # Security, DB, Batch configurations
│   ├── controller/          # REST controllers
│   ├── dto/                 # Request/response objects
│   ├── entity/              # JPA entities
│   ├── repository/          # Spring Data repositories
│   ├── service/             # Business logic services
│   ├── batch/               # Spring Batch components
│   ├── security/            # JWT and security components
│   ├── util/                # Helper classes
│   └── SmartDataApp.java    # Main application class
├── src/test/java/           # JUnit test cases
├── src/main/resources/      # Configuration files
├── pom.xml                  # Maven dependencies
└── README.md
- Java 17 or higher
- Maven 3.6+
- MySQL 8.0+
- MongoDB 4.4+
- Docker (optional)
- Clone the repository:
      git clone <repository-url>
      cd smart-data-processing
- Configure databases:
  - Create MySQL database: smart_data
  - Create MongoDB database: smart_data_raw
  - Update application.yml with your database credentials
- Build the project:
      mvn clean install
- Run the application:
      mvn spring-boot:run
The application will start on http://localhost:8080
# Start MySQL
docker run --name mysql-db -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=smart_data -p 3306:3306 -d mysql:8.0
# Start MongoDB
docker run --name mongo-db -p 27017:27017 -d mongo:4.4

POST /api/auth/register
{
  "username": "user@example.com",
  "password": "password123"
}

POST /api/auth/login
{
  "username": "user@example.com",
  "password": "password123"
}

Use the returned JWT token in the Authorization header: Bearer <token>
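As a rough illustration of the Authorization header described above, the following sketch builds an authenticated request with the JDK's built-in HTTP client types (Java 11+). The class and method names are hypothetical, and the shape of the login response is not specified in this README, so the token is passed in as a plain string.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: BearerRequestExample and authorizedGet are illustrative names,
// not part of the project.
final class BearerRequestExample {

    // Builds a GET request that carries the JWT in the Authorization header.
    static HttpRequest authorizedGet(String url, String jwt) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + jwt)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // "<token>" stands for the value returned by POST /api/auth/login.
        HttpRequest request =
                authorizedGet("http://localhost:8080/api/files/my-files", "<token>");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElse("(missing)"));
    }
}
```

The request object can then be sent with `java.net.http.HttpClient`; nothing is transmitted until `send` or `sendAsync` is called.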
- POST /api/files/upload - Upload CSV/Excel files
- GET /api/files/my-files - Get user's uploaded files
- GET /api/files/{fileId} - Get file details

- POST /api/analytics/search - Search and filter data
- GET /api/analytics/stats - Get quick statistics
- GET /api/analytics/health - Health check

- POST /api/jobs/start/{fileId} - Start batch processing
- GET /api/jobs/executions - List job executions
- GET /api/jobs/executions/{executionId} - Get job details
- POST /api/jobs/restart/{executionId} - Restart failed jobs
- GET /api/jobs/statistics - Job execution statistics

- POST /api/export/csv - Export data as CSV
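The README does not show how the export endpoint serializes rows. A minimal sketch of RFC 4180-style field quoting in plain Java might look like the following; the class name, method names, and sample columns are assumptions for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: CsvRowWriter is a hypothetical helper, not the project's exporter.
final class CsvRowWriter {

    // Quotes a field when it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled. A null field becomes an empty field.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields of one row with commas (no trailing line break).
    static String toLine(List<String> fields) {
        return fields.stream().map(CsvRowWriter::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toLine(List.of("category", "value")));
        System.out.println(toLine(List.of("Books, used", "12.50")));
    }
}
```

Quoting only when needed keeps the output compact while still letting spreadsheet tools parse values that contain separators.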
Key configuration options in application.yml:
spring:
  datasource:
    url: jdbc:mysql://localhost:3306/smart_data
    username: root
    password: password
  data:
    mongodb:
      host: localhost
      port: 27017
      database: smart_data_raw
  batch:
    jdbc:
      initialize-schema: always
  security:
    jwt:
      secret: your-secret-key-here
      expiration: 86400000

The system automatically creates optimized indexes for:
- processed_data.category
- processed_data.value
- processed_data.processed_date
- processed_data.file_id
# Run all tests
mvn test
# Run with coverage
mvn test jacoco:report
# Run specific test class
mvn test -Dtest=AnalyticsServiceTest

- Uses H2 in-memory database for repository tests
- Mockito for mocking dependencies
- Spring Boot test utilities for integration tests
- Batch inserts with configurable chunk sizes
- Optimized indexes on frequently queried columns
- Connection pooling with HikariCP
- Spring Cache for analytics results
- Redis support (configurable)
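To make the caching bullets above concrete: the effect of caching analytics results is that an expensive aggregate is computed once per key and later lookups are served from memory. The sketch below shows that behavior with a plain `ConcurrentHashMap`; it is a stand-in for illustration, not Spring Cache's API, and `StatsCache` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch only: a compute-once cache keyed by K. The loader must not return null.
final class StatsCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    StatsCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only on a miss.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops one entry, e.g. after new data for that key has been processed.
    void evict(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        StatsCache<String, Long> cache = new StatsCache<>(category -> {
            queries.incrementAndGet();        // stands in for an aggregate query
            return (long) category.length();  // placeholder result
        });
        cache.get("books");
        cache.get("books");                   // second call is served from the cache
        System.out.println("queries executed: " + queries.get());
    }
}
```

Eviction matters here because cached statistics become stale once a batch job writes new rows for the same key.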
- Configurable chunk sizes (default: 100)
- Parallel processing support
- Transaction management
- JWT-based stateless authentication
- Password hashing with BCrypt
- Role-based access control (USER, ANALYST, ADMIN)
- Endpoint-level security
- Method-level security annotations
- CORS configuration
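The security bullets above mention stateless JWT authentication, but the README does not show how tokens are produced. As a rough illustration of what an HS256-signed token contains, the sketch below issues and checks one using only JDK classes. The class name, method names, and the `role` claim are hypothetical; the project presumably relies on a JWT library, and a complete validator would also parse the payload and reject expired tokens.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: shows the header.payload.signature structure of an HS256 JWT.
final class Hs256TokenSketch {
    private static final Base64.Encoder URL_ENCODER = Base64.getUrlEncoder().withoutPadding();

    private static String encode(String json) {
        return URL_ENCODER.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    private static byte[] hmac(String data, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }

    // Issues a token whose "exp" claim is a Unix timestamp in seconds.
    // No JSON escaping is done, so subject and role must be plain text.
    static String issue(String subject, String role, long nowMillis, long ttlMillis, String secret) {
        String header = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
        String payload = "{\"sub\":\"" + subject + "\",\"role\":\"" + role
                + "\",\"exp\":" + ((nowMillis + ttlMillis) / 1000) + "}";
        String signingInput = encode(header) + "." + encode(payload);
        return signingInput + "." + URL_ENCODER.encodeToString(hmac(signingInput, secret));
    }

    // Checks the signature only; expiry and claims are not inspected here.
    static boolean hasValidSignature(String token, String secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false;
        }
        // MessageDigest.isEqual compares in constant time.
        return MessageDigest.isEqual(hmac(parts[0] + "." + parts[1], secret), actual);
    }

    public static void main(String[] args) {
        // 86_400_000 ms (24 hours) matches the expiration value in the sample configuration.
        String token = issue("user@example.com", "USER",
                System.currentTimeMillis(), 86_400_000L, "your-secret-key-here");
        System.out.println(token);
        System.out.println(hasValidSignature(token, "your-secret-key-here"));
    }
}
```

Because the server can verify the signature with its secret alone, no session state needs to be stored between requests.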
- Consistent error response format
- Proper HTTP status codes
- Detailed error messages for debugging
- Input validation with Bean Validation
- Custom validation annotations
- Error message internationalization
- Database connectivity
- MongoDB connectivity
- Batch job status
- Application metrics
- Structured logging with SLF4J
- Configurable log levels
- Performance metrics logging
- File Upload → Raw data stored in MongoDB
- Job Initiation → Spring Batch job created
- Data Processing → Chunk-based processing with reader/processor/writer
- Result Storage → Cleaned data stored in MySQL
- Status Update → Job completion status tracked
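The data processing step above can be sketched in plain Java. This is not Spring Batch's API, only an approximation of its read-process-write loop: up to `chunkSize` items are read, each is transformed, and the surviving results are handed to the writer as one list. All names are hypothetical, and a `null` processor result is assumed to mean "skip this item".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch only: a simplified chunk loop without transactions, retries, or restart state.
final class ChunkPipelineSketch {

    // Reads up to chunkSize items per pass, processes them, and writes each
    // non-empty batch of results. Returns the number of items written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int written = 0;
        while (reader.hasNext()) {
            List<O> outputs = new ArrayList<>(chunkSize);
            for (int read = 0; read < chunkSize && reader.hasNext(); read++) {
                O result = processor.apply(reader.next());
                if (result != null) {          // null means the item was filtered out
                    outputs.add(result);
                }
            }
            if (!outputs.isEmpty()) {
                writer.accept(outputs);        // one write call per chunk
                written += outputs.size();
            }
        }
        return written;
    }

    // Example processor: trims the raw value and drops anything that is not an integer.
    static Integer parseOrNull(String raw) {
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> rawRows = List.of(" 10 ", "abc", "7", "", "42");
        List<List<Integer>> store = new ArrayList<>();   // stands in for the MySQL writer
        Function<String, Integer> processor = ChunkPipelineSketch::parseOrNull;
        Consumer<List<Integer>> writer = store::add;
        int written = run(rawRows.iterator(), processor, writer, 2);
        System.out.println(written + " rows written in " + store.size() + " chunks");
    }
}
```

Writing per chunk rather than per row is what keeps database round trips and memory use bounded for large files; the README states a default chunk size of 100.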
- Real-time job status tracking
- Failed job restart capability
- Job execution statistics
- Performance metrics
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue in the repository
- Check the documentation
- Review the test cases for usage examples
- Real-time data streaming
- Advanced analytics and machine learning
- WebSocket support for real-time updates
- Kubernetes deployment support
- Advanced caching strategies
- Data versioning and audit trails