gp219/SmartDataProcessing

Smart Data Processing & Analytics Backend

A Spring Boot backend for data ingestion, batch processing, and analytics. It exposes secure REST APIs for data upload, analytics retrieval, and job monitoring, backed by indexed database queries and caching.

🚀 Features

Core Functionality

  • User Authentication & Authorization: Spring Security + JWT for role-based access control
  • Data Upload API: Accept CSV/Excel files via multipart/form-data
  • Batch Processing: Spring Batch for efficient data processing in chunks
  • Analytics API: Aggregated statistics with optimized queries and database indexes
  • Job Monitoring: Track and manage batch job execution status
  • Data Export: Download processed data as CSV files

Technical Features

  • Dual Database Architecture: MySQL for processed data, MongoDB for raw data storage
  • Optimized Queries: Database indexes and batch processing for performance
  • Caching: Spring Cache for frequently accessed data
  • Async Processing: Background job execution for large datasets
  • Comprehensive Testing: JUnit 5 + Mockito for unit and integration tests

🛠️ Tech Stack

  • Java 17
  • Spring Boot 3.x - REST APIs and application framework
  • Spring Batch - Batch processing framework
  • Spring Security - Authentication and authorization
  • Spring Data JPA - Data access layer
  • Spring Data MongoDB - MongoDB integration
  • MySQL - Relational database for processed data
  • MongoDB - Document database for raw data
  • JWT - JSON Web Token authentication
  • Apache POI - Excel file processing
  • Maven - Build tool and dependency management
  • JUnit 5 + Mockito - Testing framework

📁 Project Structure

smart-data-processing/
├── src/main/java/com/example/smartdata/
│   ├── config/           # Security, DB, Batch configurations
│   ├── controller/       # REST controllers
│   ├── dto/              # Request/response objects
│   ├── entity/           # JPA entities
│   ├── repository/       # Spring Data repositories
│   ├── service/          # Business logic services
│   ├── batch/            # Spring Batch components
│   ├── security/         # JWT and security components
│   ├── util/             # Helper classes
│   └── SmartDataApp.java # Main application class
├── src/test/java/        # JUnit test cases
├── src/main/resources/   # Configuration files
├── pom.xml               # Maven dependencies
└── README.md

🚀 Getting Started

Prerequisites

  • Java 17 or higher
  • Maven 3.6+
  • MySQL 8.0+
  • MongoDB 4.4+
  • Docker (optional)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd smart-data-processing
  2. Configure databases

    • Create MySQL database: smart_data
    • Create MongoDB database: smart_data_raw
    • Update application.yml with your database credentials
  3. Build the project

    mvn clean install
  4. Run the application

    mvn spring-boot:run

The application starts on http://localhost:8080.

Docker Setup (Optional)

# Start MySQL
docker run --name mysql-db -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=smart_data -p 3306:3306 -d mysql:8.0

# Start MongoDB
docker run --name mongo-db -p 27017:27017 -d mongo:4.4

🔐 Authentication

Register a new user

POST /api/auth/register
{
  "username": "user@example.com",
  "password": "password123"
}

Login

POST /api/auth/login
{
  "username": "user@example.com",
  "password": "password123"
}

Include the returned JWT in the Authorization header of subsequent requests: Authorization: Bearer <token>
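
As a minimal sketch of that header in practice, the snippet below builds an authenticated request with the JDK's built-in java.net.http types. The class and method names are hypothetical, and the token value is a placeholder for whatever the login endpoint returns; the response format of /api/auth/login is not documented here, so extracting the token is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: attach a JWT to a request using the JDK's HTTP client types.
// BearerRequestExample and authorizedGet are illustrative names, not part of the project.
class BearerRequestExample {

    // Builds a GET request that carries the token in the Authorization header.
    static HttpRequest authorizedGet(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // "<token>" is a placeholder; use the JWT returned by POST /api/auth/login.
        HttpRequest request = authorizedGet(
                "http://localhost:8080/api/files/my-files", "<token>");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```

Sending the request is then a matter of passing it to HttpClient.send with a body handler of your choice.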

📊 API Endpoints

File Upload

  • POST /api/files/upload - Upload CSV/Excel files
  • GET /api/files/my-files - Get user's uploaded files
  • GET /api/files/{fileId} - Get file details

Analytics

  • POST /api/analytics/search - Search and filter data
  • GET /api/analytics/stats - Get quick statistics
  • GET /api/analytics/health - Health check

Job Management

  • POST /api/jobs/start/{fileId} - Start batch processing
  • GET /api/jobs/executions - List job executions
  • GET /api/jobs/executions/{executionId} - Get job details
  • POST /api/jobs/restart/{executionId} - Restart failed jobs
  • GET /api/jobs/statistics - Job execution statistics

Data Export

  • POST /api/export/csv - Export data as CSV
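
For readers curious what CSV export involves, here is a small sketch of RFC 4180 style field quoting using only the JDK. It is not the project's exporter: the class name is hypothetical, and the column names in main are borrowed from the indexed processed_data columns purely for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of CSV serialization with RFC 4180 style quoting (illustrative, not the project's code).
class CsvExportSketch {

    // Quotes a field when it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins the escaped fields of one record with commas.
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvExportSketch::escape).collect(Collectors.joining(","));
    }

    // Writes a header line followed by one line per row, each terminated by CRLF.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder out = new StringBuilder(toCsvLine(header)).append("\r\n");
        for (List<String> row : rows) {
            out.append(toCsvLine(row)).append("\r\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical columns and values for demonstration only.
        List<String> header = List.of("category", "value", "processed_date", "file_id");
        List<List<String>> rows = List.of(
                List.of("books, used", "12.5", "2024-01-01", "42"));
        System.out.print(toCsv(header, rows));
    }
}
```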

🔧 Configuration

Application Properties

Key configuration options in application.yml:

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/smart_data
    username: root
    password: password
  
  data:
    mongodb:
      host: localhost
      port: 27017
      database: smart_data_raw
  
  batch:
    jdbc:
      initialize-schema: always
  
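  security:
    jwt:
      secret: your-secret-key-here   # placeholder; replace with a long random value and keep it out of version control
      expiration: 86400000           # token lifetime, presumably in milliseconds (24 hours)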
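
To illustrate how a secret and an expiration value like the ones above typically feed into token issuance, the sketch below signs an HS256 JWT with JDK classes only. It rests on several assumptions: that the expiration is in milliseconds, that HS256 is the signing algorithm, and that the claims are sub, iat, and exp. The project may well use a JWT library instead; JwtSketch and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of HS256 JWT issuance and signature checking (illustrative, not the project's code).
class JwtSketch {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    static String encode(String text) {
        return B64.encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static byte[] hmacSha256(String secret, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
        }
    }

    // Returns header.payload.signature; iat and exp are seconds since the epoch.
    // The subject is not JSON-escaped here, so it must not contain quotes or backslashes.
    static String issue(String subject, String secret, long nowMillis, long expirationMillis) {
        String header = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
        long iat = nowMillis / 1000;
        long exp = (nowMillis + expirationMillis) / 1000;
        String payload = "{\"sub\":\"" + subject + "\",\"iat\":" + iat + ",\"exp\":" + exp + "}";
        String signingInput = encode(header) + "." + encode(payload);
        return signingInput + "." + B64.encodeToString(hmacSha256(secret, signingInput));
    }

    // Recomputes the signature and compares it in constant time. Expiry is not checked here.
    static boolean hasValidSignature(String token, String secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        try {
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            return MessageDigest.isEqual(hmacSha256(secret, parts[0] + "." + parts[1]), actual);
        } catch (IllegalArgumentException e) {
            return false; // signature segment was not valid Base64URL
        }
    }

    public static void main(String[] args) {
        String token = issue("user@example.com", "your-secret-key-here",
                System.currentTimeMillis(), 86_400_000L);
        System.out.println(token);
        System.out.println(hasValidSignature(token, "your-secret-key-here"));
    }
}
```

Note that RFC 7518 requires HS256 keys of at least 256 bits, so a real secret should be considerably longer than the placeholder shown in the configuration.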

Database Indexes

The system automatically creates indexes on the following columns:

  • processed_data.category
  • processed_data.value
  • processed_data.processed_date
  • processed_data.file_id

🧪 Testing

Run Tests

# Run all tests
mvn test

# Run with coverage
mvn test jacoco:report

# Run specific test class
mvn test -Dtest=AnalyticsServiceTest

Test Configuration

  • Uses H2 in-memory database for repository tests
  • Mockito for mocking dependencies
  • Spring Boot test utilities for integration tests

📈 Performance Optimizations

Database

  • Batch inserts with configurable chunk sizes
  • Optimized indexes on frequently queried columns
  • Connection pooling with HikariCP

Caching

  • Spring Cache for analytics results
  • Redis support (configurable)
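
The idea behind caching analytics results can be sketched without any framework: look a key up first, and only compute and store the value on a miss. The class below is a hypothetical, in-memory illustration of that read-through pattern, not the project's Spring Cache configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a read-through cache for expensive lookups (illustrative, not the project's code).
class StatsCacheSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    // The loader stands in for an expensive call, such as an aggregate query.
    // It must not return null, because ConcurrentHashMap does not store null values.
    StatsCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only when the key is absent.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Drops a cached entry, for example after new data has been processed.
    void evict(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        StatsCacheSketch<String, Integer> cache = new StatsCacheSketch<>(category -> {
            System.out.println("computing stats for " + category);
            return category.length(); // placeholder for a real aggregate
        });
        cache.get("sales"); // triggers the loader
        cache.get("sales"); // served from the cache
    }
}
```

Evicting entries when a batch job finishes is one simple way to keep cached statistics from going stale.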

Batch Processing

  • Configurable chunk sizes (default: 100)
  • Parallel processing support
  • Transaction management

🔒 Security

Authentication

  • JWT-based stateless authentication
  • Password encryption with BCrypt
  • Role-based access control (USER, ANALYST, ADMIN)

Authorization

  • Endpoint-level security
  • Method-level security annotations
  • CORS configuration

🚨 Error Handling

Global Exception Handling

  • Consistent error response format
  • Proper HTTP status codes
  • Detailed error messages for debugging

Validation

  • Input validation with Bean Validation
  • Custom validation annotations
  • Error message internationalization

📊 Monitoring & Health

Health Checks

  • Database connectivity
  • MongoDB connectivity
  • Batch job status
  • Application metrics

Logging

  • Structured logging with SLF4J
  • Configurable log levels
  • Performance metrics logging

🔄 Batch Processing

Job Lifecycle

  1. File Upload → Raw data stored in MongoDB
  2. Job Initiation → Spring Batch job created
  3. Data Processing → Chunk-based processing with reader/processor/writer
  4. Result Storage → Cleaned data stored in MySQL
  5. Status Update → Job completion status tracked
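
The reader/processor/writer step in the lifecycle above can be pictured with a plain Java sketch. This is a simplified model of chunk-oriented processing, not Spring Batch itself: there are no transactions, retries, or restart metadata, and the class and method names are hypothetical. As with Spring Batch's ItemProcessor, a processor that returns null drops the item.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified model of chunk-oriented processing (illustrative, not Spring Batch).
class ChunkProcessingSketch {

    // Reads items one at a time, transforms each, and hands results to the writer
    // in lists of at most chunkSize. Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            O result = processor.apply(reader.next());
            if (result != null) {          // null means "filter this item out"
                chunk.add(result);
            }
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // copy so the writer owns its list
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {            // flush the final, partially filled chunk
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<String> rawRows = List.of(" a ", "", " b ", " c ");
        int chunks = ChunkProcessingSketch.<String, String>run(
                rawRows.iterator(),
                row -> row.isBlank() ? null : row.trim(),      // clean, or drop blank rows
                cleaned -> System.out.println("writing " + cleaned),
                2);
        System.out.println(chunks + " chunks written");
    }
}
```

Writing in chunks keeps memory use bounded and lets each write be committed as a unit, which is the main reason the chunk size is worth tuning for large files.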

Job Monitoring

  • Real-time job status tracking
  • Failed job restart capability
  • Job execution statistics
  • Performance metrics

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For support and questions:

  • Create an issue in the repository
  • Check the documentation
  • Review the test cases for usage examples

🔮 Future Enhancements

  • Real-time data streaming
  • Advanced analytics and machine learning
  • WebSocket support for real-time updates
  • Kubernetes deployment support
  • Advanced caching strategies
  • Data versioning and audit trails
