December 15, 2024 · Machine Learning

Kallos Models: MLOps Framework for Cryptocurrency Forecasting

Production-grade MLOps framework for training and deploying deep learning time-series models for cryptocurrency price prediction. Features walk-forward validation, multi-objective optimization, and end-to-end CLI workflow.

MLOps Python Deep Learning Time Series Cryptocurrency PyTorch Optuna Darts

MLOps for Financial Forecasting

Most machine learning research stops at prediction accuracy. Models train on historical data, achieve impressive test set performance, then fail when deployed because they peeked at the future during validation or optimized for the wrong objective. Kallos Models was built to prevent these failures by implementing production-grade MLOps practices from the start.

This framework trains deep learning models for cryptocurrency price prediction using walk-forward validation that respects temporal ordering, multi-objective hyperparameter optimization that balances prediction accuracy with trading signal quality, and a three-step pipeline that separates tuning from training from evaluation. The result: models that maintain performance when deployed because they were validated like production systems, not academic experiments.

The framework supports GRU, LSTM, and Transformer architectures via the Darts library, implements a custom loss function that penalizes directional errors more heavily than magnitude mistakes, and provides an intuitive CLI for the complete workflow from hyperparameter search through final model evaluation.

The Walk-Forward Validation Problem

Traditional k-fold cross-validation randomly shuffles data into folds, allowing models to train on Friday and validate on Monday—temporal leakage that produces dangerously optimistic results. In time series forecasting, this violation of causality makes models appear to work when they fundamentally don’t. Kallos Models implements walk-forward splits that respect time:

Each validation fold uses only past data for training and immediate future data for testing. The training window spans 52 weeks, validation covers the subsequent 13 weeks, and testing happens on completely unseen quarters beyond that. When the system predicts Q2 2024 prices, it uses only information available through Q1 2024—exactly as real trading requires.
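
To make the rolling scheme concrete, here is a minimal sketch of a walk-forward splitter, assuming a weekly DatetimeIndex; the function name and the forward step are illustrative, not the framework's actual API:

```python
import pandas as pd

def walk_forward_splits(index: pd.DatetimeIndex,
                        train_weeks: int = 52,
                        val_weeks: int = 13):
    """Yield (train_idx, val_idx) pairs that respect temporal order.

    Each fold trains on `train_weeks` of history and validates on the
    `val_weeks` immediately after it; the window then rolls forward by
    the validation length. (Illustrative helper, not the framework API.)
    """
    train_span = pd.Timedelta(weeks=train_weeks)
    val_span = pd.Timedelta(weeks=val_weeks)
    start = index.min()

    while start + train_span + val_span <= index.max():
        train_end = start + train_span
        val_end = train_end + val_span
        train_idx = index[(index >= start) & (index < train_end)]
        val_idx = index[(index >= train_end) & (index < val_end)]
        yield train_idx, val_idx
        start = start + val_span  # roll forward by one validation window
```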

This temporal discipline revealed an operational insight often hidden by simpler approaches: models need quarterly retraining to maintain performance. After 13 weeks, prediction quality degraded by 15-20% as market dynamics shifted. A single train/test split would have shown artificially stable results, concealing the real computational costs of keeping models current in production.

Multi-Objective Optimization for Trading

In academic forecasting, lower RMSE means better performance. In trading, correctly predicting direction matters more than precisely estimating magnitude—if you forecast Bitcoin will rise 5% but it rises 8%, your trade still profits. Kallos Models uses Optuna’s multi-objective optimization to balance competing goals:

Objective 1: Minimize prediction error (RMSE)

Objective 2: Maximize directional accuracy (percent of correct up/down calls)

The optimization explores Pareto-optimal solutions along this trade-off frontier. Some configurations minimize error but miss directional calls. Others sacrifice precision for better trading signals. The multi-objective approach surfaces these trade-offs explicitly, letting practitioners choose based on use case. A portfolio manager seeking trading signals values different solutions than a researcher studying price dynamics.
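
A minimal sketch of wiring the two objectives into Optuna's multi-objective API; the objective body below is a stand-in for the framework's walk-forward backtest:

```python
import optuna

def objective(trial: optuna.Trial) -> tuple[float, float]:
    """Placeholder objective: the real one trains a model on walk-forward
    folds with the trial's hyperparameters and averages both metrics."""
    lam = trial.suggest_float("direction_penalty", 0.0, 2.0)
    rmse = 1.0 / (1.0 + lam)                 # stand-in for backtested RMSE
    directional_accuracy = 0.5 + 0.1 * lam   # stand-in for backtested accuracy
    return rmse, directional_accuracy

# One direction per objective: minimize RMSE, maximize directional accuracy.
study = optuna.create_study(directions=["minimize", "maximize"])
study.optimize(objective, n_trials=50)

# best_trials holds the Pareto front along the error/direction trade-off.
for trial in study.best_trials:
    print(trial.values, trial.params)
```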

The framework also implements a direction-selective MSE loss function that adds a penalty term for directional mistakes:

\[\mathcal{L} = \text{MSE} + \lambda \cdot \text{DirectionPenalty}\]

The hyperparameter $\lambda$ controls the trade-off, and Optuna optimizes it alongside architecture parameters like hidden dimensions and dropout rates.
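
A sketch of what such a loss can look like in PyTorch, assuming the model predicts returns so that direction reduces to the sign of the value; the framework's exact penalty term may differ:

```python
import torch
import torch.nn as nn

class DirectionSelectiveMSE(nn.Module):
    """MSE plus a weighted penalty applied only to directionally wrong predictions.

    Assumes the target is a return series, so direction is simply the sign.
    `lam` is the lambda weight that Optuna tunes alongside the architecture
    parameters.
    """

    def __init__(self, lam: float = 1.0):
        super().__init__()
        self.lam = lam

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        mse = torch.mean((pred - target) ** 2)
        # 1.0 where predicted and actual directions disagree, 0.0 otherwise.
        wrong_direction = (torch.sign(pred) != torch.sign(target)).float()
        direction_penalty = torch.mean(wrong_direction * (pred - target) ** 2)
        return mse + self.lam * direction_penalty
```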

Three-Step MLOps Pipeline

The system enforces separation between tuning, training, and evaluation to prevent data leakage:

Step 1: Hyperparameter Tuning runs Optuna trials with walk-forward cross-validation on training and validation data. The search explores input window sizes (14-90 days), network depth (1-4 layers), hidden dimensions (32-256), dropout (0.0-0.7), learning rates (1e-5 to 1e-2), and the direction penalty weight (0.0-2.0). Each trial trains a model on historical data and evaluates on the immediate future, accumulating performance across multiple walk-forward splits.
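
The search ranges translate directly into Optuna's suggest API; the parameter names below are illustrative, chosen to mirror the ranges above rather than the framework's exact keys:

```python
import optuna

def suggest_hyperparameters(trial: optuna.Trial) -> dict:
    """Search space mirroring the ranges described in Step 1."""
    return {
        "input_chunk_length": trial.suggest_int("input_chunk_length", 14, 90),
        "n_layers": trial.suggest_int("n_layers", 1, 4),
        "hidden_dim": trial.suggest_int("hidden_dim", 32, 256),
        "dropout": trial.suggest_float("dropout", 0.0, 0.7),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        "direction_penalty": trial.suggest_float("direction_penalty", 0.0, 2.0),
    }
```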

Step 2: Final Model Training uses optimal hyperparameters from Step 1 to train a production model on the combined training and validation data. This model has never seen the hold-out test set, preventing the subtle overfitting that occurs when hyperparameters get tuned on test data.

Step 3: Hold-Out Evaluation assesses the final model on completely unseen data, producing metrics and visualizations that reflect real-world performance. This separation ensures no information from test data influences training or hyperparameter selection.
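
A minimal sketch of the hold-out metrics computed in Step 3; the helper name and output layout are assumptions:

```python
import numpy as np

def holdout_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute RMSE, MAE, MAPE, and directional accuracy on the hold-out set."""
    error = y_pred - y_true
    rmse = float(np.sqrt(np.mean(error ** 2)))
    mae = float(np.mean(np.abs(error)))
    mape = float(np.mean(np.abs(error / y_true)) * 100)  # assumes nonzero targets
    # Directional accuracy: fraction of correctly predicted up/down moves.
    true_dir = np.sign(np.diff(y_true))
    pred_dir = np.sign(np.diff(y_pred))
    direction_accuracy = float(np.mean(true_dir == pred_dir))
    return {"rmse": rmse, "mae": mae, "mape": mape,
            "direction_accuracy": direction_accuracy}
```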

Optuna studies persist to PostgreSQL, enabling resumable optimization across sessions and collaborative tuning where multiple processes contribute trials. The framework checks if a study already completed the requested number of trials, avoiding redundant computation.
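
A sketch of what database-backed, resumable studies look like with Optuna's storage API; the connection string, study name, and trial budget are placeholders:

```python
import optuna

def objective(trial: optuna.Trial) -> tuple[float, float]:
    # Placeholder objective; the real one runs walk-forward training.
    x = trial.suggest_float("x", 0.0, 1.0)
    return x, 1.0 - x

storage_url = "postgresql://user:password@localhost:5432/optuna"  # read from env in practice

study = optuna.create_study(
    study_name="kallos_gru_btc",          # illustrative study name
    storage=storage_url,
    directions=["minimize", "maximize"],
    load_if_exists=True,                  # resume if the study already exists
)

# Avoid redundant computation when earlier sessions already ran enough trials.
requested_trials = 200
remaining = requested_trials - len(study.trials)
if remaining > 0:
    study.optimize(objective, n_trials=remaining)
```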

Architecture Flexibility Through Factory Pattern

The framework provides a unified interface across three state-of-the-art architectures. A factory function abstracts the differences between RNN and Transformer models, normalizing parameters such as mapping the RNN hidden_dim to the Transformer d_model. This design lets researchers experiment with different architectures using identical configuration files: just change the architecture parameter from “gru” to “transformer”. A minimal sketch of such a factory follows the architecture list below.

GRU (Gated Recurrent Unit): Efficient RNN variant with lower computational cost than LSTM, effective for medium-length sequences

LSTM (Long Short-Term Memory): Classic architecture for sequential data with proven track record in financial forecasting

Transformer: Attention-based architecture with parallelizable training, superior performance on complex patterns
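
As referenced above, here is a minimal factory sketch using Darts' BlockRNNModel and TransformerModel; parameter names follow those constructors, and the real factory's signature and defaults may differ:

```python
from darts.models import BlockRNNModel, TransformerModel

def build_model(architecture: str, params: dict):
    """Map one shared configuration onto the Darts model classes (illustrative)."""
    common = {
        "input_chunk_length": params["input_chunk_length"],
        "output_chunk_length": 1,            # single-step forecasting
        "dropout": params["dropout"],
        "random_state": 42,                  # fixed seed for reproducibility
    }
    if architecture in ("gru", "lstm"):
        return BlockRNNModel(
            model=architecture.upper(),      # "GRU" or "LSTM"
            hidden_dim=params["hidden_dim"],
            n_rnn_layers=params["n_layers"],
            **common,
        )
    if architecture == "transformer":
        return TransformerModel(
            d_model=params["hidden_dim"],    # RNN hidden_dim maps to d_model
            num_encoder_layers=params["n_layers"],
            num_decoder_layers=params["n_layers"],
            **common,
        )
    raise ValueError(f"Unknown architecture: {architecture}")
```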

All models train using PyTorch Lightning for enterprise-grade training infrastructure including automatic checkpointing, early stopping, and GPU acceleration. Fixed random seeds ensure reproducibility across runs. The single-step forecasting configuration (output_chunk_length=1) optimizes for next-value prediction common in trading applications.
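
An illustrative training setup showing how early stopping, device selection, and a fixed seed can be passed to a Darts model through pl_trainer_kwargs; the specific callback settings and values here are assumptions, not the framework's configuration:

```python
from pytorch_lightning.callbacks import EarlyStopping
from darts.models import BlockRNNModel

early_stopping = EarlyStopping(monitor="val_loss", patience=10, mode="min")

model = BlockRNNModel(
    model="GRU",
    input_chunk_length=30,
    output_chunk_length=1,            # next-value prediction
    hidden_dim=64,
    n_rnn_layers=2,
    random_state=42,                  # reproducible runs
    pl_trainer_kwargs={
        "accelerator": "auto",        # GPU when available, CPU otherwise
        "callbacks": [early_stopping],
    },
)
# model.fit(train_series, val_series=val_series)  # Darts TimeSeries objects
```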

Command-Line Workflow

The CLI provides intuitive access to the complete pipeline:

Tune hyperparameters by specifying architecture, target variable, covariates, number of trials, and training date ranges. Output: best_params.json with optimal configuration.

Train final model using the best parameters JSON, outputting trained PyTorch Lightning model and fitted preprocessing scaler.

Evaluate on hold-out data with test date range, producing metrics JSON (RMSE, MAE, MAPE, direction accuracy) and forecast visualization comparing predictions to actual values.

This three-command workflow enforces MLOps best practices while remaining accessible to practitioners. Configuration management uses JSON files and environment variables, keeping sensitive credentials out of version control.
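
As a rough illustration of how such a three-command interface can be structured, here is a skeleton using click; click itself, the command names, and the flags are assumptions rather than the framework's actual CLI:

```python
import click

@click.group()
def cli():
    """Hypothetical three-command workflow: tune, train, evaluate."""

@cli.command()
@click.option("--architecture", type=click.Choice(["gru", "lstm", "transformer"]))
@click.option("--target", required=True)
@click.option("--n-trials", default=100, type=int)
@click.option("--train-start")
@click.option("--train-end")
def tune(architecture, target, n_trials, train_start, train_end):
    """Step 1: hyperparameter search; writes best_params.json."""
    ...

@cli.command()
@click.option("--params", type=click.Path(exists=True), default="best_params.json")
def train(params):
    """Step 2: train the final model with the tuned parameters."""
    ...

@cli.command()
@click.option("--test-start")
@click.option("--test-end")
def evaluate(test_start, test_end):
    """Step 3: hold-out evaluation; writes metrics JSON and a forecast plot."""
    ...

if __name__ == "__main__":
    cli()
```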

Data Pipeline Integration

The framework loads financial data directly from PostgreSQL, enabling real-time data access and version control. The feature preprocessing pipeline supports modular transformations for different feature categories—price features get log returns and normalization, technical indicators get scaling and clipping, market microstructure features get differencing and standardization.

This sophisticated handling ensures appropriate scaling for heterogeneous financial data. Mixing raw prices with percentage indicators and volume metrics without proper normalization would bias model training toward high-magnitude features regardless of predictive value.
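
A sketch of per-category preprocessing with scikit-learn's ColumnTransformer, assuming illustrative column names; the framework's actual feature names and transformations may differ:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler

# Illustrative column groups; the real feature names will differ.
price_cols = ["close", "high", "low"]
indicator_cols = ["rsi_14", "macd"]
microstructure_cols = ["volume_delta", "trade_count_delta"]  # differenced upstream

preprocessor = ColumnTransformer([
    # Price features: log transform (stand-in for log returns) then normalize.
    ("price", Pipeline([
        ("log", FunctionTransformer(np.log1p)),
        ("scale", MinMaxScaler()),
    ]), price_cols),
    # Technical indicators: scale, then clip extreme values.
    ("indicators", Pipeline([
        ("scale", StandardScaler()),
        ("clip", FunctionTransformer(lambda x: np.clip(x, -3.0, 3.0))),
    ]), indicator_cols),
    # Market microstructure features: standardize after differencing.
    ("microstructure", StandardScaler(), microstructure_cols),
])
# features_scaled = preprocessor.fit_transform(feature_frame)  # pandas DataFrame
```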

Key Achievements

Production-Ready Practices: Walk-forward validation, quarterly retraining, separation of tuning/training/evaluation

Multi-Objective Optimization: Balances prediction accuracy with trading signal quality

Temporal Integrity: Proper handling of time-series dependencies throughout the pipeline

Persistent Infrastructure: Database-backed studies and data management survive restarts

Clean Abstractions: Factory patterns and unified interfaces across architectures

The complete results, including forecast residual distributions and quarterly performance breakdowns, appear in the research paper available through the main Kallos GRU project page.

Technologies

Core Stack: Python 3.8+, Darts 0.21.0+, PyTorch, Optuna 3.0+, PostgreSQL

Supporting Libraries: pandas, scikit-learn, PyTorch Lightning, python-dotenv

Explore the Framework

This MLOps framework demonstrates production-grade practices for financial machine learning: rigorous validation, clean architecture, and real-world deployability. The complete implementation, including the CLI, hyperparameter search spaces, and custom loss function, is available on GitHub.

View Repository →


Part of the Kallos trading system research, providing the forecasting engine that feeds portfolio optimization with neural network predictions while maintaining temporal integrity and production-ready validation.