Transformer Failed. LightGBM Won. Here's What Actually Happened.

I’ll skip the theory. Here’s what actually happened.

The Problem

I was building a crypto price prediction model for SOL, ETH, and BTC perpetual futures. The plan was simple: train a Transformer on 70 features, get a good signal, trade it live.

Three weeks of training later:

- SOL Transformer v3b: train_acc = 69.3%, val_acc = 54.1%, gap = 15.2%
- ETH Transformer v3b: train_acc = 78.6%, val_acc = 55.9%, gap = 22.7%

The gap between training and validation accuracy was 15-22 percentage points. Classic overfitting: the model was memorizing patterns, not learning them. ...
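The train/val gap check above is simple to automate. Here's a minimal sketch (the helper name and the 10-point threshold are my own choices, not from the original run logs; the accuracies are the ones reported above):

```python
# Hypothetical helper: flag overfitting from train/validation accuracy.
# A gap above ~10 percentage points is used here as an illustrative cutoff.

def overfit_gap(train_acc: float, val_acc: float) -> float:
    """Return the train/validation accuracy gap in percentage points."""
    return round(train_acc - val_acc, 1)

runs = {
    "SOL Transformer v3b": (69.3, 54.1),
    "ETH Transformer v3b": (78.6, 55.9),
}

for name, (train_acc, val_acc) in runs.items():
    gap = overfit_gap(train_acc, val_acc)
    status = "overfitting" if gap > 10 else "ok"
    print(f"{name}: gap = {gap}pp ({status})")
```

Both runs clear the cutoff by a wide margin, which is the signal that the model is memorizing the training set rather than generalizing.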

March 11, 2026 · 3 min