The rapid adoption of blockchain-based financial systems has been accompanied by a surge in illicit activities, including money laundering, ransomware payments, phishing scams, and terrorist financing, which necessitates robust mechanisms for detecting anomalous transactions. Detecting anomalies in cryptocurrency transactions is critical because undetected illicit activity can result in significant economic losses and undermine trust in digital financial systems. This systematic review examines the state of the art in cryptocurrency anomaly detection, with particular focus on methodological developments between 2008 and December 2025. A PRISMA-guided systematic literature search was conducted across IEEE Xplore, Scopus, Web of Science, ACM Digital Library, Google Scholar, and SpringerLink. From an initial set of 450 records, 32 empirical studies passed screening and eligibility assessment and were included in the qualitative synthesis. Unlike prior surveys, this review provides a focused synthesis of empirical cryptocurrency transaction studies, a taxonomy of anomaly types, and a critical assessment of dataset bias and evaluation practices. The literature reveals a clear methodological shift from traditional feature-engineered machine learning approaches (e.g., Random Forest, XGBoost, and Support Vector Machines) toward graph-based deep learning architectures. Graph Neural Networks (GNNs), particularly Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs), demonstrate competitive performance by capturing relational dependencies among blockchain addresses, while temporal graph models and hybrid GNN–transformer architectures enhance the detection of evolving, multi-hop laundering schemes. Unsupervised and semi-supervised approaches address the challenge of limited labeled data but introduce trade-offs in interpretability. Emerging research directions include privacy-preserving federated learning and cross-chain detection frameworks.
Although some studies report accuracies exceeding 90%, the field faces several limitations: dataset bias, a lack of standardized multi-chain benchmarks, inconsistent evaluation metrics, limited adversarial robustness testing, scalability constraints, and insufficient explainability for regulatory compliance. This review aims to provide researchers and practitioners with a structured synthesis of current methodologies, a comprehensive taxonomy of anomalies, and a detailed roadmap for moving from experimental validation to real-world, scalable deployment.