On the big data processing algorithms for finding frequent sequences
Abstract
Sequential pattern mining algorithms extract frequently occurring sequences from ordered transactional datasets such as market basket datasets. There is a lack of research employing big data processing techniques to locate frequent sequences in large-scale datasets. Furthermore, there is a need for optimized sequential pattern mining algorithms that run on ordered one-dimensional sequences. We also observe a lack of sequential pattern search studies in the literature, where the focus is centered on multi-dimensional data sequences. Existing approaches that deal with ordered one-dimensional datasets suffer from scalability issues because the amount of data to be analyzed is enormous. This research investigates the big data processing techniques used to find frequent sequences in large-scale datasets. It also proposes a scalable sequential pattern mining algorithm called Sequential Pattern Acquisition by Reducing Search Space (SPARSS), designed for distributed data processing systems that efficiently handle large datasets containing sequential one-element data. It introduces a prototype implementation of SPARSS and reports SPARSS’s memory and time requirements, which were calculated as part of experimental studies on a real-world dataset. The results confirm our expectations and demonstrate SPARSS’s superior scalability and run-time efficiency compared to other distributed algorithms.
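To make the notion of a frequent sequence in ordered one-element data concrete, the following is a minimal single-machine sketch. It is not SPARSS and does not reflect the paper's search-space reduction or its distributed design. The function name `frequent_sequences`, the sample `baskets`, and the restriction to contiguous patterns of a fixed length are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def frequent_sequences(
    sequences: Sequence[Sequence[str]], length: int, min_support: int
) -> Dict[Tuple[str, ...], int]:
    """Return contiguous patterns of `length` items found in at least `min_support` sequences."""
    support: Counter = Counter()
    for seq in sequences:
        # Collect patterns in a set so each one counts once per input sequence
        # (support), not once per occurrence.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {pattern: count for pattern, count in support.items() if count >= min_support}


# Hypothetical ordered one-element sequences (illustrative data only).
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk", "tea"],
    ["milk", "eggs", "bread"],
]
result = frequent_sequences(baskets, length=2, min_support=2)
print(result)
```

In this toy input, only the pairs ("bread", "milk") and ("milk", "eggs") appear in at least two sequences. A naive count like this enumerates every candidate window, which is the kind of cost that grows quickly on large-scale datasets.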