A Comprehensive Review of the Three Main Topic Modeling Algorithms and Challenges in Albanian Employability Skills

  • Milena Shehu, University of Tirana, Albania
  • Eralda Gjika, Royal College of Physicians and Surgeons of Canada, Ottawa, Canada
Keywords: Topic modelling, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), BERTopic, Employability Skills

Abstract

Today’s jobseekers face many obstacles when trying to find a career that matches their interests, employability soft skills, and professional experience. In Albania, jobseekers frequently begin their search by browsing vacancies listed on online job portals. Analysing vacancies posted online gives labour market actors an advantage over traditional survey-based analyses: it speeds up the analytical process and supports decision-making based on accurate data, so every country should consider it carefully when formulating its labour market policies. Because data posted online are unlabelled, unsupervised learning techniques, and topic modelling algorithms in particular, have shown strong potential for analysing job vacancies, especially for assessing employability soft skills. Topic modelling algorithms uncover hidden patterns in texts, which facilitates extracting important information, generating document summaries, and improving content comprehension. This paper analyses and compares three primary topic modelling algorithms that can be applied to the analysis of employability soft skills: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and BERTopic. It concludes by discussing which algorithm performs best and is most applicable, together with the challenges and limitations, based on a review of studies conducted on the Albanian job market.
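To make the idea of a topic model concrete, the following is a minimal sketch of one of the three algorithms named above, LDA, fitted to a handful of invented job-ad snippets. It is not the implementation used in any of the reviewed studies. The corpus, the function name `fit_lda`, and the hyperparameter values are assumptions chosen for illustration, and inference is done by collapsed Gibbs sampling rather than the variational method of the original LDA paper.

```python
import random
from collections import Counter

# Toy, made-up job-ad snippets (illustrative only, not data from the paper).
ADS = [
    "communication teamwork customer service communication",
    "teamwork communication presentation customer",
    "python sql data analysis python",
    "sql data python reporting analysis",
]


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (vocab, phi, theta): phi[k] maps each word to its probability
    under topic k, and theta[d] is the topic distribution of document d.
    """
    rng = random.Random(seed)
    tokens = [d.split() for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    n_vocab = len(vocab)

    # Count tables updated during sampling.
    doc_topic = [[0] * n_topics for _ in tokens]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(tokens):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta)
         for w in vocab}
        for k in range(n_topics)
    ]
    return vocab, phi, theta


if __name__ == "__main__":
    vocab, phi, theta = fit_lda(ADS)
    for k, dist in enumerate(phi):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"topic {k}: {top}")
```

Each topic is a probability distribution over words and each vacancy is a mixture of topics, which is what allows soft-skill vocabulary to be separated from technical-skill vocabulary without labelled data. On a corpus this small the result depends heavily on the random seed, so the sketch illustrates the mechanics only.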

Published
2024-04-29
How to Cite
Shehu, M., & Gjika, E. (2024). A Comprehensive Review of the Three Main Topic Modeling Algorithms and Challenges in Albanian Employability Skills. European Scientific Journal, ESJ, 20(12), 31. https://doi.org/10.19044/esj.2024.v20n12p31
Section
ESJ Natural/Life/Medical Sciences