Fuzzy Matching: An Alternative Technique for Merging Extracted Web Data


Web scraping has become a popular method for collecting data from websites, because data on the internet is updated frequently, which makes it a good source of up-to-date, accurate information. However, websites are not homogeneous, so the same data extracted from different web sources can differ in content and format, which makes the quality of the combined data inconsistent. A previous study proposed a record linkage method to merge data from multiple websites. That method used a deterministic technique, which merges records by comparing the string values of a matching variable. However, a deterministic technique only finds a match when the matching variable is identical in both records. This study explores fuzzy matching as an alternative technique. A comparison in this study found that fuzzy matching performs slightly better at merging web data. However, the main drawback of fuzzy matching is that it is hard to determine the similarity threshold at which a match is triggered. Therefore, future work should focus on finding an optimal method for determining the fuzzy matching threshold, to make the process more streamlined.
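To make the contrast between deterministic and fuzzy matching concrete, here is a minimal sketch in Python. It is not the authors' implementation. The product records, the `name` matching variable, and the threshold values 0.85 and 0.9 are hypothetical, and the similarity score uses the standard library's `difflib.SequenceMatcher.ratio()` as a stand-in for whichever fuzzy similarity measure the study actually used. The deterministic merge requires identical strings. The fuzzy merge accepts the best-scoring candidate only if its score reaches a threshold.

```python
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences are ignored
    return " ".join(s.lower().split())

def exact_score(a: str, b: str) -> float:
    # Deterministic technique: the matching variable must be identical
    return 1.0 if a == b else 0.0

def fuzzy_score(a: str, b: str) -> float:
    # Fuzzy technique (assumed measure): similarity ratio in [0, 1] from difflib
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def merge_records(left, right, key, score_fn, threshold):
    """For each record in `left`, pick the best-scoring record in `right`
    and merge the two only if the score reaches `threshold`."""
    merged = []
    for rec_a in left:
        best = max(right, key=lambda rec_b: score_fn(rec_a[key], rec_b[key]), default=None)
        if best is not None and score_fn(rec_a[key], best[key]) >= threshold:
            merged.append({**best, **rec_a})
    return merged

# Hypothetical listings scraped from two different websites
site_a = [
    {"name": "Apple iPhone 15 128GB", "price_a": 3999},
    {"name": "Samsung Galaxy S24", "price_a": 3299},
]
site_b = [
    {"name": "Apple iPhone 15 (128GB)", "price_b": 4099},
    {"name": "Samsung Galaxy S24 Ultra", "price_b": 5299},
]

exact = merge_records(site_a, site_b, "name", exact_score, 1.0)
fuzzy_85 = merge_records(site_a, site_b, "name", fuzzy_score, 0.85)
fuzzy_90 = merge_records(site_a, site_b, "name", fuzzy_score, 0.90)

print(len(exact), len(fuzzy_85), len(fuzzy_90))
```

In this toy data the deterministic merge finds nothing, because the two sites format the iPhone name differently. The fuzzy merge links the iPhone records (score about 0.95). At a threshold of 0.85 it also wrongly links "Samsung Galaxy S24" to "Samsung Galaxy S24 Ultra" (score about 0.86). Raising the threshold to 0.9 removes that false match. This sensitivity is the threshold-selection problem the abstract identifies as the main drawback.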

Author Biographies

Lee Qi Zian, Universiti Teknikal Malaysia Melaka
Lee Qi Zian is a graduate student at the Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka. He received his Bachelor's Degree in Computer Science (Artificial Intelligence) in 2021. As of 2023, he is pursuing a Master's Degree in Technology.
Nur Zareen Zulkarnain, Universiti Teknikal Malaysia Melaka

Nur Zareen Zulkarnain is currently a Senior Lecturer at Universiti Teknikal Malaysia Melaka. She received her Ph.D. in Computer Science (Natural Language Processing) from the University of Salford, United Kingdom. Her research interests include sentiment analysis, ontology, informatics, and data analytics.

Yogan Jaya Kumar, Universiti Teknikal Malaysia Melaka

Yogan Jaya Kumar is a Senior Lecturer at Universiti Teknikal Malaysia Melaka. He earned his Bachelor's Degree and Master's Degree from Universiti Sains Malaysia. He completed his Ph.D. in Computer Science in 2014. His research interests include text mining, information extraction and AI applications.


How to Cite
Zian, L., Zulkarnain, N. Z., & Kumar, Y. (2024). Fuzzy Matching: An Alternative Technique for Merging Extracted Web Data. Journal of Advanced Computing Technology and Application (JACTA), 6(1), 1-13. https://doi.org/10.54554/jacta.2024.06.01.001