Iranian Journal of Information Processing and Management

A Conceptual Framework for Preprocessing and Improving Quality of Event Log in Process Mining

Authors
Abstract
In today's challenging world, organizational growth is not possible without the efficient use of data. Process mining uses machine learning methods and business process management concepts to extract hidden knowledge about business processes from data stored in information systems. Process discovery is the first step in process mining; its main goal is to transform an event log into a process model. However, process discovery methods cannot be applied effectively without appropriate data, because any analysis based on low-quality data leads to poor insights and bad decisions that negatively affect the performance of the organization or business. This paper aims to provide a new conceptual framework for preprocessing the data fed into process discovery methods in order to improve the quality of the extracted model. The proposed conceptual framework was developed using a qualitative research process based on grounded theory. For this purpose, 102 articles on data quality in process mining were reviewed. After filtering and integrating the findings from the literature, the most critical data quality challenges in this field were identified: “noisy/infrequent events”, “outlier events”, “anomalous events”, “missing values”, “incorrect time format”, “ambiguous timestamps”, “synonymous activities”, and “size and complexity”. Then, the basic steps for data preprocessing and cleaning were defined, comprising the activities of “repair”, “anomaly detection”, “filtering”, and “dimensional reduction”. The final preprocessing framework builds on these data quality issues and the identified activities. Four standard datasets derived from real-world processes were used to assess the proposed framework's performance: four standard process discovery algorithms were applied first to the raw data and then to the data preprocessed by the introduced framework.
The results showed that preprocessing the input data improves the quality criteria of the models extracted by the process discovery algorithms. Furthermore, to evaluate the validity of the proposed framework, its performance was compared with three preprocessing methods: “sampling”, “statistical preprocessing”, and “prototype selection”; the results indicate that the proposed approach is more efficient. The results of this study can be used as guidelines by data and business analysts to identify and resolve data quality problems in process mining projects.
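To make the “repair” and “filtering” activities named above more concrete, the following is a minimal Python sketch and not the framework proposed in the paper. The toy event log, the `SYNONYMS` mapping, the `min_share` threshold, and all function names are hypothetical, and dropping events with a missing activity label is only a simple stand-in for a real repair technique such as imputation.

```python
from collections import Counter

# Hypothetical toy event log: one ordered list of activity labels per case.
RAW_LOG = {
    "case1": ["Register", "Check", "Approve"],
    "case2": ["Register", "Check", "Approve"],
    "case3": ["register", "Check", "Approve"],           # synonymous activity label
    "case4": ["Register", None, "Approve"],              # missing value
    "case5": ["Register", "Approve", "Check", "Check"],  # infrequent behaviour
}

# Assumed mapping from synonymous labels to one canonical label.
SYNONYMS = {"register": "Register"}


def repair(trace):
    """Unify synonymous labels and drop events whose activity is missing."""
    return [SYNONYMS.get(a, a) for a in trace if a is not None]


def filter_infrequent(log, min_share=0.3):
    """Keep only cases whose trace variant covers at least min_share of all cases."""
    variants = Counter(tuple(t) for t in log.values())
    total = len(log)
    return {c: t for c, t in log.items() if variants[tuple(t)] / total >= min_share}


def preprocess(log, min_share=0.3):
    """Apply repair first, then variant-frequency filtering."""
    repaired = {c: repair(t) for c, t in log.items()}
    return filter_infrequent(repaired, min_share)


clean = preprocess(RAW_LOG)
```

In this sketch, repair runs before filtering so that a case differing only by a synonymous label counts toward the same variant rather than being discarded as infrequent. A discovery algorithm would then be run on `clean` instead of `RAW_LOG`.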
Keywords


  • Receive Date 10 December 2022
  • Revise Date 28 January 2023