A Conceptual Framework for Preprocessing and Improving the Quality of Event Logs in Process Mining

Salehi, Ahmad; Aghdasi, Mohammad; Khatibi, Toktam; Sheikhmohammady, Majid

doi:10.22034/jipm.2023.698594

Authors

Ahmad Salehi

Mohammad Aghdasi

Toktam Khatibi

Majid Sheikhmohammady

Faculty of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran

Abstract

In today's complex world, organizations and businesses cannot survive without understanding and making efficient use of data. Process mining combines machine learning methods with business process management concepts to extract hidden knowledge about how processes are executed from data stored in information systems. The first step in process mining is process discovery, which makes it possible to model processes from input event data. This benefit, however, cannot be realized without suitable, high-quality data, because any analysis based on low-quality data leads to poor insights and decisions that harm the performance of the organization or business. The aim of this research is to present a new conceptual framework for preprocessing the data fed to process discovery methods so that the quality of the resulting process model improves. The proposed framework was developed with a qualitative research method based on grounded theory. To this end, 102 studies on data quality in process mining were reviewed, and the most important data quality challenges in this area were identified from the literature after being filtered and consolidated: “noisy/infrequent events”, “outlier events”, “anomalous events”, “missing values”, “incorrect time format”, “ambiguous timestamps”, “synonymous activities”, and “size and complexity”. Next, the essential steps for properly preprocessing and cleaning the data were determined, comprising the activities “repair”, “anomaly detection”, “filtering”, and “dimensionality reduction”. The final conceptual framework was then built on the identified data quality problems and cleaning activities. To examine the performance of the proposed framework, four standard datasets taken from real processes were used. These data were fed to four common process discovery algorithms, first in raw form and then after preprocessing with the proposed framework. The results showed that preprocessing the input data improves the quality measures of the models extracted by the process discovery algorithms.
In addition, to assess the validity of the proposed framework, its performance was compared with three preprocessing methods, “sampling”, “statistical preprocessing”, and “prototype selection”, and the outcomes indicated that the proposed approach performs better. The results of this research can serve as a practical guide for data and business specialists and analysts in process mining projects.

Keywords

Information Systems

Business Process Management

Process Mining

Data Quality

Event Log Preprocessing

Article Title (English)

A Conceptual Framework for Preprocessing and Improving Quality of Event Log in Process Mining

Authors (English)

Ahmad Salehi

Mohammad Aghdasi

Toktam Khatibi

Majid Sheikhmohammady

Abstract (English)

In today's challenging world, organizational growth is not possible without the efficient use of data. Process mining uses machine learning methods and business process management concepts to extract hidden knowledge about business processes from data stored in information systems. Process discovery is the first step in process mining; its main goal is to transform the event log into a process model. However, process discovery methods cannot be used effectively without appropriate data, because any analysis based on low-quality data leads to poor insights and bad decisions that negatively affect the performance of the organization or business. This paper aims to provide a new conceptual framework for preprocessing the data input to process discovery methods in order to improve the quality of the extracted model. The proposed conceptual framework was developed using a qualitative research process based on grounded theory. For this purpose, 102 articles on data quality in process mining were reviewed, and the most critical data quality challenges in this field were identified from the literature after filtering and integration: “noisy/infrequent events”, “outlier events”, “anomalous events”, “missing values”, “incorrect time format”, “ambiguous timestamps”, “synonymous activities”, and “size and complexity”. Then, the basic steps for data preprocessing and cleaning were defined, comprising the activities of “repair”, “anomaly detection”, “filtering”, and “dimensionality reduction”. The final preprocessing framework was then built on the identified data quality issues and cleaning activities. Four standard datasets derived from real-world processes were used to assess the performance of the proposed framework. These data were fed to four common process discovery algorithms, first in raw form and then after preprocessing by the proposed framework.
The results showed that preprocessing the input data improves the quality criteria of the models extracted by the process discovery algorithms. Furthermore, to evaluate the validity of the proposed framework, its performance was compared with three preprocessing methods: “sampling”, “statistical preprocessing”, and “prototype selection”. The results indicate that the proposed approach performs better. The results of this study can be used as guidelines by data and business analysts to identify and resolve data quality problems in process mining projects.
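As a rough illustration of the cleaning activities named in the abstract, the following minimal Python sketch applies a repair step, a frequency-based filter (used here as a simple stand-in for both anomaly detection and filtering), and a variant-level reduction to a toy event log. It is not the authors' implementation: the event fields, the synonym map, the timestamp formats, the threshold, and all function names are assumptions made only for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical synonym map and timestamp formats, chosen only for this toy example.
SYNONYMS = {"Approve request": "Approve", "approve": "Approve"}
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")


def parse_time(raw):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except (TypeError, ValueError):
            continue
    return None


def repair(events):
    """Repair: unify timestamp formats and merge synonymous activity labels."""
    repaired = []
    for e in events:
        ts = parse_time(e.get("timestamp"))
        activity = e.get("activity")
        if ts is None or not activity:
            continue  # assumption: events that cannot be repaired are dropped
        label = SYNONYMS.get(activity, activity)
        repaired.append({"case": e["case"], "activity": label, "timestamp": ts})
    return repaired


def to_traces(events):
    """Group events by case and order each case by timestamp."""
    cases = defaultdict(list)
    for e in sorted(events, key=lambda ev: ev["timestamp"]):
        cases[e["case"]].append(e["activity"])
    return {case: tuple(acts) for case, acts in cases.items()}


def filter_infrequent(traces, min_share):
    """Filtering: keep cases whose variant covers at least min_share of all cases."""
    counts = Counter(traces.values())
    total = len(traces)
    return {c: v for c, v in traces.items() if counts[v] / total >= min_share}


def reduce_variants(traces):
    """Reduction: keep one representative case per distinct trace variant."""
    first_case = {}
    for case, variant in traces.items():
        first_case.setdefault(variant, case)
    return {case: variant for variant, case in first_case.items()}


# Toy event log with mixed time formats, synonymous labels, and a missing value.
EVENTS = [
    {"case": "c1", "activity": "Submit", "timestamp": "2023-01-01 09:00:00"},
    {"case": "c1", "activity": "approve", "timestamp": "2023-01-01 10:00:00"},
    {"case": "c2", "activity": "Submit", "timestamp": "01/01/2023 09:30"},
    {"case": "c2", "activity": "Approve request", "timestamp": "01/01/2023 11:00"},
    {"case": "c3", "activity": "Submit", "timestamp": "2023-01-02 08:00:00"},
    {"case": "c3", "activity": "Approve", "timestamp": "2023-01-02 09:00:00"},
    {"case": "c4", "activity": "Approve", "timestamp": "2023-01-03 08:00:00"},
    {"case": "c4", "activity": "Submit", "timestamp": None},
]

repaired = repair(EVENTS)
traces = to_traces(repaired)
kept = filter_infrequent(traces, min_share=0.3)
reduced = reduce_variants(kept)
print(reduced)
```

In this sketch the order of the steps matters: labels and timestamps are repaired before variants are counted, so that synonymous activities do not inflate the number of rare variants that the filter would then discard.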

Keywords (English)

Information Systems

Business Process Management

Process Mining

Data Quality

Event Log Preprocessing
