SciELO - Scientific Electronic Library Online

 

South African Computer Journal

On-line version ISSN 2313-7835
Print version ISSN 1015-7999

SACJ vol.35 n.2 Grahamstown Dec. 2023

http://dx.doi.org/10.18489/sacj.v35i2.17444 

VIEWPOINT

 

Natural Language-Driven Dialogue Systems for Support in Physical Medicine and Rehabilitation

 

 

Vladislav Kaverinsky (I); Kyrylo Malakhov (II)

(I) Frantsevic Institute for Problems in Material Science of the National Academy of Sciences of Ukraine, Kyiv, Ukraine. insamhlaithe@gmail.com
(II) Microprocessor technology lab, Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine, Kyiv, Ukraine. malakhovks@nas.gov.ua (corresponding)

 

 


ABSTRACT

This paper presents a natural language-driven dialogue system designed to support healthcare professionals and students in the field of physical medicine and rehabilitation. The system integrates concepts from intelligent information systems, data mining, ontologies, and human-computer interaction, and is built around a rule-based dialogue mechanism that draws on an ontology-based graph knowledge base. The article describes the automated formation of this knowledge base, in which Python scripts translate EBSCO's dataset of articles on physical medicine and rehabilitation into an OWL ontology, so that the knowledge base can be regenerated as medical knowledge evolves. Natural language processing in the system comprises text preprocessing, detection of semantic categories, and SPARQL query creation, with 26 predefined categories. To conserve resources and reduce query execution latency, the system caches precomputed responses in a PostgreSQL database. Users can reach the system through a Telegram bot and an API.
Categories · Artificial intelligence ~ Natural language processing, Discourse, dialogue and pragmatics

Keywords: Ontology engineering, Ontology learning, Knowledge management, Knowledge base, SPARQL, Natural Language-Driven Dialogue System, Human-Computer interaction, MedRehabBot


 

 

1 INTRODUCTION

Natural language-driven dialogue systems, colloquially known as chatbots, have evolved through a rich historical lineage encompassing diverse methodologies. Their appeal spans various sectors due to their ability to offer a user-friendly interface, particularly beneficial for users unfamiliar with intricate database queries or programming. Such systems allow users to pose questions or describe problems effortlessly, receiving detailed responses that not only encompass textual information but can also include tables, illustrations, and multimedia content. Given their capabilities and versatility, it is evident why there is a burgeoning interest in these dialogue systems, driving continuous advancements in the field.

 

2 AN OVERVIEW OF DEVELOPMENT APPROACHES AND TRENDS IN NATURAL LANGUAGE-DRIVEN DIALOGUE SYSTEMS

In the realm of contemporary dialogue systems, there has been a notable emergence of those harnessing computer ontologies. For instance, an analysis of a natural language-driven dialogue system tailored for English (Quamar, Lei et al., 2020; Quamar, Özcan et al., 2020) suggests that English sentences generally adhere to a consistent structural pattern, which can be represented using predefined templates. These templates consist of a static segment indicating the primary semantic intention and variable elements identifying the corresponding conceptual placeholders. These placeholders are carefully designed to match the anticipated intentions of the expected conceptual entities.

Another noteworthy system, which translates natural language into SPARQL queries, is FREyA (Damljanovic et al., 2012). Publicly accessible on GitHub (Kumar, 2022), FREyA offers an interactive interface for querying ontology-based databases. It combines parsing with ontology searches to interpret user queries and, when required, prompts users for clarification. The system also learns from user interactions to improve query accuracy. Currently, FREyA is designed specifically for English, with the GitHub repository (Kumar, 2022) illustrating how it translates natural language queries into SPARQL. Crucially, FREyA's design is flexible enough to fit different ontology architectures.

In today's natural language-driven dialogue system landscape, large language models (LLMs), such as ChatGPT (OpenAI, 2023), are prominent. Despite their capabilities, they demand extensive computational resources and need specialised training before they are well suited to a specific domain. As such, rule-based models remain relevant and can be paired with LLMs for enhanced analytical power (Palagin et al., 2023).

In previous research (Litvin et al., 2023), we developed an ontology-driven dialogue system for the medical domain. The ontology used differed in structure and methodology, focusing on a detailed semantic evaluation of text, especially named entities and their relations. For texts with a consistent format, operating at an elevated abstraction level is viable. That research employed Neo4J as the preferred graph database system, using the Cypher query language. Yet, the more traditional Jena Fuseki and SPARQL queries are worthy of consideration for ontology-centric knowledge systems.

In summary, this paper offers an academic insight into recent natural language-driven dialogue system innovations, underscoring their adaptability across varied disciplines. We highlight the enduring relevance of rule-based models amidst the rise of LLMs, advocating for a balanced understanding of their roles in specific contexts. The article also emphasises the evolution of ontology-based systems to address varying textual structures, promoting deeper comprehension and user-friendly interactions. We advocate for further research into knowledge systems leveraging different technologies, ensuring methodologies are tailored to the unique needs of individual domains.

 

3 AUTOMATIC KNOWLEDGE BASE FORMATION TECHNIQUE

The natural language-driven dialogue system described here relies on an ontology-based graph knowledge base (Palagin et al., 2014; Palagin et al., 2018) and does not use neural network models. Instead, it detects intents and entities in the user input by matching against lists of marker words. This approach has proved effective within the domain of physical medicine and rehabilitation: it gives rapid response times and is more resource-efficient than LLMs.

The knowledge base underlying the system is generated automatically from a dataset of EBSCO articles (Malakhov et al., 2023) covering the physical medicine and rehabilitation domain. The dataset comprises 1 013 PDF files totalling 192 MB, all written in English. Automated generation is feasible because the files share a standardised, predefined structure that the software can rely on. Custom Python scripts build the knowledge base as an OWL ontology in RDF/XML notation in a two-stage process. In the first stage, text is extracted from the PDFs and its content, including chapters and topics, is automatically structured into JSON. The result is a set of JSON files, one per source PDF, each describing that article's organised content.

In the second stage, an OWL ontology is created from these JSON files. The hierarchy of the JSON dictionary keys determines the OWL class hierarchy, while the associated values (the contexts) become named individuals of the corresponding classes. Each article's file name becomes a named individual of the Articles class. An OWL property, Relate to article, links each context to its article. Named entities identified within the contexts also become named individuals, belonging to the Word class, and are linked to their contexts via the Relate to context OWL property. This design allows specific contexts to be retrieved from the ontology with SPARQL queries. Because the resulting knowledge base is large, it has been split into ten segments that can be queried simultaneously.
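The mapping from nested JSON to ontology statements can be illustrated with a minimal sketch. It is not the authors' script: the function names, the identifier scheme, and the use of plain (subject, predicate, object) tuples instead of RDF/XML output are assumptions made for illustration, and the Word class for named entities is omitted.

```python
import re


def slug(text):
    """Turn arbitrary text into an identifier-safe local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def json_to_triples(article_name, content):
    """Walk one article's nested JSON and emit (subject, predicate, object) triples.

    Dictionary keys become classes, string leaves become context individuals,
    and every context is linked to its article (hypothetical naming scheme).
    """
    article = slug(article_name)
    triples = [(article, "rdf:type", "Articles")]
    counter = 0

    def walk(node, parent_class):
        nonlocal counter
        for key, value in node.items():
            cls = slug(key)
            triples.append((cls, "rdf:type", "owl:Class"))
            if parent_class is not None:
                triples.append((cls, "rdfs:subClassOf", parent_class))
            if isinstance(value, dict):
                walk(value, cls)  # nested keys become subclasses
            else:
                counter += 1
                context = f"{article}_context_{counter}"
                triples.append((context, "rdf:type", cls))
                triples.append((context, "rdfs:comment", str(value)))
                triples.append((context, "relate_to_article", article))

    walk(content, None)
    return triples


example = {"Treatment": {"Exercise": "Stretching improves range of motion."}}
for triple in json_to_triples("ankle_sprain.pdf", example):
    print(triple)
```

A real pipeline would serialise such statements as RDF/XML with an ontology library; the tuple form is used here only to keep the class and individual structure visible.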

 

4 THE PROPOSED DIALOGUE SYSTEM OPERATING DESCRIPTION

A rule-based natural language-driven dialogue system has been devised to cater to medical rehabilitation support. Primarily designed for medical professionals and students in the physical medicine and rehabilitation domain, it stands as an invaluable informational asset.

The system, equipped with a natural language user interface, also offers an API to facilitate interactions with other applications through specifically structured POST requests. At present, users can engage with the system through a Telegram bot.

The backbone algorithms that govern the system's functionality reside within a Python server application. Apache Jena Fuseki (triplestore and SPARQL server) is the chosen tool for interfacing with the ontological knowledge base. Key modules in the Python application include:

preprocess_input.py - This performs an initial analysis of the user's message to discern the inherent semantic categories (intents).

form_queries.py - Tasked with crafting SPARQL query packages.

process_queries.py - Overseeing the execution of queries on the Apache Jena Fuseki triplestore and interpreting their outcomes.

processor.py - It harmonises the operations of the aforementioned modules, ensuring a seamless transition from input reception to response formulation.

Additionally, the system's architecture integrates:

webhook.py - Serving to establish a connection with the Telegram messaging platform.

api.py - Detailing the API functions for the system.

Upon receiving a user's text input - either via the user interface or the API - the system first strips it of extraneous characters. The NLTK library then splits the text into individual word tokens, which are subsequently lemmatised. Stop words, along with words absent from the physical medicine and rehabilitation domain knowledge base, are removed. What remains is a short list of semantically significant words.
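The preprocessing steps can be sketched as follows. To keep the example self-contained, a regular expression and a tiny lookup table stand in for NLTK's tokeniser and lemmatiser; the word lists and the function name are hypothetical and are not taken from the system.

```python
import re

# Hypothetical stand-ins: the system itself uses NLTK for tokenising and lemmatising.
STOP_WORDS = {"what", "is", "the", "for", "a", "of", "are"}
LEMMAS = {"exercises": "exercise", "injuries": "injury"}
DOMAIN_VOCABULARY = {"exercise", "injury", "knee", "treatment"}


def preprocess(message):
    """Clean a user message and reduce it to in-domain lemmas."""
    # 1. Remove everything except letters and whitespace, then lower-case.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", message).lower()
    # 2. Split into word tokens.
    tokens = cleaned.split()
    # 3. Lemmatise (lookup table used here in place of a real lemmatiser).
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    # 4. Drop stop words and words unknown to the domain knowledge base.
    return [w for w in lemmas if w not in STOP_WORDS and w in DOMAIN_VOCABULARY]


print(preprocess("What are the exercises for knee injuries?"))
# → ['exercise', 'knee', 'injury']
```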

The subsequent stage focuses on identifying and cataloguing specific semantic categories present in the user's message. Each category relates to a designated SPARQL query template in the system. As of now, 26 such categories exist, although the system's design facilitates future expansions.

The system identifies a semantic category by detecting marker words from a predefined list. The mapping from each category to its marker words is stored in the marker_words.json file. The system compares the input word list against this mapping to assign one or more categories.
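A minimal sketch of marker-word matching is given below. The layout of marker_words.json is not described in the paper, so the structure shown (category name mapped to a list of marker words), the category names, and the function name are all assumptions.

```python
import json

# Hypothetical content in the spirit of marker_words.json.
MARKER_WORDS_JSON = """
{
  "treatment": ["treatment", "therapy", "treat"],
  "symptoms": ["symptom", "sign"],
  "risk_factors": ["risk", "cause"]
}
"""


def detect_categories(words, marker_words):
    """Return every category with at least one marker word in the input."""
    word_set = set(words)
    return [
        category
        for category, markers in marker_words.items()
        if word_set & set(markers)
    ]


marker_words = json.loads(MARKER_WORDS_JSON)
print(detect_categories(["therapy", "knee", "risk"], marker_words))
# → ['treatment', 'risk_factors']
```

Words that match no marker (here "knee") are not lost; in a design like this they would remain available as candidate entities for the query.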

The form_queries.py module builds the SPARQL queries. Each semantic category corresponds to its own SPARQL query template, and all templates are stored in the query_templates.json file.
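Template-based query construction might look like the sketch below. The template text, the example.org namespace, the property and class IRIs, and the `$word` placeholder syntax are illustrative assumptions; the actual format of query_templates.json is not given in the paper.

```python
from string import Template

# Hypothetical template; the real ontology IRIs and template syntax may differ.
QUERY_TEMPLATES = {
    "treatment": Template(
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX ex: <http://example.org/rehab#>\n"
        "SELECT ?article ?context WHERE {\n"
        "  ?entity ex:relate_to_context ?context .\n"
        "  ?context ex:relate_to_article ?article .\n"
        "  ?context a ex:Treatment .\n"
        "  ?entity rdfs:label ?label .\n"
        '  FILTER(LCASE(STR(?label)) = "$word")\n'
        "}"
    )
}


def build_queries(categories, words):
    """Fill the template of each detected category with each input word."""
    queries = []
    for category in categories:
        template = QUERY_TEMPLATES.get(category)
        if template is None:
            continue  # no template registered for this category
        for word in words:
            if not word.isalpha():
                continue  # refuse anything that could break out of the string literal
            queries.append(template.substitute(word=word.lower()))
    return queries


for query in build_queries(["treatment"], ["knee"]):
    print(query)
```

The `$word` placeholder is convenient because SPARQL variables are written with `?` here, so the two notations do not collide.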

The Apache Jena Fuseki triplestore, together with the process_queries.py module, handles the execution of SPARQL queries. Because this operation is resource-intensive, the system distributes the queries across multiple threads to improve efficiency. The query results, returned as JSON tables, may come from one or several ontology segments.
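The threaded fan-out over ontology segments could be sketched as below, using the standard SPARQL protocol over HTTP POST. The endpoint URLs, dataset names, and function names are hypothetical; the paper states only that queries are distributed across threads.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint names: one Fuseki dataset per ontology segment.
SEGMENT_ENDPOINTS = [
    f"http://localhost:3030/rehab_part_{i}/sparql" for i in range(1, 11)
]


def fuseki_query(endpoint, query):
    """Send one SPARQL query over HTTP POST and return its result bindings."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]


def run_on_all_segments(query, run_query=fuseki_query, endpoints=SEGMENT_ENDPOINTS):
    """Run the same query against every segment in parallel and merge the rows."""
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        # pool.map keeps the results in endpoint order.
        per_segment = list(pool.map(lambda ep: run_query(ep, query), endpoints))
    merged = []
    for bindings in per_segment:
        merged.extend(bindings)
    return merged
```

Threads suit this workload because each worker spends most of its time waiting on the network, so the interpreter lock is not a bottleneck. Passing `run_query` as a parameter also lets the fan-out be exercised without a running server.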

However, the raw response obtained from Apache Jena Fuseki is not well structured for direct presentation to the user. The module therefore reshapes this data into a more readable, tree-like structure, following the instructions given in the outputs section of the query templates.
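The reshaping step can be illustrated with a small sketch that groups flat result rows into a nested dictionary. The rows follow the standard SPARQL JSON results layout (each variable maps to an object with a "value" field); the variable names and the grouping rule are assumptions, since the outputs section of the templates is not described in detail.

```python
def bindings_to_tree(bindings, branch_vars, leaf_var):
    """Group flat SPARQL result rows into a nested dict ending in lists of leaves."""
    tree = {}
    for row in bindings:
        node = tree
        # Descend through all branch variables except the last one.
        for var in branch_vars[:-1]:
            node = node.setdefault(row[var]["value"], {})
        # The last branch variable keys a list of leaf values.
        leaves = node.setdefault(row[branch_vars[-1]]["value"], [])
        leaves.append(row[leaf_var]["value"])
    return tree


rows = [
    {"article": {"value": "a1"}, "section": {"value": "Treatment"}, "context": {"value": "c1"}},
    {"article": {"value": "a1"}, "section": {"value": "Treatment"}, "context": {"value": "c2"}},
    {"article": {"value": "a2"}, "section": {"value": "Treatment"}, "context": {"value": "c3"}},
]
print(bindings_to_tree(rows, ["article", "section"], "context"))
# → {'a1': {'Treatment': ['c1', 'c2']}, 'a2': {'Treatment': ['c3']}}
```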

To counteract the time-intensive nature of SPARQL queries, the system incorporates a cache mechanism, utilising a PostgreSQL-managed relational database to store precomputed responses.
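The caching idea can be sketched as a lookup table consulted before any SPARQL query is run. The table layout, the cache key (detected categories plus input words), and the class name are assumptions, and the example uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server.

```python
import json
import sqlite3


class ResponseCache:
    """Store computed responses keyed by the analysed request (sketch only)."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS response_cache ("
            "cache_key TEXT PRIMARY KEY, response TEXT NOT NULL)"
        )

    @staticmethod
    def make_key(categories, words):
        # Sorting makes the key independent of word order in the message.
        return json.dumps({"categories": sorted(categories), "words": sorted(words)})

    def get_or_compute(self, categories, words, compute):
        """Return the cached response, or call compute() once and store its result."""
        key = self.make_key(categories, words)
        row = self.conn.execute(
            "SELECT response FROM response_cache WHERE cache_key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        response = compute()
        self.conn.execute(
            "INSERT INTO response_cache (cache_key, response) VALUES (?, ?)",
            (key, json.dumps(response)),
        )
        self.conn.commit()
        return response


cache = ResponseCache(sqlite3.connect(":memory:"))
answer = cache.get_or_compute(["treatment"], ["knee"], lambda: {"contexts": ["c1"]})
print(answer)
```

With a PostgreSQL driver the same pattern applies, although the placeholder syntax and connection handling would differ.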

For more direct user interaction, a Telegram bot, MedRehabBot - @MedicalRehabBot (Kaverinsky & Malakhov, 2023a), has been introduced. The webhook.py module makes this possible, relying on the telebot package to communicate with Telegram's API.

Complementing the dialogue interface, the system provides a programmatic interface, implemented in the api.py module. Although it functions independently of webhook.py, it uses the same core modules. API requests, sent as POST requests, are handled by the Flask framework. In deployment, the standard server is replaced by Gunicorn.

 

5 CONCLUSIONS

This paper introduces a comprehensive framework for the natural language-driven dialogue system, meticulously designed to offer vital support in the physical medicine and rehabilitation domain. Our holistic approach is an amalgamation of diverse disciplines such as intelligent information systems, data mining, ontologies, and human-computer interaction, culminating in a groundbreaking tool tailored for both healthcare professionals and students in the physical medicine and rehabilitation field (Malakhov, 2023a; Palagin, Malakhov, Velychko, Semykopna & Shchurov, 2022).

Central to our system is the knowledge base formation process. Harnessing the power of custom Python scripts and a vast collection of articles centred on physical medicine and rehabilitation from EBSCO, we have championed an auto-generation technique for creating an OWL ontology in RDF/XML format. Beyond streamlining the adaptability of the system to ever-evolving medical insights, this method underpins a robust data-driven decision support infrastructure.

A pivotal component of our system is its prowess in natural language processing. Incorporating stages like text preprocessing, discerning semantic categories, and crafting SPARQL queries, our design boasts 26 predefined categories, with built-in flexibility for future expansion.

Apache Jena Fuseki, serving as both triplestore and SPARQL server, executes the SPARQL queries, and multi-threading speeds this up further, so that information is retrieved promptly. The system then reshapes the query outputs into a coherent hierarchical format suitable for both user interaction and API responses.

 

6 ACKNOWLEDGEMENTS

The research team at the Glushkov Institute of Cybernetics extends heartfelt gratitude to Katherine Malan, the esteemed editor-in-chief of the South African Computer Journal. We deeply appreciate her steadfast commitment and dedication to advancing Ukrainian scientific endeavors during challenging wartime conditions, ensuring our contributions reach a global scholarly audience.

 

7 FUNDING

This study would not have been possible without the financial support of the National Research Foundation of Ukraine (Open Funder Registry: 10.13039/100018227). Our work was funded by Grant contract (application ID: 2021.01/0136):

Development of the cloud-based platform for patient-centered telerehabilitation of oncology patients with mathematical-related modeling (Malakhov, 2022, 2023b; Palagin, Malakhov, Velychko & Semykopna, 2022; Stetsyuk et al., 2023).

 

8 DATA AVAILABILITY

The data that support the findings of this study are derived from multiple sources, ensuring a comprehensive and detailed foundation for the Natural Language-Driven Dialogue System.

EBSCO Articles Dataset:

- Domain Knowledge: This dataset specifically pertains to rehabilitation medicine. With a collection of articles dedicated to this medical specialisation, it provides a rich source of knowledge that creates the system's core.

- Data Format: Every article from this dataset has been meticulously processed and represented in a structured JSON format. This ensures uniformity and ease of integration with the system's architecture.

- Publicly available: via Zenodo (Malakhov et al., 2023)

MedRehabBot:

- Description: MedRehabBot serves as an interactive reference system tailor-made for Physical Rehabilitation & Telerehabilitation. It caters to a broad audience, including Therapists, Students, and Patients, aiming to provide support and information.

- Functionality: As a pivotal component of our study, MedRehabBot is more than just a chatbot. It leverages the knowledge from the aforementioned EBSCO articles dataset and integrates with the dialogue system, ensuring real-time, relevant responses to users' queries.

- Publicly available: via GitHub (Kaverinsky & Malakhov, 2023b)

 

9 DECLARATION OF COMPETING INTEREST

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

 

References

Damljanovic, D., Agatonovic, M., & Cunningham, H. (2012). FREyA: An interactive way of querying linked data using natural language. In R. García-Castro, D. Fensel & G. Antoniou (Eds.), The semantic web: ESWC 2011 workshops (pp. 125-138). Springer. https://doi.org/10.1007/978-3-642-25953-1_11

Kaverinsky, V., & Malakhov, K. (2023a, September). MedRehabBot. https://t.me/MedicalRehabBot

Kaverinsky, V., & Malakhov, K. (2023b, September). MedRehabBot. https://github.com/knowledge-ukraine/MedRehabBot

Kumar, V. (2022, December). FREyA. https://github.com/nmvijay/freya

Litvin, A., Palagin, O., Kaverinsky, V., & Malakhov, K. (2023). Ontology-driven development of dialogue systems. SACJ, 35(1), 37-62. https://doi.org/10.18489/sacj.v35i1.1233

Malakhov, K. (2022). Letter to the editor - update from Ukraine: Rehabilitation and research. International Journal of Telerehabilitation, 14(2), 1-2. https://doi.org/10.5195/ijt.2022.6535

Malakhov, K. (2023a). Insight into the digital health system of Ukraine (eHealth): Trends, definitions, standards, and legislative revisions. International Journal of Telerehabilitation, 15(2). https://doi.org/10.5195/ijt.2023.6599

Malakhov, K. (2023b). Letter to the editor - update from Ukraine: Development of the cloud-based platform for patient-centered telerehabilitation of oncology patients with mathematical-related modeling. International Journal of Telerehabilitation, 15(1). https://doi.org/10.5195/ijt.2023.6562

Malakhov, K., Vakulenko, D., & Kaverinsky, V. (2023, September). EBSCO articles dataset (domain knowledge: Rehabilitation medicine) + JSON of every article. https://doi.org/10.5281/ZENODO.8308214

OpenAI. (2023, March). GPT-4 technical report (tech. rep.) (arXiv:2303.08774 [cs]). OpenAI. arXiv. https://doi.org/10.48550/arXiv.2303.08774

Palagin, O., Malakhov, K., Velychko, V., Semykopna, T., & Shchurov, O. (2022). Hospital information smart-system for hybrid e-rehabilitation. CEUR Workshop Proceedings, 3501, 140-157. https://ceur-ws.org/Vol-3501/s50.pdf

Palagin, O., Kaverinsky, V., Litvin, A., & Malakhov, K. (2023). OntoChatGPT information system: Ontology-driven structured prompts for ChatGPT meta-learning. International Journal of Computing, 22(2), 170-183. https://doi.org/10.47839/ijc.22.2.3086

Palagin, O., Malakhov, K., Velychko, V., & Semykopna, T. (2022). Hybrid e-rehabilitation services: SMART-system for remote support of rehabilitation activities and services. International Journal of Telerehabilitation, Special Issue (Research Status Report - Ukraine). https://doi.org/10.5195/ijt.2022.6480

Palagin, O., Petrenko, M., Velychko, V., & Malakhov, K. (2014). Development of formal models, algorithms, procedures, engineering and functioning of the software system "Instrumental complex for ontological engineering purpose". CEUR Workshop Proceedings, 1843, 221-232. http://ceur-ws.org/Vol-1843/221-232.pdf

Palagin, O., Velychko, V., Malakhov, K., & Shchurov, O. (2018). Research and development workstation environment: The new class of current research information systems. CEUR Workshop Proceedings, 2139, 255-269. http://ceur-ws.org/Vol-2139/255-269.pdf

Quamar, A., Lei, C., Miller, D., Ozcan, F., Kreulen, J., Moore, R. J., & Efthymiou, V. (2020). An ontology-based conversation system for knowledge bases. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 361-376. https://doi.org/10.1145/3318464.3386139

Quamar, A., Özcan, F., Miller, D., Moore, R. J., Niehus, R., & Kreulen, J. (2020). Conversational BI: An ontology-driven conversation system for business intelligence applications. Proceedings of the VLDB Endowment, 13(12), 3369-3381. https://doi.org/10.14778/3415478.3415557

Stetsyuk, P. I., Fischer, A., & Khomiak, O. M. (2023). Unified representation of the classical ellipsoid method. Cybernetics and Systems Analysis, 59(5), 784-793. https://doi.org/10.1007/s10559-023-00614-x

Creative Commons License All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License