SciELO - Scientific Electronic Library Online

    The African Journal of Information and Communication

    On-line version ISSN 2077-7213; Print version ISSN 2077-7205

    Abstract

    ADEGBITE, Adewuyi Adetayo and KOTZE, Eduan. Detection of GenAI-produced and student-written C# code: A comparative study of classifier algorithms and code stylometry features. AJIC [online]. 2025, vol.35, pp.1-20. ISSN 2077-7213. https://doi.org/10.23962/ajic.i35.21309.

    Students' use of generative artificial intelligence (GenAI) to produce program code is now so prevalent that certain courses are rendered ineffective, because students can avoid learning the required skills. Meanwhile, detecting GenAI code and differentiating between GenAI-produced and human-written code are becoming increasingly challenging. This study tested the ability of six classifier algorithms to detect GenAI C# code and to distinguish it from C# code written by students at a South African university. A large dataset of verified student-written code was collated from first-year students at South Africa's University of the Free State, and corresponding GenAI code produced by Blackbox.AI, ChatGPT and Microsoft Copilot was generated and collated. Code metric features were extracted using modified Roslyn APIs. The data was organised into four sets, each containing equal numbers of student-written and GenAI-produced code samples, and a machine-learning model was deployed with the four sets using six classifiers: extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), AdaBoost, random forest, and soft voting (with XGBoost, KNN and SVM as inputs). It was found that the GenAI C# code produced by Blackbox.AI, ChatGPT and Copilot could, with a high degree of accuracy, be identified and distinguished from student-written C# code by the classifier algorithms, with XGBoost performing best in detecting GenAI code and random forest performing best in identifying student-written code.
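
    The soft-voting classifier named in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of soft voting, in which the class probabilities from each input classifier are averaged and the class with the highest average is chosen. The label names, function names and probability values below are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence

# Hypothetical class labels; the study's actual label encoding is not given.
LABELS = ("student", "genai")


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the weighted average of per-class probabilities.

    `probas` holds one probability vector per input classifier,
    all for the same code sample.
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]


def predict(probas: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> str:
    """Pick the label with the highest averaged probability."""
    avg = soft_vote(probas, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best]


# Made-up probabilities for a single code sample, standing in for the
# outputs of the XGBoost, KNN and SVM classifiers.
xgb_proba = [0.55, 0.45]
knn_proba = [0.55, 0.45]
svm_proba = [0.05, 0.95]

print(predict([xgb_proba, knn_proba, svm_proba]))
```

    In this made-up case two of the three classifiers lean slightly towards "student", so a simple majority vote would pick that label, whereas the averaged probabilities favour "genai" because the third classifier is far more confident. That sensitivity to confidence is what distinguishes soft voting from hard voting.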

    Keywords: C# code; generative AI (GenAI) code; student-written code; machine-learning; code classification; code stylometry features.
