SciELO - Scientific Electronic Library Online

 


    The African Journal of Information and Communication

    Online version ISSN 2077-7213; print version ISSN 2077-7205

    Abstract

    ADEGBITE, Adewuyi Adetayo and KOTZE, Eduan. Detection of GenAI-produced and student-written C# code: A comparative study of classifier algorithms and code stylometry features. AJIC [online]. 2025, vol.35, pp.1-20. ISSN 2077-7213. https://doi.org/10.23962/ajic.i35.21309.

    Students now use generative artificial intelligence (GenAI) to produce program code so widely that certain courses are rendered ineffective, because students can avoid learning the required skills. Meanwhile, detecting GenAI code and differentiating between GenAI-produced and human-written code are becoming increasingly challenging. This study tested the ability of six classifier algorithms to detect GenAI C# code and to distinguish it from C# code written by students at a South African university. A large dataset of verified student-written code was collated from first-year students at South Africa's University of the Free State, and corresponding GenAI code was generated with Blackbox.AI, ChatGPT and Microsoft Copilot and collated. Code metric features were extracted using modified Roslyn APIs. The data was organised into four sets, each containing equal numbers of student-written and AI-generated code samples, and a machine-learning model was deployed on the four sets using six classifiers: extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), AdaBoost, random forest, and soft voting (with XGBoost, KNN and SVM as inputs). The classifier algorithms identified the GenAI C# code produced by Blackbox.AI, ChatGPT and Copilot, and distinguished it from student-written C# code, with a high degree of accuracy. XGBoost performed best in detecting GenAI code, and random forest performed best in identifying student-written code.
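    The soft-voting classifier named in the abstract combines the class probabilities of its input models rather than their hard votes. The following is a minimal Python sketch of that idea only, not the authors' implementation: the function name `soft_vote`, the label order, and the three probability vectors are hypothetical values chosen for illustration, and equal weighting of the three models is an assumption.

```python
from statistics import fmean

# Hypothetical label order; the study's actual encoding is not given in the abstract.
LABELS = ["student-written", "GenAI"]


def soft_vote(probabilities_per_model):
    """Average per-class probabilities across models (equal weights assumed)
    and return the index of the most probable class plus the averages."""
    n_classes = len(probabilities_per_model[0])
    averaged = [
        fmean(model[c] for model in probabilities_per_model)
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged


# Made-up probability outputs for one code sample, standing in for what
# XGBoost, KNN and SVM models might each report.
xgboost_proba = [0.30, 0.70]
knn_proba = [0.60, 0.40]
svm_proba = [0.45, 0.55]

label_index, averaged = soft_vote([xgboost_proba, knn_proba, svm_proba])
print(LABELS[label_index], [round(p, 2) for p in averaged])
```

    Because the probabilities are averaged, a model that is very confident contributes more to the final decision than one that is close to undecided, which is the usual motivation for soft voting over majority voting.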

    Keywords: C# code; generative AI (GenAI) code; student-written code; machine-learning; code classification; code stylometry features.
