Detection of GenAI-produced and student-written C# code: A comparative study of classifier algorithms and code stylometry features

Adegbite, Adewuyi Adetayo; Kotzé, Eduan

doi:10.23962/ajic.i35.21309


The African Journal of Information and Communication

Online version ISSN 2077-7213; print version ISSN 2077-7205

Abstract

ADEGBITE, Adewuyi Adetayo and KOTZE, Eduan. Detection of GenAI-produced and student-written C# code: A comparative study of classifier algorithms and code stylometry features. AJIC [online]. 2025, vol.35, pp.1-20. ISSN 2077-7213. https://doi.org/10.23962/ajic.i35.21309.

The prevalence of students using generative artificial intelligence (GenAI) to produce program code is such that certain courses are being rendered ineffective, because students can avoid learning the required skills. Meanwhile, detecting GenAI code, and differentiating between GenAI-produced and human-written code, are becoming increasingly challenging. This study tested the ability of six classifier algorithms to detect GenAI C# code and to distinguish it from C# code written by students at a South African university. A large dataset of verified student-written code was collated from first-year students at South Africa's University of the Free State, and corresponding GenAI code was generated with Blackbox.AI, ChatGPT and Microsoft Copilot and collated. Code metric features were extracted using modified Roslyn APIs. The data was organised into four sets, each containing equal numbers of student-written and AI-generated code samples, and machine-learning models were trained and evaluated on each of the four sets using six classifiers: extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), AdaBoost, random forest, and soft voting (with XGBoost, KNN and SVM as inputs). It was found that the GenAI C# code produced by Blackbox.AI, ChatGPT and Copilot could, with a high degree of accuracy, be identified and distinguished from student-written C# code through use of the classifier algorithms, with XGBoost performing strongest in detecting GenAI code and random forest performing best in identifying student-written code.
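The soft-voting classifier mentioned above combines its base models (XGBoost, KNN and SVM) by averaging their predicted class probabilities instead of counting their individual label votes. The following is a minimal, illustrative Python sketch of that averaging rule, not the authors' implementation. It assumes each base classifier already outputs one probability per class, uses a hypothetical labelling (0 = student-written, 1 = GenAI-produced), and adds an optional per-classifier `weights` parameter that the abstract does not mention. The functions `soft_vote` and `hard_vote` are hypothetical names chosen for this example, and the probability values are invented.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical labelling for illustration only.
LABELS = ("student-written", "GenAI-produced")


def soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Soft voting: (weighted) average of class probabilities, then argmax.

    probas[i][k] is base classifier i's probability that the sample is class k.
    Returns (predicted class index, averaged probabilities).
    """
    if not probas:
        raise ValueError("need at least one classifier's probabilities")
    n_classes = len(probas[0])
    if any(len(p) != n_classes for p in probas):
        raise ValueError("all classifiers must score the same classes")
    if weights is None:
        weights = [1.0] * len(probas)  # equal weights by default (an assumption)
    if len(weights) != len(probas):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    avg = [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]
    # max() returns the first maximum, so ties go to the lower class index.
    predicted = max(range(n_classes), key=lambda k: avg[k])
    return predicted, avg


def hard_vote(probas: Sequence[Sequence[float]]) -> int:
    """Hard (majority) voting, shown only for contrast with soft voting."""
    votes = [max(range(len(p)), key=lambda k: p[k]) for p in probas]
    # Most frequent label; ties go to the lower class index.
    return max(sorted(set(votes)), key=votes.count)


if __name__ == "__main__":
    # Invented outputs standing in for XGBoost, KNN and SVM on one code sample.
    probas = [[0.10, 0.90], [0.55, 0.45], [0.52, 0.48]]
    pred, avg = soft_vote(probas)
    print("soft vote:", LABELS[pred], [round(a, 2) for a in avg])
    print("hard vote:", LABELS[hard_vote(probas)])
```

With these invented numbers the two rules disagree: two weakly confident classifiers lean towards "student-written", so the majority vote picks that label, but the strongly confident first classifier pulls the averaged probabilities to 0.39 versus 0.61 in favour of "GenAI-produced". In scikit-learn the same averaging rule is available as `VotingClassifier(voting="soft")`; the abstract does not state which library the study used.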

Keywords: C# code; generative AI (GenAI) code; student-written code; machine learning; code classification; code stylometry features.
