Analyzing Civil Servants' Data Using Unsupervised Learning Techniques
This report applies two unsupervised learning techniques, Principal Component Analysis (PCA) and k-means clustering, to the annual personal interest and investment declarations of Mongolian civil servants[1]. The dataset, originally scraped from a dynamic web platform and collected for investigative journalism training by the Mongolian Data Club, covers 2016 to 2021; this analysis uses only the 2021 data. PCA was used to reduce the dimensionality of the dataset, with the first 16 components capturing approximately 81.64% of the total variance. K-means clustering was then applied to these principal components to identify distinct clusters within the data. The aim was to uncover underlying trends and group similar declarations, a quantitative analysis that has not previously been attempted on this dataset.
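
The two-step approach described above (PCA followed by k-means on the component scores) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the report's actual code. The function name `pca_then_kmeans`, the standardization step, the 0.80 variance target, the choice of three clusters, and the synthetic input matrix are all assumptions made for the example; the real declaration data is not used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_then_kmeans(X, variance_target=0.80, n_clusters=3, seed=0):
    """Hypothetical helper: standardize X, keep enough principal components
    to reach variance_target, then cluster the component scores."""
    # Assumption: features are standardized so no single column dominates.
    X_scaled = StandardScaler().fit_transform(X)

    # A float in (0, 1) asks PCA to keep the smallest number of components
    # whose cumulative explained variance exceeds that fraction.
    pca = PCA(n_components=variance_target, svd_solver="full")
    scores = pca.fit_transform(X_scaled)

    # Cluster in the reduced space; n_clusters here is purely illustrative.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scores)
    return pca, scores, labels


# Synthetic stand-in for the numeric declaration features (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 40))

pca, scores, labels = pca_then_kmeans(X_demo)
print("components kept:", scores.shape[1])
print("variance explained:", pca.explained_variance_ratio_.sum())
print("cluster sizes:", np.bincount(labels))
```

Passing a variance fraction instead of a fixed component count lets PCA choose how many components to keep. On the real data, a target near 0.80 would be expected to select a number of components comparable to the 16 reported above, though that depends on preprocessing choices not specified here.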
