Hi, this is Koray

Loves Computing and Data Science

MSc thesis

Partial RDF Schema Retrieval

[ Computer Science, Data Science ]

Natural Science

[ Paper | Code ]

"There are various data structures that represent data interrelationships in the universe of information. One is a graph-based data structure, which depicts a collection of entities connected by relationships. Resource Description Framework (RDF) is a widely used data model that facilitates the storage of graph-based data. This system, unlike standardised SQL, lacks a consistent schema that evolves over time. When presenting a complete schema is crucial, the loose standards combined with timeout limits in the retrieval process pose a challenge. The objective of this master's thesis is therefore to develop a partial schema retrieval pipeline in order to solve the previously outlined problem. We evaluate the quality of our approach by measuring performance and completeness. This is conducted by running the pipeline against several SPARQL endpoints. The pipeline lays the foundation for retrieving partial graph schemas per iteration. The result is a rendered set of visualisations of partial schemas displayed in a hierarchical aggregated view. This should provide the ability to iteratively express a portion of a graph, regardless of the evolving schema."

[ Schema Discovery, High-dimensional, Graph Embedding, Unsupervised/Manifold Learning ]

Graph representations of (left) DBPedia and (right) WikiData at the 10th iteration. Image by Author

Side projects

AI-powered compression as latent space representations

[Data Science, Computer Science]

In a world of compression without storing original images, latent space representations are all you need | Pre-training VQ-VAE Net | VAE Tiny | 8-bit space

Reducing the dimensionality of embeddings by utilizing Matryoshka Representation Learning

[Data Science, Computer Science]

Matryoshka Representation Learning (MRL) is a scaling solution that reduces the dimensionality of embeddings by truncation. It produces embeddings of various sizes, with the most important features kept in the earlier dimensions rather than the later ones. A shortened embedding therefore retains the most important features, giving cheaper embeddings that remain high quality, an approach also used by OpenAI.
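
A minimal sketch of the truncation step, assuming the embeddings come from a model that was already trained with MRL. The function names and the toy 8-dimensional vectors are made up for illustration and are not taken from the repository.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative values only): most of the signal sits in the
# first few dimensions, which is the property MRL training aims for.
doc = [0.9, 0.4, 0.1, 0.05, 0.02, 0.01, 0.01, 0.0]
query = [0.8, 0.5, 0.2, 0.0, 0.03, 0.0, 0.02, 0.01]

full_score = cosine(doc, query)
short_score = cosine(truncate_embedding(doc, 4), truncate_embedding(query, 4))
```

With vectors shaped like this, the similarity computed on half the dimensions stays close to the full-size score. Re-normalizing after truncation matters when the downstream search uses dot products rather than cosine similarity.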

Deep Convolutional Neural Network from scratch

[Computer Science]

Mimicking biological neural computation in analog image processing with a Deep Convolutional Net constructed from scratch
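
As a rough illustration of what "from scratch" involves, the core operation of a convolutional layer can be written in plain Python. This is a sketch only: the function names, the toy image and the edge kernel are assumptions for illustration, not code from the repository.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation a conv layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Slide the kernel over the image patch anchored at (i, j).
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 3x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d(image, edge_kernel))
```

The feature map is non-zero only in the column where the edge sits. A full network stacks many such layers with learned kernels, pooling and a classifier on top.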

Semantic search powered by a multilingual language model

[Data Science, Computer Science]

Fast semantic neural search powered by a multilingual language model and the Qdrant vector database | Text Embedding | Encoder-only transformer

Analysis of emotions in movie trailers over time utilizing Deep Learning models

[Data Science]

This repository analyzes the occurrence of emotions in movie trailers over time, using object, face and emotion detection models.

Time Series Forecasting using Deep Learning

[Data Science]

The objective of this repo is to design, train and evaluate multiple forecasting models based on deep neural nets (e.g. ConvNet, Long Short-Term Memory RNN) in order to forecast energy consumption.
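
Models like these are usually trained on fixed-length windows of past values paired with the values that follow. The sketch below shows that preparation step under assumed names and made-up numbers; the repository's actual window sizes and data are not shown here.

```python
def make_windows(series, n_inputs, horizon=1):
    """Split a series into (past window, future values) training pairs."""
    samples = []
    last_start = len(series) - n_inputs - horizon
    for start in range(last_start + 1):
        x = series[start:start + n_inputs]
        y = series[start + n_inputs:start + n_inputs + horizon]
        samples.append((x, y))
    return samples


# Illustrative consumption readings, not real data.
consumption = [10, 12, 13, 15, 14, 16]

# Use the last 3 readings to predict the next one.
pairs = make_windows(consumption, n_inputs=3, horizon=1)
```

Each pair can then be fed to a ConvNet or LSTM as one training example. Increasing `horizon` turns the task into multi-step forecasting.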

Analysis and classification of a certain music genre utilizing Log-Likelihood Ratio and Logistic Regression

[Data Science]

This repository analyzes and classifies a certain music genre against others, using the log-likelihood ratio and logistic regression.
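
The exact formulation used in the repository is not shown here. One common form of the log-likelihood ratio in corpus analysis compares how often a word occurs in one corpus (for example, lyrics of one genre) against another. The sketch below assumes that form, with made-up counts and a hypothetical function name.

```python
import math


def log_likelihood(freq_a, freq_b, total_a, total_b):
    """Log-likelihood (G2) keyness of a word across two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of words in corpus A and corpus B.
    """
    combined = freq_a + freq_b
    # Expected counts if the word were equally likely in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2.0 * g2


# Made-up counts: the word appears 30 times in 1000 words of genre A
# and 10 times in 1000 words of the other genres.
score = log_likelihood(30, 10, 1000, 1000)
```

A score of zero means the word is equally frequent in both corpora, and larger scores mark words that distinguish them. Such words are natural candidate features for a logistic regression classifier.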

Gender biases in TV show discussions utilizing Word2Vec Neural Network

[Data Science]

This repository analyzes bias towards men and women in TV show discussions (The Witcher and Ozark), using NLP and a Word2Vec neural network.

CallFlow Visualizer

[Computer Science]

A web component for visualizing the structure of various CallFlows in an interactive network graph.

Prominent Topics in CNN and FOXNEWS tweets utilizing a Generative Probabilistic model

[Data Science]

This repository analyzes the topics that are prominent in CNN and FOXNEWS tweets, using NLP and the LDA MALLET model (Latent Dirichlet Allocation).

TV-content Recommendation System

[Data Science, Computer Science]

A personalized recommendation system powered by a topic model (Latent Dirichlet Allocation) for content-based filtering and AI-powered search, along with custom algorithms that take into account public values such as transparency, fairness, diversity, social cohesion and serendipity. In addition, user behavior (implicit data such as logs) has been simulated for metric purposes.

Anomaly Detection System

[Data Science, Software Engineering]

An anomaly detection system (ADS) based on an autoencoder neural net, developed with TensorFlow and served through Streamlit. The ADS includes default data sets (electrocardiogram and credit card fraud) that can be used for analysis and training.
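
Autoencoder-based detection typically flags samples that the network reconstructs poorly. The sketch below shows only that decision step, assuming reconstructions are already available from a trained model. The threshold rule (mean plus `k` standard deviations of the error on normal data) and all names and numbers are assumptions for illustration, not the ADS's confirmed logic.

```python
import statistics


def mse(original, reconstructed):
    """Mean squared reconstruction error for one sample."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of the errors seen on normal data (assumed rule)."""
    mean = statistics.fmean(normal_errors)
    std = statistics.pstdev(normal_errors)
    return mean + k * std


def flag_anomalies(errors, threshold):
    """Mark every sample whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Made-up reconstruction errors measured on known-normal samples.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(normal_errors)

# Two new samples: one reconstructs well, one does not.
flags = flag_anomalies([0.012, 0.05], threshold)
```

Lowering `k` catches more anomalies at the cost of more false alarms, which is the trade-off to tune per data set.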

Data Mining in Text, Images and Video

[Data Science]

Cultural analytics concerns the use of data and digital methods for the inquiry of (large) cultural corpora. Using text mining and statistical analysis with Python, exploratory programming and digital methods, this project employs data mining techniques on text, image or video archives. The objective is to develop insights into patterns of cultural production and media formats.

Speech to Phonemes

[Data Science]

Patients who suffer from aphasia have difficulty comprehending and/or formulating language. The cause is usually brain damage in the language center. Recovery from aphasia is almost never complete, and rehabilitation can take years, but it does help patients. Even so, having aphasia is usually very stressful for patients, including during rehabilitation sessions. Specialists from the Rijndam rehabilitation institute in Rotterdam treat patients who suffer from aphasia. Their impression is that the stress experienced by patients may be amplified by human-human interaction, in which patients experience the 'embarrassment' of not being able to communicate correctly. This rehabilitation stress could possibly be reduced by having patients do exercises on a computer rather than talk to a person. The first goal of this project is to see whether we can properly transcribe what aphasia patients say into text and identify where they are likely to make mistakes in their language.

NRC Scraper API

[Software Engineering]

A simple NRC Scraper API written in Go with Colly (scraping framework) and Gin (web framework).

NRC news app

[Software Engineering]

A simple NRC Android app that uses the NRC Scraper API to populate the front end.