Asif Ahmed Neloy

Recent News

[March 2026] Joined Delta Controls Inc. as a Senior AI Engineer.
[August 2025] Led a week-long MFRE bootcamp and workshop series on Python and R covering data access, visualization, and coding for economic analysis.
[April 2025] Supervised graduate students in the UBC MFRE Summer Program.
[November 2024] My paper titled "Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection" was accepted at the IEEE Big Data Conference (IEEE BigData 2024), Washington, D.C., December 15-18, 2024. IEEE Xplore
[July 2024] My paper titled "A Comprehensive Study of Auto-Encoders for Anomaly Detection: Efficiency and Trade-offs" was published in Machine Learning with Applications. ScienceDirect
[June 2024] Received the Research Dissemination Present and Research Dissemination Publish Grants from Douglas College.
[December 2023] Joined Douglas College, New Westminster Campus as a Full-time Regular Faculty Member.
[August 2023] Started my new journey as a Faculty Member at Vancouver Island University.
[May 2023] Promoted to Machine Learning Engineer, Daris Properties Ltd.
[February 2023] Published my latest conference paper, "Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling".
[January 2023] My MSc dissertation, "Dimension Reduction and Anomaly Detection using Unsupervised Machine Learning", is now online.
[November 2022] Guest lecture, Introduction to Python and NumPy, STAT 447: Statistical Machine Learning for Data Science, Department of Mathematics and Statistics, University of Saskatchewan.
[September 2022] Received a Graduate Travel Award from the University of Manitoba, NSERC CREATE VADA Program.

Projects

01

Crosshair and Event Detection for Skill Assessment (Training Arc)

Problem Convert raw gameplay video into reliable, frame-level signals so player aim and reaction metrics can be computed consistently across different resolutions, HUD settings, and frame rates.

Details Built a computer-vision pipeline to extract frames, detect crosshair and targets, track events over time, and compute stable reaction and correction metrics. Implemented model inference with YOLO-style detectors, lightweight tracking, and robust timestamp alignment to prevent drift. Packaged the workflow as reproducible scripts with config-driven runs and structured JSONL outputs for downstream analytics.

Outcome Enabled repeatable measurement of aim stability, first-shot reaction, and correction behavior from VODs, producing clean artifacts that can be used for benchmarking players and validating training interventions.

Role I owned the end-to-end design and implementation, including data layout, frame timebase logic, detection and tracking integration, metric definitions, and export formats. I validated failure modes across clips and added guardrails for edge cases (missing HUD, occlusion, and partial visibility).
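
The frame timebase logic and JSONL export described above can be illustrated with a small sketch. This is not the project's actual code: the function names (`frame_timestamp_ms`, `to_jsonl`), the event fields, and the sample frame rate are hypothetical. The idea is to derive every timestamp from the frame index and an exact rational frame rate, instead of repeatedly adding a rounded per-frame duration, so rounding error cannot accumulate into drift.

```python
import json
from fractions import Fraction


def frame_timestamp_ms(frame_index: int, fps: Fraction) -> float:
    """Timestamp of a frame in milliseconds, computed from its index.

    Using index / fps with an exact Fraction avoids the drift that comes
    from repeatedly adding a rounded per-frame duration.
    """
    return float(Fraction(frame_index * 1000) / fps)


def to_jsonl(events: list) -> str:
    """Serialize events as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)


# NTSC-style rate, 60000/1001 (about 59.94 fps); an assumed example value.
fps = Fraction(60000, 1001)

# Hypothetical detections: (frame index, event label).
detections = [(0, "target_visible"), (18, "crosshair_on_target"), (21, "shot")]

events = [
    {"frame": idx, "t_ms": round(frame_timestamp_ms(idx, fps), 3), "event": label}
    for idx, label in detections
]

# Reaction time: from the target appearing to the crosshair reaching it.
reaction_ms = events[1]["t_ms"] - events[0]["t_ms"]

print(to_jsonl(events))
print(f"reaction_ms={reaction_ms:.1f}")
```

Because each timestamp depends only on its own frame index, clips recorded at different frame rates can be compared on the same millisecond timebase.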

02

Intelligent Document Processing and OCR Pipeline

Problem Organizations deal with heterogeneous document formats (PDFs, scanned images, handwritten forms) requiring automated extraction and structuring of key information for downstream processing and analytics.

Details Developed an end-to-end document processing pipeline combining OCR engines (Tesseract, EasyOCR) with layout analysis and named entity recognition. Implemented preprocessing steps for deskewing, denoising, and contrast enhancement. Built structured output schemas for invoices, forms, and contracts with field-level confidence scoring. Integrated with vector stores for semantic search across extracted content.

Outcome Reduced manual data entry time significantly while maintaining high extraction accuracy. Enabled searchable document archives with semantic retrieval capabilities and automated compliance checking workflows.

Role Designed the pipeline architecture, selected and fine-tuned OCR models, implemented post-processing logic for entity extraction, and built the API layer for integration with existing document management systems.
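
As a rough illustration of field-level confidence scoring, the sketch below aggregates word-level OCR confidences into one score per field and flags low-confidence fields for review. The names (`ExtractedField`, `field_confidence`, `build_field`), the 0.80 threshold, and the use of the minimum as the aggregate are all assumptions made for this example, not the pipeline's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0
    needs_review: bool


def field_confidence(word_confidences):
    """Aggregate word-level OCR confidences into one field-level score.

    Taking the minimum is a conservative choice (an assumption here):
    one badly recognized word makes the whole field suspect.
    """
    if not word_confidences:
        return 0.0
    return min(word_confidences)


def build_field(name, words, review_threshold=0.80):
    """Build a field from (text, confidence) pairs produced by an OCR engine."""
    value = " ".join(text for text, _ in words)
    conf = field_confidence([c for _, c in words])
    return ExtractedField(name, value, conf, needs_review=conf < review_threshold)


# Hypothetical OCR output for two invoice fields.
invoice_number = build_field("invoice_number", [("INV-2041", 0.97)])
vendor = build_field("vendor", [("Acme", 0.93), ("Suppl1es", 0.58)])

for field in (invoice_number, vendor):
    print(field)
```

A mean or product of word confidences would be a reasonable alternative; the minimum simply errs toward sending more fields to human review.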

03

Enterprise RAG System with Multi-Source Knowledge Integration

Problem Enterprise knowledge is scattered across wikis, documents, databases, and code repositories, making it difficult for teams to find accurate, contextual answers without extensive manual search.

Details Built a production-grade RAG (Retrieval-Augmented Generation) system with multi-source connectors for Confluence, SharePoint, GitHub, and internal databases. Implemented hybrid search combining dense embeddings (sentence-transformers) with sparse BM25 retrieval. Designed chunking strategies optimized for different content types. Added re-ranking with cross-encoders and citation generation for traceability. Deployed with FastAPI, Redis caching, and comprehensive logging for quality monitoring.

Outcome Delivered a reliable internal Q&A system with grounded, auditable responses. Reduced time-to-answer for common queries while maintaining high factual accuracy through retrieval-based grounding.

Role Architected the retrieval pipeline, implemented embedding and indexing strategies, built the API service layer, and designed the evaluation framework for measuring retrieval quality and answer accuracy.
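
One common way to combine a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF). The description above does not say which fusion method the system uses, so RRF is an assumption here, and the document IDs and the `reciprocal_rank_fusion` helper are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from the two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # embedding similarity order
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # BM25 order

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on different scales. The fused list would then be passed to the cross-encoder re-ranker.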

04

NLP Pipeline for Multi-Domain Text Classification and Sentiment Analysis

Problem Unstructured text data from customer feedback, support tickets, and surveys contains valuable insights but requires scalable, accurate classification and sentiment extraction to be actionable.

Details Developed a modular NLP pipeline supporting multi-label text classification, aspect-based sentiment analysis, and topic modeling. Fine-tuned transformer models (BERT, RoBERTa) for domain-specific classification tasks. Implemented active learning workflows to efficiently expand training data. Built preprocessing modules for text cleaning, language detection, and PII redaction. Deployed as microservices with batch and real-time inference capabilities.

Outcome Enabled automated routing of support tickets, real-time sentiment monitoring dashboards, and trend analysis across customer feedback channels. Improved response time and customer satisfaction metrics.

Role Led model development and fine-tuning, designed the annotation guidelines and active learning strategy, implemented the inference pipeline, and built monitoring dashboards for model performance tracking.
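
A simple form of active learning is uncertainty sampling: send the examples the model is least sure about to annotators first. The sketch below ranks unlabeled examples by the entropy of their predicted class distribution. It is a single-label simplification with made-up probabilities and hypothetical names (`entropy`, `select_for_annotation`); the project's actual selection strategy is not specified above.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the IDs of the `budget` most uncertain examples.

    `predictions` maps an example ID to its predicted class probabilities.
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs over three classes for unlabeled tickets.
predictions = {
    "ticket_1": [0.98, 0.01, 0.01],  # confident
    "ticket_2": [0.34, 0.33, 0.33],  # near uniform: most uncertain
    "ticket_3": [0.60, 0.30, 0.10],
    "ticket_4": [0.50, 0.49, 0.01],
}

to_label = select_for_annotation(predictions, budget=2)
print(to_label)
```

In a real loop, the selected examples would be labeled, added to the training set, and the model retrained before the next round of selection.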

05

Streaming Integrity-Aware Recommendation Engine

Problem Rank candidates reliably under noisy and incomplete signals while keeping latency low enough for real-time product use and maintaining predictable behavior under distribution shifts.

Details Designed a ranking pipeline using gradient-boosted models and embedding features, served through a FastAPI layer with caching and feature hydration. Implemented an offline replay framework for consistent evaluation (NDCG, precision at k) and added monitoring hooks to detect drift in top features. Deployed containerized services and data stores to support scalable inference.

Outcome Delivered a stable, low-latency ranking API that improved offline ranking quality and reduced operational friction through automated evaluation and monitoring.

Role I led the model and system design, built the evaluation harness, implemented the serving stack, and set up the deployment workflow. I drove the success-metric alignment and maintained a regression suite to prevent silent quality drops.
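
For reference, the two offline metrics named above can be computed as follows. This is a minimal sketch with made-up relevance grades; it uses the linear-gain form of DCG (relevance divided by a log2 discount), which is one of two common variants, and the function names are illustrative rather than taken from the project.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k items that are relevant (relevance > 0)."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with the log2(rank + 1) discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance of items in the order the model ranked them (made-up data).
ranked_relevance = [3, 0, 2, 0, 1]

print(precision_at_k(ranked_relevance, k=3))
print(round(ndcg_at_k(ranked_relevance, k=3), 4))
```

In a replay framework, these functions would be applied per logged request and averaged, so that two model versions can be compared on identical traffic.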

06

Lifecycle Analytics: Churn Prediction and Intervention Prioritization

Problem Identify at-risk users early and prioritize interventions with interpretable drivers so retention actions can be targeted and measurable, not generic.

Details Built churn models on product usage, engagement, and subscription signals with a feature pipeline that supports backtesting and leakage checks. Added explainability with SHAP for driver analysis and delivered a scoring workflow that ranks users for outreach and experimentation. Implemented periodic recalibration and monitoring dashboards to keep performance stable over time.

Outcome Produced a repeatable churn and prioritization workflow that improved decision quality for retention targeting and reduced manual review overhead through ranked, explainable outputs.

Role I owned the modeling lifecycle (data preparation, feature engineering, training, thresholding strategy, and explainability outputs). I also set up monitoring and documented playbooks for retraining and governance so the system stays operational after handoff.
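
A basic leakage check for a churn model verifies that every feature was observed strictly before the prediction cutoff, so nothing from the label window leaks into training. The sketch below shows that check together with rolling backtest windows. The feature names, dates, and helper functions (`find_leaky_features`, `backtest_windows`) are hypothetical and only illustrate the idea.

```python
from datetime import date, timedelta


def find_leaky_features(feature_dates, cutoff):
    """Return names of features observed on or after the prediction cutoff.

    Any such feature could encode information from the label window and
    should be excluded or recomputed as of the cutoff.
    """
    return sorted(name for name, seen in feature_dates.items() if seen >= cutoff)


def backtest_windows(cutoffs, horizon_days):
    """Pair each cutoff with the end of its label window, for rolling backtests."""
    return [(c, c + timedelta(days=horizon_days)) for c in cutoffs]


cutoff = date(2024, 6, 1)

# Hypothetical "last observed" date for each feature of one user.
feature_dates = {
    "sessions_last_30d": date(2024, 5, 31),
    "plan_tier": date(2024, 5, 15),
    "cancellation_page_visits": date(2024, 6, 3),  # observed after the cutoff
}

leaky = find_leaky_features(feature_dates, cutoff)
windows = backtest_windows([date(2024, 4, 1), date(2024, 5, 1)], horizon_days=30)

print(leaky)
print(windows)
```

Running the same check at each backtest cutoff helps confirm that reported performance reflects what the model could have known at scoring time.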

07

Retrieval Q&A Assistant with Rubric-Driven Evaluation

Problem Teams needed fast, grounded answers from internal documents, plus a way to measure language model quality and regressions under clear evaluation criteria.

Details Built a retrieval-based assistant using embeddings and vector search, wrapped in an API with structured logging and citation-style output so responses are auditable. Implemented a rubric-driven evaluation harness with test suites, edge-case probes, and regression tracking across prompt variants. Added guardrails to detect unsafe or unsupported completions and route uncertain cases to human review.

Outcome Enabled reliable internal Q&A with traceability, and introduced a measurable evaluation loop that supports iteration without quality drift.

Role I designed the retrieval pipeline, implemented the service layer and logging, and built the evaluation framework end to end. I defined rubric interpretations and drove regression strategy so quality stays stable as prompts and models change.
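
The regression-tracking idea can be sketched as a rubric of named checks applied to each response, with a comparison between a baseline prompt variant and a candidate. The criteria, sample responses, and function names below are invented for illustration; a real rubric would be richer and would likely include human or model-graded criteria.

```python
import re

# Each rubric criterion is a named check on a response string (illustrative).
RUBRIC = {
    "has_citation": lambda text: bool(re.search(r"\[\d+\]", text)),
    "is_concise": lambda text: len(text.split()) <= 60,
    "non_empty": lambda text: bool(text.strip()),
}


def score(response):
    """Fraction of rubric criteria the response satisfies."""
    return sum(check(response) for check in RUBRIC.values()) / len(RUBRIC)


def find_regressions(baseline, candidate, tolerance=0.0):
    """Test-case IDs where the candidate variant scores below the baseline."""
    return sorted(
        case for case in baseline
        if score(candidate[case]) < score(baseline[case]) - tolerance
    )


# Hypothetical responses from two prompt variants on the same test cases.
baseline = {
    "q1": "Refunds are processed within 5 days [1].",
    "q2": "VPN setup is described in the onboarding guide [2].",
}
candidate = {
    "q1": "Refunds are processed within 5 days [1].",
    "q2": "VPN setup is described in the onboarding guide.",  # citation dropped
}

regressions = find_regressions(baseline, candidate)
print(regressions)
```

Keeping the rubric as data makes it straightforward to add criteria and rerun the same suite whenever a prompt or model changes.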

Research

My work centers on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, build governed analytics for population health settings, and publish practical comparisons that surface efficiency and trade-offs across model classes.

Unsupervised Anomaly Detection

Disentanglement in latent spaces, total-correlation objectives, and conditional VAEs for detecting rare structure in image and tabular data.
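
For readers unfamiliar with total-correlation objectives, the standard decomposition of the averaged KL term of a VAE is sketched below. It is the general form from the disentanglement literature, assuming a factorized prior p(z) = ∏_j p(z_j) and the aggregate posterior q(z) = E_{p(x)}[q(z | x)]; it is not necessarily the exact loss used in the papers listed under Publications.

```latex
\mathbb{E}_{p(x)}\Big[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\Big]
  = \underbrace{I_q(x; z)}_{\text{mutual information}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Penalizing the total-correlation term more heavily than the others encourages statistically independent latent dimensions, which is the usual motivation for such objectives.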

Representation Learning

Comparative studies of auto-encoder architectures that quantify reconstruction quality, sampling behavior, latent visualization, and classification accuracy.

Applied Health Analytics

Population-scale modeling with governance, documentation, and repeatable pipelines as part of the NSERC CREATE VADA program.

NLP & Information Retrieval

RAG systems, document processing, text classification, and evaluation frameworks for language model applications.

Teaching

University of British Columbia

Winter 2026

FRE 521D: Data Analytics in Climate, Food and Environment

Fall 2025

FRE 501: Topics in Food Market Analysis (Co-instructor)

Winter 2025

FRE 521D: Data Analytics in Climate, Food and Environment

Douglas College

Summer 2025

CSIS 3300: Database II
CSIS 3360: Fundamentals of Data Analytics
CSIS 4260: Special Topics in Data Analytics

Winter 2025

CSIS 1175: Introduction to Programming I
CSIS 2200: Systems Analysis & Design
CSIS 3860: Data Visualization

Summer 2024

CSIS 2300: Database I
CSIS 3300: Database II
CSIS 3360: Fundamentals of Data Analytics

Winter 2024

CSIS 2200: Systems Analysis & Design
CSIS 2300: Database I
CSIS 3290: Fundamentals of Machine Learning in Data Science

Vancouver Island University

Fall 2023

CSCI 251: Systems and Networks
CSCI 159: Computer Science I
CSCI 112: Applications Programming

University of Manitoba

Winter 2023

DATA 2010: Tools and Techniques for Data Science

Fall 2022

COMP 3490: Computer Graphics 1

Guest Lectures and Seminar Presentations

Invited Sessions

ICSA-Canada Chapter 2022 Symposium, Banff Centre, Banff, Alberta, Canada.
Topic: Auto-encoders for Anomaly Detection: Efficiency and Trade-Offs.

Lectures

Introduction to Machine Learning, North South University, Dhaka, Bangladesh.

Courses: STAT 447: Statistical Machine Learning for Data Science

Publications

See my Google Scholar profile for the most recent publications.

Pairwise comparison of RAG evaluation frameworks

A Meta-Analysis of Evaluation Framework Reliability and Cross-Domain Generalization

Asif Ahmed Neloy, Md Nazmul Islam

Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:796-811, 2026

PMLR GitHub

First systematic meta-analysis comparing 20 RAG evaluation frameworks across three knowledge domains, revealing extreme inter-framework heterogeneity and three distinct methodological clusters.

Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection

Asif Ahmed Neloy*, Maxime Turgeon

IEEE International Conference on Big Data (IEEE BigData 2024), 2024

GitHub IEEE Xplore

A novel generative architecture combining beta-VAE, CVAE, and total correlation to enhance feature disentanglement and improve anomaly detection in high-dimensional datasets.

A Comprehensive Study of Auto-Encoders for Anomaly Detection: Efficiency and Trade-offs

Asif Ahmed Neloy*, Maxime Turgeon

Machine Learning with Applications, 2024

Project Page DOI

Systematic review of 11 auto-encoder architectures, analyzing reconstruction ability, sample generation, latent space visualization, and anomaly classification accuracy.

Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling

Asif Ahmed Neloy*, Maxime Turgeon

ICDM, 2022

Project Page IEEE

Two-stage modeling approach combining stacked ensemble classifiers with CNN and bidirectional RNN for NLP problems in real-world datasets.

Python Packages

Data Scaler Selector (Active)

Open-source Python library to select the appropriate data scaler for your machine learning model.

Image to Sketch (Active)

Open-source Python library to convert a color or black-and-white image to a pencil sketch.

Data Preparer (In Progress)

Open-source Python package to clean and prepare your dataset before applying a machine learning model.