Saad Yaqine - ML Engineer & Data Scientist

I build production-ready AI systems that solve real business problems. Specializing in NLP, LLMs, and end-to-end MLOps pipelines, I've deployed sentiment analysis models for financial trading, automated code review systems, and intelligent document retrieval platforms. From data streaming architecture to production deployment, I deliver measurable results.

End-to-End ML Pipelines
NLP & LLM Engineering
MLOps & Production Systems
Real-Time Data Processing
Cloud & Infrastructure

About Me

I'm a Machine Learning Engineer and Data Scientist with 2 years of experience building AI-driven solutions that go beyond proof-of-concept. My focus is on production-grade systems: models that run reliably in real-world environments, handle scale, and deliver business value.

🎓 Education

Master's Degree in Computer Science

Polytech Marseille

Artificial Intelligence, Machine Learning, Data Science

📜 Certifications

MLOps Practitioner & Advanced Designer

Dataiku • 2024

Google Cloud Professional Data Scientist

Google Cloud • In Progress

End-to-end ownership

I don't just train models. I architect data pipelines, containerize services, set up monitoring, and deploy to production. My projects span the full ML lifecycle.

Production-first mindset

Whether it's a real-time trading bot processing Kafka streams or an automated code review system integrated with GitHub, my work is designed for reliability, not just demos.

NLP/LLM expertise

I've fine-tuned transformer models for financial sentiment analysis, built RAG systems with vector databases, and integrated LLMs for summarization and generation tasks.

Pragmatic problem-solving

I choose technologies based on requirements, not hype. Sometimes that means FAISS over a managed vector DB, or a well-structured FastAPI service over a heavyweight framework.

French • Fluent
English • Fluent
Arabic • Native

Professional Experience

Bouygues Telecom

Data Scientist

2024 - 2025

France

  • Designed a call classification model with 94% accuracy, reducing inefficient interactions by 46%
  • Built end-to-end MLOps pipelines using Docker, AWS Lambda, and SageMaker, achieving 30% faster deployments
  • Processed large-scale customer datasets with PySpark and Dataiku, improving ingestion performance by 40%
  • Implemented automated model monitoring and retraining workflows for production ML systems
Python, PyTorch, Docker, AWS Lambda, SageMaker, PySpark, Dataiku

LeBonCoin

Machine Learning Engineer

2023

France

  • Cleaned and normalized large volumes of user-generated text using Azure Data Factory pipelines
  • Trained a text classifier achieving 92% accuracy using TF-IDF and embedding techniques
  • Deployed a production FastAPI microservice for automated text analysis of live ad listings
  • Optimized model inference time by 40% through efficient preprocessing and caching strategies
Python, Azure Data Factory, TF-IDF, Embeddings, FastAPI, Docker

Universidad Politécnica de Valencia

AI/NLP Research Intern

2022 - 2023

Spain

  • Built multilingual corpora from news articles and social media data for LLM pretraining experiments
  • Developed a language detection model with >90% accuracy to enhance preprocessing pipelines
  • Created robust NLP pipelines covering text cleaning, vectorization, and encoding for downstream tasks
  • Contributed to research on multilingual transfer learning and cross-lingual representations
Python, NLP, spaCy, NLTK, Transformers, PyTorch

Marsa Maroc

Data Engineer

2022

Morocco

  • Conducted exploratory data analysis of operational equipment data, identifying modernization and predictive maintenance opportunities
  • Built comprehensive ETL pipelines aggregating data from sensors, ERP systems, and maintenance logs
  • Developed advanced Tableau dashboards tracking availability rates and maintenance costs, reducing monthly reporting time by 40%
  • Implemented data quality checks and validation rules to ensure reliability of analytics outputs
Python, SQL, ETL, Tableau, Data Analysis

Featured Projects

FinSentBot

Real-Time Trading Signal Generation

Built an end-to-end automated trading intelligence system that combines real-time news sentiment analysis with live market data to generate Buy/Hold/Sell signals.

🎯 Problem

Financial traders need to process vast amounts of news and market data to make informed decisions. Manual sentiment analysis is slow, subjective, and can't keep pace with market movements.

💡 Solution

Developed a production-ready system using Apache Kafka for real-time data streaming, FinBERT for financial sentiment analysis, and custom ML models for signal generation.
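
To make the signal-generation step above more concrete, here is a minimal sketch of how per-article sentiment scores might be combined with a recent price change into a Buy/Hold/Sell decision. The rule, its thresholds, and the names `Signal` and `generate_signal` are hypothetical and chosen only for illustration; the project itself relies on FinBERT scores and custom ML models whose details are not described here.

```python
from enum import Enum
from statistics import mean


class Signal(Enum):
    BUY = "Buy"
    HOLD = "Hold"
    SELL = "Sell"


def generate_signal(sentiment_scores, price_change_pct,
                    sentiment_threshold=0.3, price_threshold=0.5):
    """Combine news sentiment with recent price movement (illustrative rule).

    sentiment_scores: per-article scores in [-1, 1],
        e.g. P(positive) - P(negative) from a sentiment classifier.
    price_change_pct: recent price change in percent.
    Both thresholds are assumptions, not tuned values.
    """
    if not sentiment_scores:
        return Signal.HOLD  # no news means no sentiment-based opinion
    avg = mean(sentiment_scores)
    # Positive news and the price is not already falling sharply
    if avg >= sentiment_threshold and price_change_pct > -price_threshold:
        return Signal.BUY
    # Negative news and the price is not already rising sharply
    if avg <= -sentiment_threshold and price_change_pct < price_threshold:
        return Signal.SELL
    return Signal.HOLD


print(generate_signal([0.8, 0.6], 1.2).value)
```

In a streaming setup, a function like this would presumably be called once per ticker each time a new batch of scored articles arrives from the message queue.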

📊 Key Results

  • Automated analysis of 100+ financial news articles daily
  • 87% accuracy on financial sentiment classification
  • End-to-end latency under 5 seconds from news to signal
  • Production-ready with comprehensive logging and error handling
Python, PyTorch, Transformers (FinBERT), Apache Kafka, Docker, Streamlit, yfinance, BeautifulSoup

Code Review AI

Automated Python Analysis System

Developed an AI-powered code review system that combines Abstract Syntax Tree (AST) parsing with Claude AI to automatically analyze pull requests and provide detailed, actionable feedback.

🎯 Problem

Manual code reviews are time-consuming and often miss subtle bugs, security issues, or style inconsistencies. Development teams need automated quality checks that integrate seamlessly into their workflow.

💡 Solution

Created a production system using FastAPI webhooks, Python AST parsing, and Claude AI for intelligent code analysis with automatic PR commenting.
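
As an illustration of the AST-parsing half of this approach, the sketch below uses Python's standard `ast` module to flag two common review findings before any LLM is involved. The function name `find_issues` and the two checks are assumptions made for this example, not the project's actual rule set.

```python
import ast


def find_issues(source: str) -> list[str]:
    """Return human-readable findings for a Python source string."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Public functions without a docstring
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                findings.append(
                    f"line {node.lineno}: function '{node.name}' has no docstring"
                )
        # A bare `except:` catches every exception, including KeyboardInterrupt
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"line {node.lineno}: bare 'except:' clause")
    return findings


sample = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None
'''
for finding in find_issues(sample):
    print(finding)
# line 2: function 'load' has no docstring
# line 5: bare 'except:' clause
```

Structural findings like these could then be passed to the LLM as context, so the model concentrates on explaining and prioritizing issues.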

📊 Key Results

  • Deployed to production on Railway with live webhook integration
  • 2,400+ lines of production code with modular architecture
  • Average review time reduced to <30 seconds per PR
  • Fully automated, with no manual intervention required
Python, FastAPI, Claude AI (Anthropic), AST, PyGithub, Docker, pytest, Railway

DocuMind

Intelligent RAG System for Semantic Search

Built a Retrieval-Augmented Generation (RAG) system that combines semantic embeddings with vector similarity search to enable intelligent document discovery.

🎯 Problem

Traditional keyword search fails to capture semantic meaning, making it difficult to find relevant information in large document collections. Users need intelligent systems that understand context and intent.

💡 Solution

Implemented the retrieval side of a RAG architecture using Sentence Transformers for embeddings and FAISS for vector search; LLM integration for the generation step is planned.
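
The retrieval idea can be shown with a small, dependency-free sketch: a brute-force cosine-similarity top-k search over toy vectors. This is a stand-in, not the project's code. In the real system the vectors would come from a Sentence Transformers model and the search would be handled by a FAISS index. The document names and 3-dimensional "embeddings" below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; real embeddings typically have hundreds of dimensions.
docs = {
    "invoice_policy": [0.9, 0.1, 0.0],
    "travel_guide": [0.0, 0.2, 0.9],
    "billing_faq": [0.8, 0.3, 0.1],
}
results = top_k([1.0, 0.2, 0.0], docs, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
# invoice_policy ranks first, billing_faq second
```

Brute force is fine at the scale of a few hundred documents. A dedicated index mainly pays off as the collection grows.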

📊 Key Results

  • 123 documents indexed with sub-second query response times
  • 92% of test queries return relevant results in top-3
  • Fully containerized MVP ready for production deployment
  • Roadmap includes LoRA fine-tuning and multi-modal support
Python, Sentence Transformers, FAISS, PyTorch, Hugging Face, Streamlit, Docker, NumPy

AI News Agent

Automated Multi-Source News Pipeline

Created an end-to-end automated pipeline that scrapes major tech publications, deduplicates and processes articles, generates AI summaries, and distributes daily email digests.

🎯 Problem

Staying current with technology news across multiple publications is time-consuming. Professionals need curated, summarized content delivered automatically without manual aggregation.

💡 Solution

Built a fully automated pipeline using Selenium for scraping, OpenAI for summarization, and GitHub Actions for scheduling.
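
The deduplication step of such a pipeline can be sketched as follows. This is one simple approach, hashing a normalized title, and is an assumption for illustration; the project may deduplicate differently (for example by URL or by content similarity). The function names and sample headlines are hypothetical.

```python
import hashlib
import re


def fingerprint(title: str) -> str:
    """Hash a title after lowercasing and collapsing punctuation and whitespace.

    Note: the normalization is ASCII-oriented and drops accented characters.
    """
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles):
    """Keep the first article seen for each normalized title."""
    seen = set()
    unique = []
    for article in articles:
        key = fingerprint(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


# Made-up sample data: the first two titles differ only in case and punctuation.
articles = [
    {"title": "New GPU announced", "source": "site-a"},
    {"title": "New GPU Announced!", "source": "site-b"},
    {"title": "Chip shortage eases", "source": "site-c"},
]
unique = deduplicate(articles)
print([a["source"] for a in unique])
# ['site-a', 'site-c']
```

Storing the fingerprints in a table such as SQLite would let the check also cover articles seen on previous days.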

📊 Key Results

  • Automated aggregation from 5 major tech publications daily
  • Fully automated pipeline with no manual intervention
  • French summaries optimized for quick consumption (2-3 sentences per article)
  • Free hosting via GitHub Actions with no server costs
  • 20-30 articles processed and summarized daily on average
Python, OpenAI API, Selenium, SQLite, GitHub Actions, Streamlit

Technical Skills

Programming & Data

Python • Advanced
SQL • Proficient
Bash/Shell • Proficient
Pandas • Advanced
NumPy • Advanced
PySpark • Proficient
Kafka • Proficient

Machine Learning & AI

PyTorch • Advanced
TensorFlow • Proficient
scikit-learn • Advanced
Hugging Face Transformers • Advanced
BERT/FinBERT • Advanced
Model Training & Evaluation • Advanced
Hyperparameter Tuning • Advanced

NLP & LLMs

spaCy • Advanced
NLTK • Proficient
Sentiment Analysis • Advanced
Text Classification • Advanced
LangChain • Proficient
RAG Systems • Proficient
Sentence Transformers • Proficient
Claude AI API • Proficient
OpenAI API • Proficient

MLOps & Deployment

Docker • Advanced
FastAPI • Advanced
GitHub Actions • Proficient
pytest • Proficient
CI/CD • Proficient
Dataiku • Advanced

Cloud & Infrastructure

AWS (Lambda, SageMaker, S3, EC2) • Proficient
GCP • Proficient
Azure (Data Factory) • Proficient

Data Visualization & Tools

Streamlit • Advanced
Tableau • Proficient
Matplotlib/Seaborn • Advanced
Git/GitHub • Advanced

Let's Connect

I'm currently available for full-time ML Engineer / Data Scientist roles and freelance AI/ML projects. Whether you're looking to build production NLP systems, deploy real-time ML pipelines, or implement RAG solutions, I'd love to hear from you.

📍 Based in Paris, France

Open to remote opportunities across Europe and international projects

⏱️ Response Time

I typically respond within 24 hours.