Profile photo of Iván Seldas
DATA SCIENTIST & ARCHITECT

Hi, I'm Iván Seldas.
Welcome to my Portfolio.

I discovered my passion for data while designing a Data Center.
Since then, I've been building Data Science solutions with today's AI and ML tools.

Let’s work together • ML projects ready for deployment • Live Demos & Code Available

Programming

Python SQL PySpark C

Geospatial Analysis

ArcGIS ArcPy PostGIS GeoPandas Shapely NetworkX OSMnx

Machine Learning

Classification Regression Clustering Forecasting A/B Testing

Generative AI & NLP

RAG LLM Fine-Tuning Embeddings Semantic Search Transformers RoBERTa

Visualization

Tableau Power BI Plotly Matplotlib

Cloud & Big Data

GCP (Cloud Run, BigQuery) AWS (Glue, SageMaker, ECS Fargate) PostgreSQL

Automation & MLOps

MLflow Docker GitHub CI/CD n8n FastAPI Streamlit
Selected Work

BCN Noise Predictions Time Series

What if your ML model could hear the city and predict its next move?
I've developed a real-time noise forecasting system that leverages urban noise data.

  • Production pipeline: feature engineering, experiment tracking (MLflow), model registry & deployment.
  • Real-time stack: FastAPI service + Streamlit dashboard (Google Cloud Run) with Docker + CI/CD.
Results
  • 70% error reduction vs. baseline models (RMSE 3.0 dB, MAE ~1.2 dB), accurately predicting when noise exceeds the 65–70 dB human safety threshold.
  • Forecast ranges (prediction intervals) capture 93–95% of observed values.
Screenshot of the Barcelona Noise Forecasting project
Time Series Forecasting FastAPI Streamlit MLflow Docker CI/CD Google Cloud Platform
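
For readers who want to see how figures like these are typically computed, here is a minimal sketch of RMSE, MAE, and the share of observations that fall inside a forecast range. The function names and sample values are illustrative assumptions, not code or data from the project.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the inputs (dB here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error, in the same unit as the inputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def interval_coverage(actual, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)

# Illustrative noise levels in dB (made up for this sketch).
actual = [62.0, 66.0, 71.0, 64.0]
predicted = [63.0, 65.0, 69.0, 64.0]
lower = [60.0, 62.0, 66.0, 65.0]
upper = [66.0, 68.0, 72.0, 67.0]

print(rmse(actual, predicted))
print(mae(actual, predicted))
print(interval_coverage(actual, lower, upper))
```

A coverage close to the nominal level of the interval (for example, about 95% for a 95% interval) suggests the forecast ranges are well calibrated.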

Sentiment-Driven Video Recommendation System

Ever wondered what AI fans really want to watch next?
I turned 180K+ YouTube comments into personalized video suggestions.

  • Recommends videos by combining clustering over 2,500+ videos, sentiment analysis of 180,000+ comments in PySpark, and content similarity (TF‑IDF) to generate personalized suggestions based on the current video.
Results
  • Web app in production, deployed with Docker on Google Cloud Platform.
  • Recommendations computed via a final score conditioned on the currently watched video.
Python RoBERTa Transformers Classification Docker Google Cloud Platform Embeddings CI/CD
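
As a rough illustration of a score conditioned on the current video, the sketch below blends TF‑IDF cosine similarity with a per-video sentiment value. The weighting `alpha`, the sentiment scale (assumed 0 to 1), and all function names are assumptions made for this example; the project's actual formula may differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(current, titles, sentiment, alpha=0.7):
    """Rank every other video by a blend of similarity and sentiment.

    alpha is a hypothetical weight: 1.0 uses similarity only, 0.0 sentiment only.
    """
    vectors = tfidf_vectors(titles)
    scores = []
    for i in range(len(titles)):
        if i == current:
            continue
        similarity = cosine(vectors[current], vectors[i])
        scores.append((i, alpha * similarity + (1 - alpha) * sentiment[i]))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Illustrative titles and sentiment scores (made up for this sketch).
titles = [
    "intro to neural networks",
    "neural networks from scratch",
    "cooking pasta at home",
]
sentiment = [0.5, 0.8, 0.9]
ranked = recommend(0, titles, sentiment)
print(ranked)
```

Here the second title ranks first for a viewer of the first one, even though the third has higher sentiment, because similarity to the current video carries most of the weight.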

Automated Content Generator - Agentic RAG Workflow

Manual drafts are slow. Generate long-form, cited content from your internal docs.

  • Automated research and article generation using your own content base, retrieved from a vectorized PostgreSQL database.
  • Delivered structured, verifiable content with citation tracking and quality controls.
Results
  • Cuts content creation time by automating research and drafting.
  • Maintains content structure and quality standards.
  • Draws only on stored documents and cites them, which guards against hallucination.
n8n AI Agents LLM Fine-Tuning Supabase Semantic Search Embeddings RAG Pipelines
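
The retrieval-with-citations idea can be sketched in a few lines. This example uses an in-memory list as a stand-in for the vectorized PostgreSQL database, and toy two-dimensional embeddings; `Chunk`, `retrieve`, `build_context`, and the `min_score` cutoff are hypothetical names and values chosen for illustration, not the workflow's actual nodes.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Chunk:
    doc_id: str             # source document, used later as the citation
    text: str
    embedding: List[float]  # toy vector; a real system would use a model

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, chunks, top_k=2, min_score=0.5) -> List[Tuple[float, Chunk]]:
    """Return the top_k chunks whose similarity to the query passes min_score."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def build_context(results):
    """Format retrieved chunks as numbered passages, each with its source."""
    return "\n".join(
        f"[{i}] {chunk.text} (source: {chunk.doc_id})"
        for i, (_, chunk) in enumerate(results, start=1)
    )

# Illustrative content base (made up for this sketch).
chunks = [
    Chunk("style-guide.md", "Use active voice.", [1.0, 0.0]),
    Chunk("pricing.md", "Plans are billed monthly.", [0.0, 1.0]),
    Chunk("tone.md", "Keep sentences short.", [0.8, 0.6]),
]
results = retrieve([1.0, 0.0], chunks)
context = build_context(results)
print(context)
```

Passing only this numbered context to the LLM, and asking it to cite passage numbers, is one common way to keep generated text traceable to stored documents.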

Customer Churn Prediction

Can you predict when a customer is about to churn?
I developed a supervised ML pipeline (LogReg, Random Forest, XGBoost) to detect and rank customers at highest risk of churn.

  • EDA + Feature Engineering to expose churn patterns across categorical and numeric signals.
  • Trained and tuned Logistic Regression, Random Forest, and XGBoost models optimizing for recall and ROC-AUC.
Results
  • Recall 0.773, Precision 0.94, ROC-AUC 0.928 on validation data (tuned XGBoost).
  • Potential 20% churn reduction from a retention campaign with 30% effectiveness, applied to the 500 highest-risk customers ranked by the model.
Screenshot of the Telecom Churn Prediction project
Classification
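
One plausible way to reason about the campaign estimate is shown below. It assumes that model precision determines how many targeted customers are real churners; the total number of churners is not stated here, so `total_churners=700` is a placeholder chosen purely for illustration, and the original calculation may have been set up differently.

```python
def retained_customers(targeted, precision, effectiveness):
    """Expected churners saved: targeted customers who are real churners
    (targeted * precision), times the share the campaign convinces to stay."""
    return targeted * precision * effectiveness

def churn_reduction(retained, total_churners):
    """Retained churners as a fraction of all expected churners."""
    return retained / total_churners

retained = retained_customers(targeted=500, precision=0.94, effectiveness=0.30)
# total_churners is a hypothetical input, not a figure from the project.
reduction = churn_reduction(retained, total_churners=700)

print(f"Expected customers retained: {retained:.0f}")
print(f"Estimated churn reduction: {reduction:.1%}")
```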

More details

Feature-store-powered pipeline with experiment tracking and SHAP-driven explanations. Calibrated ensemble for actionable retention strategies.

  • Segment-specific thresholds for contact prioritization.
  • Top drivers surfaced for each subscriber for tailored offers.
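
Segment-specific thresholds could look something like the sketch below: each segment has its own cutoff on predicted churn probability, and flagged customers are contacted in order of risk. The segment names, threshold values, and field names are hypothetical.

```python
def prioritize(customers, thresholds, default_threshold=0.5):
    """Keep customers whose churn probability meets their segment's cutoff,
    sorted from highest to lowest risk."""
    flagged = [
        c for c in customers
        if c["churn_probability"] >= thresholds.get(c["segment"], default_threshold)
    ]
    return sorted(flagged, key=lambda c: c["churn_probability"], reverse=True)

# Illustrative customers and cutoffs (made up for this sketch).
customers = [
    {"id": "A", "segment": "prepaid", "churn_probability": 0.55},
    {"id": "B", "segment": "postpaid", "churn_probability": 0.55},
    {"id": "C", "segment": "postpaid", "churn_probability": 0.90},
    {"id": "D", "segment": "business", "churn_probability": 0.45},
]
thresholds = {"prepaid": 0.5, "postpaid": 0.7}

contact_list = prioritize(customers, thresholds)
print([c["id"] for c in contact_list])
```

Customers A and B have the same score, but only A is flagged because the prepaid cutoff is lower; D falls back to the default threshold because its segment has no entry.
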
About

Who I Am

I’m a Data Scientist and an Architect.

It all started during my Master's thesis in Architecture, where I designed a data center. To do it properly, I had to understand how data actually worked… and that's when I knew I wanted to work with data.

That moment was the start of a new chapter for me. I began learning to code from scratch at 42 Madrid (Telefónica), then specialised in Data Science, Machine Learning, and AI at Ironhack — where I trained to become a Data Scientist.

I've always built things; I guess that's what I do best. Now, I build with data. In the end, the goal has remained the same: to make life a little better for people.

Iván Seldas casual portrait
That's me in Lanzarote. What a place!

Experience

Team Leader & BIM Architect — Marchese Partners | LIFE3A
2019 – Present · Madrid

  • Led the interior architecture team in Madrid for senior living projects in Sydney (200+ units), coordinating development with the Australian office.
  • Planned project documentation across all phases, managing timelines and resources.
  • Integrated AI into creative processes, documentation workflows, and quality assurance.
  • Automated BIM workflows using Dynamo and Python scripts.
  • Key projects: Akoya (SYD), Newgreens Chatswood (SYD), Cumberland Country Golf Club (SYD).

Education & Certifications

  • Data Science & Machine Learning Bootcamp
    2024 · Ironhack, Barcelona
  • IBM Data Science Professional Certificate
    2025 · Online
  • Google Advanced Data Analytics Certificate
    2024 · Online
  • 42 Madrid Programming Campus
    2022–2023 · Campus Telefónica
  • Master’s Degree in Architecture
    2019–2021 · Universidad Politécnica de Madrid
  • Bachelor’s Degree in Architecture
    2013–2019 · Universidad Politécnica de Madrid
    2017–2018 · Karlsruhe Institute of Technology
Contact

Based in Madrid. Always open to coffee chats about AI & data.

Email me
Currently: Open to full-time & freelance