Technical portfolio

Projects organized by solution type, stack and technical focus

Rather than listing every project at the same level, this page is organized into four main fronts, each with highlighted cases and a supporting catalog.

Data and analytics: PySpark, Databricks, quality, tabular ML and dashboards
Documents and NLP: OCR, PDFs, text classification and document consistency
RAG, agents and platform: Search, assistants, MCP, automation and observability

Portfolio map

Choose the area that best fits the conversation

Data, audit and analytics

Quality, operational analytics, tabular machine learning and analytical indicators

This front groups PySpark, Databricks, operational analytics, inconsistency detection, territorial analysis and tabular ML work.

Invoice audit with PySpark, Databricks and Genie

Analytical tables and PySpark queries built in Databricks notebooks to investigate transaction issues affecting invoice analysis.

  • Goal: structure a reliable view of inconsistencies in high-volume data.
  • Delivery: analytical tables, PySpark queries and views by inconsistency type.
  • Stack: PySpark, Databricks notebooks, analytical modeling, Genie and dashboard layer.
  • Technical highlight: big data analytics engineering, data quality and automated refresh.

Bank transaction audit with Random Forest

Machine learning project focused on prioritizing inconsistencies and risk signals in transactional data to support audit and analysis.

  • Goal: classify risk signals and prioritize transactions for review.
  • Delivery: supervised classification with tabular preparation and derived features.
  • Stack: Python, pandas, scikit-learn, matplotlib, pyarrow, joblib and unittest.
  • Technical highlight: supervised tabular ML and explainability for analytical support.

Outlier Detection Lab for inconsistencies and anomalies

Outlier and anomaly detection lab on large datasets to identify extremes, unlikely combinations and cases worth manual review.

  • Goal: identify anomalies and extreme cases for review.
  • Delivery: comparison of statistical and unsupervised approaches.
  • Stack: Python, pandas, scikit-learn, robust statistics and unsupervised ML.
  • Technical highlight: combining statistical methods and unsupervised models for audit support.
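
As an illustration of the robust-statistics side of this lab, the sketch below flags extreme values with a modified z-score built on the median and the median absolute deviation (MAD). It is a minimal standard-library example, not the repository's code: the function names, the sample `amounts` and the 3.5 threshold are assumptions chosen for the illustration.

```python
from statistics import median


def modified_z_scores(values):
    """Return the modified z-score of each value, based on the median and the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values equal the median: no robust spread to scale by.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the indices whose absolute modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [i for i, score in enumerate(scores) if abs(score) > threshold]


# Hypothetical transaction amounts with one extreme value.
amounts = [102.0, 98.5, 101.2, 99.9, 100.4, 97.8, 2500.0, 100.1]
outlier_indices = flag_outliers(amounts)
```

Because the median and MAD are barely moved by the extreme value itself, this kind of score stays usable on data where a mean and standard deviation would be distorted by the very cases under review.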

CadÚnico profile analytics

Project inspired by sampled public microdata to analyze income, registration status, family vulnerability and territorial prioritization.

  • Goal: transform social microdata into territorial and managerial analysis.
  • Delivery: vulnerability indicators, profile views and territorial analytics.
  • Stack: Python, pandas, numpy, Streamlit and Plotly.
  • Technical highlight: social indicators and territorial prioritization.

Social indicators and territorial analysis

Dashboards and analyses built from public data for territorial reading and program comparison.

Bolsa Família vs BPC by territory

Territorial comparison to understand social spending composition and program dependency.

View on GitHub
BPC judicialization and concentration map

Municipal view of benefit concentration and judicialization signals.

View on GitHub
Bolsa Família territorial evolution

Territorial follow-up combining social and operational views.

View on GitHub

Classical ML, time series and tabular experiments

Cases with regression, classification, forecasting and anomaly labs on structured datasets.

Covid-19 deaths linear regression baseline

Daily time series built from public data for an interpretable baseline model.

View on GitHub
Loan Default XGBoost

Default prediction on tabular data with a risk-oriented classification approach.

View on GitHub
Anomaly Detection Lab sklearn

Complementary anomaly lab with statistical techniques and unsupervised algorithms.

View on GitHub
Sales Forecasting GRU

Sales forecasting experiment comparing sequence architectures.

View on GitHub

Documents, NLP and OCR

Document reading, text classification, scraping and information extraction

This front brings together PDFs, OCR, classification, document consistency, NLP and text monitoring.

Contract reading, payment calendar and escalation with AI

Solution for reading contract PDFs, extracting financial clauses, building expected payment calendars and tracking divergences.

  • Goal: transform unstructured documents into a monitorable financial flow.
  • Delivery: contract reading, clause extraction and expected payment calendar generation.
  • Stack: Python, regex, PDF processing, analytical architecture and agent layer.
  • Technical highlight: contract extraction, financial reconciliation and workflow automation.
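
The flow described above (clause extraction, expected calendar, divergence tracking) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the clause wording, the `CLAUSE_PATTERN` regex and all function names are hypothetical, and real contracts would need far more tolerant parsing.

```python
import re
from calendar import monthrange
from datetime import date

# Hypothetical clause wording; real contracts vary far more than this.
CLAUSE_PATTERN = re.compile(
    r"(?P<count>\d+)\s+monthly\s+installments\s+of\s+USD\s+(?P<amount>[\d,]+\.\d{2})"
    r"\s+starting\s+on\s+(?P<start>\d{4}-\d{2}-\d{2})",
    re.IGNORECASE,
)


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, monthrange(year, month)[1])
    return date(year, month, day)


def expected_calendar(clause_text):
    """Return a list of (due_date, amount) pairs extracted from one clause."""
    match = CLAUSE_PATTERN.search(clause_text)
    if match is None:
        return []
    count = int(match["count"])
    amount = float(match["amount"].replace(",", ""))
    start = date.fromisoformat(match["start"])
    return [(add_months(start, i), amount) for i in range(count)]


def find_divergences(expected, paid):
    """Compare expected installments with a {due_date: amount} dict of payments."""
    issues = []
    for due, amount in expected:
        actual = paid.get(due)
        if actual is None:
            issues.append((due, "missing"))
        elif abs(actual - amount) > 0.005:
            issues.append((due, "amount mismatch"))
    return issues


clause = "The client shall pay 3 monthly installments of USD 1,500.00 starting on 2024-01-31."
calendar_rows = expected_calendar(clause)
payments = {date(2024, 1, 31): 1500.00, date(2024, 2, 29): 1400.00}
divergences = find_divergences(calendar_rows, payments)
```

Clamping the day in `add_months` is one possible convention for installments that start on the 29th to 31st; a contract may define a different rule, so that choice should be read as an assumption.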

Technical Request Document Assistant

Integrated flow for technical request reading, structured PDF field extraction and related reference retrieval.

  • Goal: structure document reading and reference lookup in a single interface.
  • Delivery: structured PDF extraction and reference retrieval in one app.
  • Stack: Python, reportlab, pypdf, pandas, scikit-learn and Streamlit.
  • Technical highlight: structured extraction, semantic retrieval and document assistant design.

Engineering Document Consistency AI

Pipeline for clause extraction, semantic search, inconsistency detection and human review in a dashboard.

  • Goal: compare documents, retrieve relevant passages and support inconsistency review.
  • Delivery: document comparison and semantic retrieval of related passages.
  • Stack: Python, reportlab, pypdf, pandas, scikit-learn, Streamlit and Plotly.
  • Technical highlight: document governance, semantic comparison and review workflow.

Political and Economic News Intelligence Dashboard

Web scraping project using `newspaper3k` to collect news, structure an analytical dataset, apply NLP and publish an interactive dashboard.

  • Goal: collect, enrich and visualize news content in an analytical structure.
  • Delivery: scraping, NLP and dashboard with thematic and entity views.
  • Stack: Python, newspaper3k, pandas, spaCy, Streamlit and Plotly.
  • Technical highlight: automated collection, NLP over news and executive visualization.

Extraction, classification and text understanding

Cases built around tags, text classification, routing and operational reading of requests and documents.

LLM Tag Extraction Lab

Comparison between rigid baselines, fuzzy matching, few-shot prompting and human validation.

View on GitHub
Maintenance Request Classification

Supervised classification for routing maintenance requests based on text and operational attributes.

View on GitHub
Ticket Classification Pipeline

Classification pipeline for tickets and queue organization by category and priority.

View on GitHub
Fake News Detection

Binary text classification with PyTorch and a sequence-based model.

View on GitHub

OCR, legal and document automation

Projects oriented to OCR, structured extraction, automatic filling and legal workflows.

Document Auto Fill OCR

OCR pipeline for field extraction and automatic pre-filling from photos and scans.

View on GitHub
Processo Judicial OCR

OCR over legal documents with structured output to support operational analysis.

View on GitHub
Judicial Settlement MVP

MVP for settlement evaluation with OCR, enrichment and explainable structuring.

View on GitHub
Invoice Processing UiPath

Document automation for accounts payable with OCR and operational routines.

View on GitHub

Search, RAG and assistants

Context retrieval, ranking and evidence-grounded answer systems

This front brings together retrieval experiments, assisted generation, document Q&A and hybrid search pipelines.

RAG NLP SQL with LangChain, OpenAI and SQLite

Python application that answers natural language questions over a SQL database by combining schema-aware retrieval and SQL generation.

  • Goal: allow natural language analytics over relational data.
  • Delivery: app for question answering over SQL with semantic context and assisted query generation.
  • Stack: Python, LangChain, SQLAlchemy, SQLite, Streamlit and BM25Retriever.
  • Technical highlight: RAG applied beyond documents, used to navigate structured relational data.
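
To make the idea of schema-aware retrieval concrete, here is a minimal sketch that reads table metadata from SQLite and selects the tables most related to a question, so that only the relevant schema lines would be passed on to a SQL-generation step. It uses plain token overlap instead of the project's LangChain and BM25Retriever setup; the table names, the sample question and every function name are assumptions made for the example.

```python
import re
import sqlite3


def load_schema(conn):
    """Map each table name to its column names, read from SQLite metadata."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]  # col[1] is the column name
    return schema


def tokenize(text):
    """Lowercase and split on non-letters, so 'customer_id' yields 'customer' and 'id'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_tables(question, schema):
    """Order tables by how many question tokens appear in their table or column names."""
    question_tokens = tokenize(question)
    scored = []
    for table, columns in schema.items():
        vocabulary = tokenize(table + " " + " ".join(columns))
        overlap = len(question_tokens & vocabulary)
        if overlap:
            scored.append((overlap, table))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored]


def build_context(question, schema, top_k=2):
    """Render the top-ranked tables as compact schema lines for a prompt."""
    selected = rank_tables(question, schema)[:top_k]
    return "\n".join(f"{table}({', '.join(schema[table])})" for table in selected)


# Hypothetical in-memory database used only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
CREATE TABLE audit_log (id INTEGER PRIMARY KEY, event TEXT);
""")
schema = load_schema(conn)
context = build_context("What is the total of orders per customer city?", schema)
```

Restricting the prompt to the retrieved schema lines is what keeps the generation step focused on tables that can actually answer the question; a lexical or embedding retriever could replace `rank_tables` without changing the rest of the flow.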

Search Performance Assistant for retrieval evaluation

Python application to study document retrieval with TF-IDF, vector indexing, fallback behavior and evidence-based assistant responses.

  • Goal: explore retrieval in a transparent and comparable way.
  • Delivery: ingestion, indexing, retrieval and visual explanation of search results.
  • Stack: Python, scikit-learn, TF-IDF, FAISS, cosine similarity, Tkinter and unittest.
  • Technical highlight: retrieval evaluation and traceable ranking explanation.
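
The TF-IDF and cosine-similarity part of this stack can be illustrated without any dependency. The sketch below is a hand-written stand-in, not the project's scikit-learn or FAISS code: the smoothed IDF formula, the `min_score` cutoff and the convention that an empty result means "fall back" are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Build a sparse {term: weight} vector, ignoring terms unknown to the index."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_index(documents):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, idf, vectors, min_score=0.05):
    """Return (document, score) pairs above min_score, best first.

    An empty list signals that the caller should use its fallback behavior.
    """
    query_vector = tfidf_vector(tokenize(query), idf)
    ranked = sorted(
        ((cosine(query_vector, vector), i) for i, vector in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(documents[i], round(score, 3)) for score, i in ranked if score >= min_score]


# Hypothetical mini-corpus for the demonstration.
documents = [
    "invoice audit with spark",
    "contract payment calendar",
    "search ranking with tf idf",
]
idf, vectors = build_index(documents)
hits = search("payment calendar", documents, idf, vectors)
no_hits = search("zebra", documents, idf, vectors)
```

Returning the score next to each document is what makes the ranking traceable: the same numbers can be shown in the interface to explain why a result was ranked first or why the fallback was triggered.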

Release Notes Generation Assistant

Python app for assisted release note generation based on release context, pull requests, similarity retrieval and evaluation metrics.

  • Goal: organize release context and PRs for assisted note generation.
  • Delivery: release note generation with retrieval and thematic rules.
  • Stack: Python, scikit-learn, TF-IDF, cosine similarity, Tkinter and unittest.
  • Technical highlight: product-oriented retrieval pipeline with reproducible evaluation.

Educational and document assistants

Repositories focused on Q&A, study material organization and internal-knowledge style assistance.

Academic Paper RAG Search

Question answering over academic papers and technical chapters with evidence-based retrieval.

View on GitHub
Educational RAG Assistant

Educational assistant answering questions over chapters, articles, notes and FAQs.

View on GitHub
Syllabus to Study Guide RAG

Pipeline that turns course material into study guides, summaries and review questions with citations.

View on GitHub
Student Support Copilot

Copilot for academic rules, administrative questions and next-step guidance.

View on GitHub

Retrieval, ranking and search experiments

Experiments around hybrid search, ranking and retrieval pipelines across different domains.

Visual Product Complaint Retrieval

Complaint-oriented retrieval and multimodal search in a product context.

View on GitHub
Hybrid Ranking Product Search

Product search combining different ranking strategies in the same pipeline.

View on GitHub
Hybrid Ranking Support Search

Hybrid ranking system for support tickets and knowledge bases.

View on GitHub
PDF to RAG Rechunking

Chunking and rechunking experiments to improve retrieval quality in document pipelines.

View on GitHub
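
As a small illustration of the chunking experiments mentioned in this catalog, the sketch below splits text into overlapping word windows; rechunking then amounts to rebuilding the windows from the same text with different parameters and comparing retrieval quality. The function name, the word-based unit and the default sizes are assumptions, not the repository's settings.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into windows of chunk_size words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical ten-word text, chunked with two different configurations.
sample_text = " ".join(f"w{i}" for i in range(10))
small_chunks = chunk_words(sample_text, chunk_size=4, overlap=1)
large_chunks = chunk_words(sample_text, chunk_size=8, overlap=2)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of indexing some words twice.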

Agents, automation and platform

Tooling for agents, workflows, MCP, MLOps and product delivery

This front combines MCP servers, agent-driven automations, observability and product/platform work.

MCP Docs Assistant

Read-only MCP server for local markdown documentation with resources, tools and prompt support.

  • Goal: expose local documentation in a format consumable by MCP clients.
  • Delivery: read-only MCP server with catalog, search and retrieval.
  • Stack: Python, FastMCP, rank-bm25, frontmatter parsing and markdown.
  • Technical highlight: MCP design, BM25 retrieval and agent-ready documentation access.
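
The BM25 retrieval behind a documentation search tool can be sketched as follows. This is a hand-written Okapi BM25 scorer using only the standard library; it does not use the rank-bm25 or FastMCP APIs, and the class name, the `k1` and `b` defaults and the sample documents are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Minimal Okapi BM25 index over a list of markdown strings."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_tokens = [tokenize(doc) for doc in documents]
        self.doc_counts = [Counter(tokens) for tokens in self.doc_tokens]
        total_len = sum(len(tokens) for tokens in self.doc_tokens)
        # Fall back to 1.0 so an empty corpus never divides by zero.
        self.avg_len = total_len / max(len(self.doc_tokens), 1) or 1.0
        n_docs = len(documents)
        doc_freq = Counter(term for tokens in self.doc_tokens for term in set(tokens))
        # IDF variant with "+ 1" inside the log, which keeps every weight positive.
        self.idf = {
            term: math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            for term, df in doc_freq.items()
        }

    def score(self, query, index):
        """BM25 score of one document for the query."""
        counts = self.doc_counts[index]
        length = len(self.doc_tokens[index])
        norm = self.k1 * (1 - self.b + self.b * length / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            freq = counts.get(term, 0)
            if freq:
                total += self.idf[term] * freq * (self.k1 + 1) / (freq + norm)
        return total

    def search(self, query, top_k=3):
        """Return indices of matching documents, best first."""
        scored = [(self.score(query, i), i) for i in range(len(self.doc_tokens))]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
        return [i for _, i in ranked[:top_k]]


# Hypothetical markdown documents standing in for a local docs folder.
docs = [
    "# Install\nRun the installer to install the package",
    "# Usage\nCall the search tool with a query",
    "# Changelog\nFixed the search ranking bug in search tool",
]
index = BM25Index(docs)
search_hits = index.search("search")
```

In a read-only server, a function like `search` would sit behind a tool that returns document identifiers, with a separate retrieval step serving the full markdown content.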

MCP SQL Analytics Server

MCP server for SQL analytics exposing structured tools for schema inspection and analytical querying.

  • Goal: enable structured data exploration by agents in a controlled environment.
  • Delivery: MCP tools for inspection and SQL analytics.
  • Stack: Python, MCP, SQL analytics and modular tool design.
  • Technical highlight: agent-oriented data access and tool-based analytics.
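
One way to picture "structured tools in a controlled environment" is a small set of read-only functions over a database connection. The sketch below uses SQLite from the standard library and deliberately leaves out the MCP registration layer; the function names, the single-`SELECT` guard and the `sales` table are assumptions, not the server's actual tool set.

```python
import sqlite3


def list_tables(conn):
    """Return the names of user tables, for a schema-inspection tool."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


def describe_table(conn, table):
    """Return (column_name, declared_type) pairs for a known table."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(col[1], col[2]) for col in columns]


def run_select(conn, sql, max_rows=100):
    """Run a single SELECT statement and return columns plus a capped row list."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Conservative: also rejects semicolons that appear inside string literals.
        raise ValueError("only a single statement is allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(statement)
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchmany(max_rows)
    return {"columns": columns, "rows": [list(row) for row in rows]}


# Hypothetical in-memory database used only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('north', 10.0), ('south', 5.5), ('north', 4.5);
""")
# Defense in depth: ask SQLite itself to refuse writes on this connection.
conn.execute("PRAGMA query_only = ON")

result = run_select(
    conn,
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region;",
)
```

Keeping inspection and querying as separate, narrowly scoped functions lets an agent discover the schema first and only then issue a query, while the row cap bounds how much data a single call can return.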

Curriculo Site built with Codex and vibe coding

Personal website and technical portfolio with bilingual pages, booking pages and production publishing on a custom domain.

  • Goal: consolidate a professional presence in a custom domain.
  • Delivery: bilingual site with home, portfolio, booking flow and production deployment.
  • Stack: HTML, CSS, JavaScript, GitHub, Vercel and manual technical content curation.
  • Technical highlight: AI-assisted prototyping, content structuring and shipping.

Agents, automation and workflows

Multi-agent setups, agent-driven automation, routing and HITL workflows.

AI Support Triage with HITL

Support triage workflow with human approval, retrieval and automated routing.

View on GitHub
Candidate Screening Workflow n8n

Candidate screening flow with automation and staged evaluation logic.

View on GitHub
Learning Path Agents

Agents for learning-path organization and recommendation.

View on GitHub
Market Intelligence CrewAI

Agent structure for market briefing, synthesis and intelligence workflows.

View on GitHub

Credit and domain-specific agents

Repositories focused on credit, service, fraud prevention and business insights.

Credit Analysis Agent

Support for reading customer profiles, risk signals and decisions in credit workflows.

View on GitHub
Customer Service Agent

Structured service workflow built with PydanticAI.

View on GitHub
Fraud Prevention Agent

Fraud signal monitoring and support for prevention-oriented workflows.

View on GitHub
Portfolio Risk Agent

Risk monitoring, alerting and portfolio-oriented analysis.

View on GitHub

MLOps, observability and cloud labs

Repos for serving, monitoring, feature pipelines, Vertex AI, Kubeflow and cloud experimentation.

ML Model Serving Observability

Model observability with metrics, Prometheus, Grafana and operational monitoring.

View on GitHub
Feature Store Pipeline Metaflow

Versioned and reproducible feature pipeline for training and scoring with Metaflow.

View on GitHub
Vertex AI and Kubeflow Labs

Training pipelines and benchmarks with Vertex AI, Kubeflow and recommendation/computer vision workloads.

View on GitHub
Cloud repositories

Umbrella repositories for GCP, AWS and Azure experiments organized by platform.

GCP