Professional portrait of Flávia Gaia
NLP · MLOps · LLMOps · GenAI

Recruiter Introduction

I turn data, automation and AI into clear, practical and scalable solutions.

I am Flávia, a Senior Data Scientist with 6 years of experience in AI, data and automation. My work combines Generative AI, NLP, machine learning and the full data process that supports data science, from structure and data quality to modeling, automation and real-world application.

Experience
6 years in data, AI and analytics
Role
Senior Data Scientist
Specialty
Generative AI, RAG and data architecture
Academic Background
UnB, USP and IESB

Executive Summary

What I bring to data and product teams

01

AI applied to critical workflows

I have experience with audit automation, contract reading, clause extraction, regulatory compliance and technical information retrieval using LLMs and multi-agent systems.

02

Strong science and engineering foundation

I work with Python, Spark, Databricks, Delta Lake, MLflow, Streamlit, LangChain and governance-oriented pipelines designed for monitoring and production.

03

Communication and continuous learning

Alongside hands-on delivery, I produce technical content, hold recent Data Engineering certifications and carry out academic research in Data Science.

Professional Journey

Experience built across Industry 4.0, government, consulting and applied AI

Current

FNDE / G4F ecosystem

Role: Senior Data Scientist

Structured an analytics environment in Azure and Databricks with Medallion Architecture, scalable pipelines, governance and a strong foundation for audit and intelligence use cases.

Previous

BBTS

Role: Senior Data Scientist / Senior AI Engineer

Built an intelligent payment calendar system from PDF contracts using LangChain, CrewAI, Databricks Workflows and Delta Lake.

Petrobras

Compass UOL

Role: AI-focused Data Scientist

Created LLM-based solutions for technical PDF extraction, regulatory RAG, Streamlit validation flows, MLflow versioning and autonomous agents.

Foundation

EVCOMX · service provided to Petrobras

Role: Data Scientist / AI Specialist

Worked on machine learning, NLP, technical data extraction, Streamlit, few-shot prompting and the evolution from regex pipelines to LLM-based solutions.

Government

MDS · Ministry of Social Development, Family and Fight Against Hunger

Role: Data Analyst

Analyzed large government datasets such as Bolsa Família, BPC and Cadastro Único with SQL, Python scripts and big data tooling, and built analytical dashboards for decision support.

Early Career

Presidency of the Republic

Role: Data Science Intern

Automated news collection with web scraping, used NLP with spaCy for NER and summarization, built Power BI visualizations and explored experimental deep learning projects.

Areas of Practice

A hybrid profile combining technical depth, real-world execution and business context

Generative AI and LLMOps

Development of solutions with LLMs, RAG, agents and structured workflows for extraction, analysis, automation and decision support in corporate contexts.

Machine Learning and NLP

Experience with classification, supervised models, natural language processing, embeddings, deep learning and applications focused on text, documents and unstructured data.

Data, Architecture and Pipelines

Design of analytics environments, scalable pipelines, governance, data quality and modern architecture to support analytics, machine learning and AI.

Analytics and Decision Support

Exploratory analysis, dataset reconciliation, dashboards, monitoring and translation of data into clear insights for technical, operational and business teams.

Education and Credentials

Credentials that support the hands-on side of my work

Academic

UnB, USP and IESB

Master's in Applied Computing at UnB, MBA in Artificial Intelligence and Big Data at USP, and a Bachelor's in Data Science and Artificial Intelligence at IESB.

Research

Data Science and legal texts

Data Science research line focused on named entity recognition in legal texts, supported by a strong base in databases, large-scale data mining and applied experimentation.

2025-2026

Data Engineering and Databricks

Recent certifications in Airflow, Spark, Snowflake, BigQuery, Modern Data Stack and Databricks tracks connected to data preparation for machine learning and retrieval agents.

Highlighted Credentials

Certifications

Databricks

Badge tied to machine learning tracks and modern data ecosystems.

Retrieval Agents

Building retrieval agents and applied workflows within the Databricks ecosystem.

Palantir Foundry

Training focused on building data solutions and operations on enterprise platforms.

AIP Foundations

Complementary foundation in AI applications and workflows in the Palantir ecosystem.

Data Engineering

A 360-hour Data Architect 4.0 training, plus focused certifications in Spark, Airflow, BigQuery, Snowflake and Data Engineering fundamentals.

Databricks and Retrieval

Tracks completed in March 2026 in Data Preparation for Machine Learning and Building Retrieval Agents on Databricks.

AI and Agents

Certification in Introduction to LangGraph, an academic talk on ChatGPT at USP and additional tracks focused on applied AI solutions.

Governance and Regulated Environments

Petrobras courses in LGPD, information security, information classification and handling, human rights, conflict of interest and anti-discrimination.

Academic Experience

Thesis projects connecting research, prototyping and product thinking

IESB · 2023

NER in legal texts with spaCy and a published application

In my undergraduate capstone, I developed two spaCy models fine-tuned in Brazilian Portuguese for named entity recognition in the legal domain, using the LeNER-Br dataset and publishing a working application on Hugging Face.

  • Theme: extraction of legislation, jurisprudence, people, time, location and organization entities from legal texts.
  • Results: F1 of 81.42% for the small model and 83.76% for the large model.
  • Pitch: applied research in legal NLP, comparative evaluation and a usable end-user deliverable.
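
For readers less familiar with the metric, the sketch below shows how an exact-match, entity-level F1 score of the kind reported above is typically computed. It is an illustration only, not the capstone's evaluation code: the function name `entity_f1`, the span tuples and the label strings are hypothetical, and the assumed matching rule is that a prediction counts only when both the span boundaries and the label agree with the gold annotation.

```python
from typing import Set, Tuple

# A span is (start_token, end_token, label). Labels below are illustrative.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match entity-level F1 (assumed rule: span and label must both match)."""
    if not gold and not predicted:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one sentence of a legal text.
gold = {(0, 2, "LEGISLACAO"), (5, 7, "PESSOA"), (10, 11, "ORGANIZACAO")}
predicted = {(0, 2, "LEGISLACAO"), (5, 7, "ORGANIZACAO"), (10, 11, "ORGANIZACAO")}

score = entity_f1(gold, predicted)
print(f"entity-level F1: {score:.4f}")
```

In this toy case two of three predictions match exactly, so precision and recall are both 2/3. A mislabeled span counts as both a false positive and a false negative, which is why entity-level F1 is stricter than token-level accuracy.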

USP · 2023

CNN for COVID-19 support in chest X-ray diagnosis

In my MBA thesis, I developed and validated a convolutional neural network model to classify chest X-ray images, focused on supporting COVID-19 diagnosis.

  • Dataset created with 2,089 images, including 505 COVID-19 cases and 1,580 normal cases.
  • Results: 98.40% test accuracy, with hyperparameter tuning and data augmentation.
  • Pitch: deep learning applied to healthcare, experimental design and public code on GitHub.
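
To put the reported accuracy in context, the sketch below compares it with a majority-class baseline derived from the two class counts listed above. It is a back-of-the-envelope check, not part of the thesis code: the helper `majority_baseline_accuracy` and the dictionary keys are hypothetical, and it assumes the test split has roughly the same class proportions as the full dataset.

```python
from typing import Dict


def majority_baseline_accuracy(class_counts: Dict[str, int]) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("class_counts must contain at least one sample")
    return max(class_counts.values()) / total


# Class counts as stated in the dataset description; key names are illustrative.
counts = {"covid19": 505, "normal": 1580}

baseline = majority_baseline_accuracy(counts)  # about 75.8%
reported_accuracy = 0.9840                     # test accuracy stated above
margin = reported_accuracy - baseline

print(f"majority-class baseline: {baseline:.2%}")
print(f"margin over baseline:    {margin:.2%}")
```

Because the classes are imbalanced, accuracy alone can flatter a model, so a comparison like this one (or per-class recall) helps show how much of the score comes from the model rather than from the class distribution.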

Digital Presence

Where to explore my work further

Contact

Get in touch and let’s talk

I am always open to new opportunities.