Flávia Gaia
Python AI Engineering NLP MLOps LLMOps GenAI

Hello and welcome. I am Flávia, a Senior Data Scientist with 6 years of experience in AI, data and automation. My work combines generative AI, NLP and machine learning with the full data lifecycle behind data science, from data structuring and quality to modeling, automation and real-world application.

Experience
6 years in data, AI and analytics
Positioning
Senior Data Scientist
Specialty
Generative AI, RAG and data architecture
Academic Background
UnB, USP and IESB

Executive Summary

What I bring to data and product teams

01

AI applied to critical workflows

I have experience with audit automation, contract reading, clause extraction, regulatory compliance and technical information retrieval using LLMs and multi-agent systems.

02

Strong science and engineering foundation

I work with Python, Spark, Databricks, Delta Lake, MLflow, Streamlit and LangChain, building pipelines designed for governance, monitoring and production.

03

Communication and continuous learning

Alongside my professional work, I maintain consistent technical output, hold recent Data Engineering certifications and pursue academic research in Data Science.

Frameworks and Technologies

Tools and ecosystems I have already worked with in practice

LangChain Langflow Flowise AutoGen Codex Llama Kiro Antigravity Azure AWS Databricks Spark Oracle

Experience

Areas where I have worked with data, automation and applied AI

Data

Auditing and analytics at scale

Role: Senior Data Scientist

Work with PySpark, Databricks and SQL on analytical workflows, data quality checks, inconsistency analysis, automated data refreshes and applied AI for operational support.

Documents

Document automation and financial flow

Role: Senior Data Scientist / Senior AI Engineer

Development of solutions for PDF reading, rule extraction, financial-flow organization and automation with agents, workflows and analytical persistence.

LLMs

Document assistants with RAG and validation

Role: AI Specialist Data Scientist

Creation of LLM-based solutions for structured document extraction, retrieval, validation interfaces and pipeline evolution with observability.

NLP

Technical extraction and text classification

Role: Data Scientist / AI Specialist

Work across machine learning, NLP, information extraction, text classification, validation interfaces and the evolution from heuristic pipelines to generative AI.

Social

Territorial analysis and analytical monitoring

Role: Data Analyst

Analysis of large datasets, indicator construction, territorial monitoring, dashboards and analytical support for operational and policy-oriented interpretation.

Early

Web scraping, NLP and visualization

Role: Data Science Intern

Automated content collection, NLP with spaCy, data visualization and experimental projects in modeling and deep learning.

Areas of Practice

A hybrid profile combining technical depth, real-world execution and business context

Generative AI and LLMOps

Development of solutions with LLMs, RAG, agents and structured workflows for extraction, analysis, automation and decision support in corporate contexts.

Machine Learning and NLP

Experience with classification, supervised models, natural language processing, embeddings, deep learning and applications focused on text, documents and unstructured data.

Data, Architecture and Pipelines

Design of analytics environments, scalable pipelines, governance, data quality and modern architecture supporting analytics, machine learning and AI.

Analytics and Decision Support

Exploratory analysis, dataset reconciliation, dashboards, monitoring and translation of data into clear insights for technical, operational and business teams.

Education and Certifications

Credentials that support my technical practice

Academic

UnB, USP and IESB

Master’s in Applied Computing at UnB, MBA in Artificial Intelligence and Big Data at USP, and a Bachelor’s degree in Data Science and Artificial Intelligence from IESB.

Research

Data Science and legal texts

Data Science research line focused on named entity recognition in legal texts, supported by a strong foundation in databases, large-scale data mining and applied experimentation.

2025-2026

Data Engineering and Databricks

Recent certifications in Airflow, Spark, Snowflake, BigQuery and the Modern Data Stack, plus Databricks tracks covering data preparation for machine learning and retrieval agents.

Highlighted Certifications

Certifications

Databricks

Badge connected to machine learning tracks and modern data ecosystems.

Retrieval Agents

Building retrieval agents and applied workflows inside the Databricks ecosystem.

Palantir Foundry

Training focused on building data solutions and operations on enterprise platforms.

AIP Foundations

Additional grounding in AI applications and workflows within the Palantir ecosystem.

Functions, Tools and Agents with LangChain

Specialization in OpenAI Function Calling and LangChain tools for turning LLMs into operational agents.

Google Data Analytics Professional Certificate

End-to-end analytics workflow using BigQuery, R, SQL and Sheets.

Introduction to LangGraph and LangSmith

Training in graph-based orchestration, observability, traceability and evaluation of LLM applications, strengthening AI engineering practices.

Academic Experience

Read my thesis projects

IESB · 2023

NER in legal texts with spaCy and a published application

In my undergraduate capstone, I developed two spaCy models fine-tuned for named entity recognition in Brazilian legal texts using the LeNER-Br dataset and published a working app on Hugging Face.

  • Topic: extraction of legislation, case law, people, time, location and organizations from legal texts.
  • Results: F1 of 81.42% for the small model and 83.76% for the large model.
  • Technical highlight: applied legal NLP, comparative evaluation and a usable end-user deliverable.

USP · 2023

CNN for COVID-19 support in chest X-ray diagnosis

In my MBA thesis, I developed and validated a convolutional neural network model to classify chest X-ray images with a focus on supporting COVID-19 diagnosis.

  • Dataset: 2,089 images, including 505 COVID-19 cases and 1,580 normal cases.
  • Results: 98.40% test accuracy, with hyperparameter tuning and data augmentation.
  • Technical highlight: deep learning applied to healthcare, experimental design and public code on GitHub.

Where to Find Me

Projects, publications and technical presence

Contact

Get in touch and let’s talk

I am always open to new opportunities.