Hi, I'm Raul Rocha
Data Scientist & LLM Engineer
A Data Scientist and LLM Engineer with a passion for building systems that transform messy, unstructured data into clear signals and smarter decisions.
My journey started at USP, where I studied Molecular Sciences: a deeply interdisciplinary program that gave me strong roots in math, statistics, programming, and scientific thinking. Since then, I've worked across the data lifecycle, from SQL pipelines and automated dashboards to forecasting models, churn prediction, and production-ready GenAI solutions.
Over the last few years, I've led and delivered projects that improved marketing performance, cut operational time, and helped teams move faster with fewer errors. What excites me isn't just building a model; it's building something that works. Something that integrates with people, processes, and platforms.
What I work on
- Fine-tuning open-source LLMs (Mistral, DeepSeek, etc.) for business-specific use cases
- Designing RAG pipelines using vector DBs, LangChain, and semantic search (see the sketch after this list)
- Deploying LLMs on AWS, with scalability and cost in mind (SageMaker, ECR, Athena, S3)
- Building AI agents that automate complex workflows like campaign mapping
- Maintaining clean SQL pipelines for marketing, ecommerce, and funnel analytics
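To make the RAG item concrete, here is a minimal sketch of the retrieve-then-generate pattern I mean. It assumes langchain, langchain-openai, and faiss-cpu are installed and an OpenAI API key is set in the environment; the file name, model, and question are placeholders, not a real project.

```python
# Minimal RAG sketch: load docs, chunk, embed, retrieve, then answer with context.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS

# Load and chunk the source documents (hypothetical file)
docs = TextLoader("campaign_notes.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed the chunks and index them for semantic search
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

# Retrieve relevant context and let the LLM answer grounded in it
question = "Which campaigns drove the most signups last quarter?"
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```

In production the interesting work is around this skeleton: chunking strategy, retrieval quality, evaluation, and cost.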
When needed, I still reach for traditional ML, especially when a lean logistic regression or XGBoost model solves the problem just as well.
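By a lean model I mean something like the baseline below: a scikit-learn logistic regression for churn, fit on a handful of tabular features. The dataset and column names are made up for illustration.

```python
# Lean churn baseline sketch; "customers.csv" and its columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customers.csv")
X = df[["tenure_months", "monthly_spend", "support_tickets"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Scale features and fit a simple, interpretable model
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```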
I currently specialize in the practical side of Generative AI.
My approach
I don't build AI for the sake of buzz. I build things that ship, keep improving, and stay useful over time. I prefer:
- Testable experiments over assumptions
- Simpler tools used well, rather than exotic stacks no one maintains
- Transparent documentation that non-engineers can follow
Some results I'm proud of:
- Reducing API inference cost by ~60% after fine-tuning
- Saving 30 minutes per analyst per day through LLM-powered mapping automation
- Improving decision speed with dashboards tied directly to modeled forecasts and churn signals
But most of all, I like when the people using what I build say: "This actually helped."
Why I created fromdata2ai.com
This site is part portfolio, part lab notebook, part digital garden. It exists because I believe:
- Good ideas become better when written down
- AI should be shared, questioned, improved; not hidden behind buzzwords
- The best learning happens when we build in the open
Here, I document the systems I'm building, the ideas I'm refining, and the trade-offs I'm constantly learning from. Whether it's fine-tuning a model for market prediction, embedding customer feedback into vector stores, or exploring how GenAI tools integrate with human workflows — I share the technical details and the design thinking behind them.
A few things that shape how I think
- I'm deeply curious about how information shapes decisions, especially in markets and in teams
- I believe data science is both a craft and a conversation, and clarity is part of our responsibility
- I enjoy systems that blend language, logic, and structure, from pipelines to prompts
- Outside work, I read sci-fi, study behavior, and reflect on how technology transforms how we relate to each other
If you're someone who cares about building AI that works (and keeps working), I hope this space brings you something useful.