Engineering · Remote
Data Engineer
Plexi is a data platform for asset managers, hedge funds, and fintechs. We are heavy AI users and we are looking for a data engineer hungry to grow into both modern data engineering and the new agentic way of building software.
About Plexi
Plexi is two connected products. Plexi Fact is our data lakehouse platform - we pull data from dozens of vendors, clean it, model it, and serve it back through dashboards, APIs, Excel, and AI agents. Enterprise clients, real traction, growing fast. Plexi Flexor is the agent platform we use to build everything else - 50+ specialized agents that pair with us on schema review, query tuning, anomaly diagnosis, lineage, code review, and more. We don't just use those agents. We design them, refine them, and ship the patterns we find.
Why You Want to Be Part of This Story
If you've been pushing what Claude Code or Cursor can do beyond autocomplete - if you have opinions about prompts, agent design, and where this is all going - you'll be at home here. You'll work directly with the founders. You'll have a roster of agents reviewing your DAGs, explaining query plans, diffing schemas, and diagnosing pipeline failures, and you'll be writing the skills and prompts that make them better. The work moves fast and the bar is high; you'll learn quickly.
You'll be trusted to find what needs doing, dive in, build the context, make a plan, and ship - pulling others in when their expertise is genuinely needed. We're a small team and there's more work than people. That's the deal, and it's also the opportunity.
What You'll Actually Do
- Build and operate Apache Airflow DAGs that ingest messy data from real financial vendors
- Design and optimize ETL/ELT pipelines in Python - Pandas, NumPy, SQLAlchemy
- Model data in SQL Server: an operational schema and a data warehouse, hundreds of tables across the two
- Write and tune T-SQL: stored procedures, views, complex queries, indexing decisions
- Pair with AI agents on data ops: direct them, review their work, push back when they're wrong
- Write Flexor skills and hooks for data ops, schema management, and pipeline diagnostics
- Sit in on client calls, map their data into our model, and ship the integration that same week
- Contribute to architecture decisions - we don't gatekeep that to senior people
Tech Stack
Orchestration
- Apache Airflow (multiple DAGs in production)
Languages
- Python 3
- SQL (T-SQL primary, PostgreSQL secondary)
Data Libraries
- Pandas
- NumPy
- SQLAlchemy
- pyodbc
Databases
- SQL Server (with Flyway migrations)
- PostgreSQL
Cloud
- Azure
- Azure DevOps CI/CD
AI / Agentic
- Claude Code
- Anthropic API
- MCP (Model Context Protocol)
- Ollama
- our internal Flexor platform
Vendors Integrated
- Bloomberg DL
- CME
- CSI
- Eze
- IBKR
- SEC EDGAR
- Snowflake
What Matters
- You're a self-starter who drives a problem end-to-end - dive in, build understanding, make a plan, execute, and pull others in when their expertise is needed. You don't wait to be told what to do, and you don't get stuck waiting for permission.
- You actually work with data - degree, bootcamp, previous job, or side projects that grew teeth. We care what you can do, not how you got there.
- You already use AI in your work and you push it harder than your friends do
- Strong Python data manipulation - Pandas and NumPy in your sleep, or close to it
- Real SQL fluency or hunger to get there fast - SELECTs aren't enough; you need to think in joins, indexes, and query plans
- You've shipped or operated at least one pipeline that ran on a schedule and mattered (a school project counts if it actually ran)
- You're comfortable reading vendor docs and integrating messy external APIs
- You can sit in front of a client and translate "how is your data shaped?" into a real schema
Nice to Have (We'll Teach You the Rest)
- Hands-on Airflow (or Prefect, Dagster, Luigi - any orchestrator)
- An MCP server, Claude skill, custom agent, or hook you've built or extended
- Financial / market / reference data background
- Stored procs and Flyway-style versioned migrations
- Snowflake or other cloud data warehouse
- Data modeling: dimensional, normalized, lakehouse
- dbt, Kafka, Event Hubs, Azure Data Factory, Synapse
What We Offer
- Direct mentorship from founders and senior engineers who have been doing this a long time
- Greenfield design alongside operational ownership - you build it, you run it
- Client exposure early - your work lands in front of paying users
- A modern data stack, applied pragmatically - no rebuild-for-the-sake-of-it
- Remote-friendly, flexible schedule
- Competitive compensation calibrated to experience
We respond to every application within 3 business days.
Apply for This Role
Questions first? Email careers@plexl.ai