About
Time-series forecasting is where NLP was five years ago: dominated by bespoke, single-use models built from scratch for every new problem. The bet here is that the same paradigm shift is coming — and this team is building the foundation model to drive it.
This is a well-funded early-stage startup pre-training foundation models on multi-terabyte time-series datasets. The founders are ML PhDs with deep experience building forecasting infrastructure at hyperscale — the problem they're solving is one they've lived firsthand. The team is small, highly international, and technically serious. They're adding 2–3 ML engineers now ahead of a Series A.
Training runs on a single node today; multi-node and multimodal inputs (text, images, news) are on the roadmap.
This is not a research role. It's the hire that makes experiments fast, the system reliable, and the architecture decisions sound.
What you'll do
- Build and own training infrastructure end-to-end: pipelines, GPU utilisation, iteration speed, and reproducibility
- Architect and train time-series foundation models on diverse multimodal datasets
- Design reproducible experiments to test, compare, and combine ideas from the literature
- Build data exploration tooling to understand correlations, sparsity, and structure across sources
- Deploy models to production via the API and platform — including the gritty details when ONNX export or torch.compile breaks
- Iterate on model capabilities based on direct customer feedback
- Help shape the engineering and research culture as the team scales
What you'll need
- Deep, end-to-end ML training infrastructure experience — pipelines, GPU utilisation, iteration speed
- Strong understanding of the architectural differences between encoder and decoder models, and what they mean for infrastructure decisions
- Breadth across model architectures — transformers, diffusion, Mamba and others; the field hasn't converged and curiosity matters
- Fluency in Python and PyTorch or JAX
- Systems-level thinking: reasoning from first principles, not from defaults
- Full fluency in English, written and spoken (hard requirement)
- Must be based in Europe, ideally with a connection to France
Optional bonus
- Experience with low-level systems programming: CUDA, Rust, or C++
- Multi-node distributed training experience
- Background in time-series data or real-time data streams
Shortlisted candidates will be contacted within 48 hours.