Principal Data Scientist (AI) - REMOTE (US) Job Details

Job Description

Job Location (Short): Houston, Texas-USA | Madison, Alabama-USA | Roanoke, Virginia-USA

Workplace Type: Remote

Req Id: 2289

Responsibilities

Octave's ETQ division is seeking a hands-on Data Scientist to build predictive models, implement Generative AI and Agentic AI features, and architect data-driven solutions for our document-based compliance management platform. This role requires a technical expert who can develop, deploy, and maintain ML systems in production environments.

Build and deploy Generative AI features using foundation models (AWS Bedrock, OpenAI, Anthropic Claude) and RAG architectures with vector databases for compliance document understanding
Design agentic AI systems that autonomously handle compliance workflows, document review, regulatory mapping, and multi-step reasoning tasks
Implement comprehensive LLM evaluation frameworks with automated pipelines, custom metrics, benchmark datasets, and safety guardrails that ensure regulatory compliance
Build end-to-end MLOps pipelines for model training, deployment, monitoring, versioning, and automated retraining with drift detection
Develop predictive models for compliance risk scoring, regulatory change impact, anomaly detection, and time-series forecasting
Write production-quality Python code for data processing, feature engineering, API development (FastAPI/Flask), and ETL/ELT workflows
Lead A/B experiments and product analytics to measure AI feature impact and drive data-driven decision-making
Create explainability frameworks (SHAP/LIME) and monitoring dashboards to ensure transparency and regulatory adherence
Collaborate with cross-functional teams to translate business needs into ML solutions and communicate insights to stakeholders

Python (5+ years): Production-level experience with Pandas, NumPy, scikit-learn, XGBoost, TensorFlow/PyTorch, Hugging Face Transformers, FastAPI/Flask, MLflow, and pytest
SQL: Advanced proficiency with complex queries, window functions, and optimization
Machine Learning & NLP: Strong foundation in supervised/unsupervised learning, deep learning, document understanding, text classification, and semantic analysis
Generative AI & LLMs: Hands-on experience with foundation models (GPT, Claude, Llama), prompt engineering, RAG architectures, and vector databases (Pinecone, Weaviate, Chroma)
MLOps & ModelOps: End-to-end experience with ML pipelines, experiment tracking (MLflow, W&B), model versioning, feature stores, drift detection, CI/CD for ML, and Docker containerization
LLM Evaluation: Experience with evaluation frameworks (RAGAS, DeepEval), custom metrics, benchmark datasets, and human-in-the-loop validation
Cloud & AWS: Experience with AWS services including SageMaker, Bedrock, S3, Lambda, EC2, and CloudWatch
Statistics & Experimentation: Strong foundation in statistics, A/B testing, causal inference, and experimental design
Visualization: Proficiency with Tableau, Power BI, or Python visualization libraries

Education / Qualifications

Experience & Education

7+ years in data science, ML engineering, or related roles
3+ years building NLP/generative AI applications and implementing MLOps in production
Bachelor's or Master's degree in Data Science, Computer Science, Statistics, or a related field (PhD preferred)
Track record of deploying ML systems that process large-scale datasets with proper monitoring and governance

Preferred Qualifications

Experience with agentic AI frameworks (LangGraph, LangChain, AutoGen, CrewAI)
Knowledge of Life Sciences/regulated industries (FDA, EMA, ISO, GxP) and compliance management systems
Familiarity with big data tools (Spark, Databricks, Snowflake), orchestration (Airflow, Kubeflow), and monitoring tools (Datadog, Prometheus)
Experience with LLM fine-tuning, document processing libraries, multi-modal AI, or distributed training
Understanding of ML governance, bias detection, model risk management, and data privacy regulations (GDPR, CCPA, HIPAA)
Experience working in agile environments with Jira
AWS ML certifications or similar credentials

Key Competencies

Strong communication skills, including the ability to explain complex models to technical and non-technical audiences
Ability to work independently and collaboratively in fast-paced environments
Proven ability to convert POCs into production-grade solutions
Understanding of ethical AI and building trustworthy, explainable systems for regulated environments

Octave will not provide visa sponsorship for this role.

About Octave

Octave provides mission-critical software that empowers organizations to make informed decisions across every stage of the asset lifecycle — Design, Build, Operate and Protect — where performance, safety, and reliability are non-negotiable and failure is not an option.

Turning complex operational data into actionable intelligence, Octave connects expertise, real-world conditions and enterprise-scale insight to improve performance, resilience and incident response where it matters most.

Octave has approximately 7,200 employees in 45 countries. Learn more at octave.com and follow us on LinkedIn.

Why work for Octave?

All in. Always forward. That's the way we do things around here. We put trust in our people because we believe it's the best way to unleash potential, bring ideas to life, and keep moving ahead. And it's why we're committed to creating an environment that's truly supportive, providing you with the resources you need to support your ambitions, no matter who you are or where you are in the world.

Everyone is welcome

At Octave, we believe that diverse and inclusive teams are critical to the success of our people and our business. Here, everyone is welcome. As an inclusive workplace, we don't discriminate. In fact, we embrace differences and are fully committed to creating equal opportunities, an inclusive environment, and fairness for all.

Respect is the cornerstone of how we operate, so speak up and be yourself. You're valued here.