Senior data scientist, document intelligence

Job Category: Data AI & ML Jobs
Job Type: Remote

A fast‑moving, cross‑functional product team is seeking a senior data scientist to support an artificial intelligence–driven document intelligence product that extracts structured data from unstructured documents to power business‑to‑business quoting workflows. This role operates as a senior individual contributor with light leadership responsibility, combining hands‑on model development with squad‑level technical guidance.

The position is designed for a data scientist who thinks in terms of product outcomes, owns problems end to end, and collaborates closely with engineering and product partners in an iterative delivery environment.

Responsibilities

  • Serve as a senior data science contributor within a cross‑functional squad focused on document intelligence and data extraction systems.
  • Provide light technical leadership by guiding data science practices, prioritization, and execution within the squad.
  • Design, build, and improve data extraction and AI‑driven pipelines that transform documents into structured data.
  • Analyze data quality and model performance with a strong focus on precision, recall, and real‑world impact.
  • Own experimentation cycles, including hypothesis development, testing, iteration, and performance evaluation.
  • Partner closely with product and engineering to translate ambiguous business problems into scalable data science solutions.
  • Contribute to continuous improvement of data science workflows, delivery quality, and modeling approaches.
  • Leverage modern AI tools to accelerate development while maintaining strong foundational understanding of methods and results.
  • Operate with high ownership and accountability, taking responsibility for outcomes rather than task completion.

Required experience and skills

Technical and analytical skills

  • Strong, hands‑on proficiency in Python, including the ability to debug, explain, and maintain production‑quality code.
  • Solid working knowledge of SQL for data manipulation, analysis, and exploration.
  • Demonstrated strength in data analysis and problem exploration as a core differentiator.
  • Practical experience applying data science and machine learning fundamentals in real‑world systems.

Applied machine learning and experimentation

  • Experience working with document data, natural language processing, or similar unstructured data domains.
  • Exposure to experimentation frameworks, model evaluation techniques, and iterative improvement practices.
  • Familiarity with optimizing models for precision and recall.
  • Awareness of reinforcement learning or adaptive systems as a future‑state capability.
  • Experience with A/B testing concepts and experimentation approaches.

Collaboration and ways of working

  • Strong sense of ownership and accountability for delivering meaningful product outcomes.
  • Ability to work effectively across data science, engineering, and product disciplines.
  • Comfort operating with limited direction and high autonomy.
  • Proven ability to translate loosely defined problems into clear, actionable solutions.
  • Willingness to work beyond traditional role boundaries to achieve squad goals.
  • Curiosity‑driven mindset with a strong interest in experimentation and continuous learning.

Ideal background

  • Senior‑level, full‑stack data scientist with experience across analysis, modeling, and delivery.
  • Comfortable using AI‑assisted development tools without relying on them exclusively.
  • Product‑oriented thinker who prioritizes business impact over isolated technical outputs.

FAQ

1. What are the core responsibilities of a Senior Data Scientist in document intelligence?
This role focuses on building models that extract, classify, and interpret information from unstructured documents such as PDFs, invoices, and contracts. It includes designing end-to-end pipelines for document processing, from ingestion to insight generation. The role also involves improving model accuracy and scalability in production environments.

2. What types of problems are solved in document intelligence?
Common problems include document classification, entity extraction, OCR enhancement, and semantic understanding of text. The role may also involve automating workflows such as invoice processing or contract analysis. Solutions aim to reduce manual effort and improve accuracy.

3. What technologies and tools are commonly used in this role?
Tools include Python, machine learning frameworks like TensorFlow or PyTorch, and NLP libraries such as spaCy or Hugging Face. OCR tools like Tesseract or cloud-based document AI services are often used. Data processing frameworks and cloud platforms support scalable solutions.

4. How is natural language processing (NLP) applied in document intelligence?
NLP techniques are used to extract meaning from text, identify entities, and understand document context. Models such as transformers help process large volumes of text efficiently. NLP enables automation of complex document-based tasks.

5. What role does data preparation play in this position?
Data preparation is critical, including cleaning, labeling, and structuring document data. High-quality training data improves model performance and reliability. The role often involves working with annotation tools and pipelines.

6. How is model performance evaluated in this role?
Performance is measured using metrics such as precision, recall, F1-score, and accuracy. Evaluation may vary depending on tasks like classification or extraction. Continuous monitoring ensures models perform well in production.
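
As an illustration only, the sketch below shows how precision, recall, and F1-score could be computed for a single document's extracted fields. The function name, the field names, and the strict exact-match rule are assumptions made for this example, not a description of the team's actual evaluation method. Under this rule, a field extracted with the wrong value counts as both a false positive and a false negative.

```python
def extraction_metrics(predicted: dict[str, str], expected: dict[str, str]) -> dict[str, float]:
    """Field-level precision, recall, and F1 under strict exact matching (hypothetical helper)."""
    # True positives: fields whose predicted value exactly matches the labeled value.
    tp = sum(1 for field, value in predicted.items() if expected.get(field) == value)
    # False positives: predicted fields that are wrong or not labeled at all.
    fp = len(predicted) - tp
    # False negatives: labeled fields that are missing or predicted incorrectly.
    fn = sum(1 for field, value in expected.items() if predicted.get(field) != value)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example data: a labeled quote request and a model's extraction of it.
expected = {"customer": "Acme Corp", "quantity": "40", "unit_price": "12.50", "currency": "USD"}
predicted = {"customer": "Acme Corp", "quantity": "400", "unit_price": "12.50"}

metrics = extraction_metrics(predicted, expected)
print(metrics)
```

In this made-up example, two of the three predicted fields are correct and two of the four labeled fields are recovered, so precision is higher than recall. Which of the two matters more depends on the cost of a wrong value versus a missing one in the quoting workflow.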


Apply for this position

If you have already submitted your resume for another job opening, please do not re-apply for a different role. You can email us through Contact Us about your interest in other roles.
