Data Engineering · 2025 · Johnson Matthey

Lab Data Pipeline

Manual lab operations replaced with a self-updating cloud pipeline.

Python · AWS EC2 · Google Sheets API · gspread · pandas · cron · Git
Context
A lab operations team was exporting CSVs and emailing spreadsheets. Analysts reinvented the same queries every week. The business wanted a single, always-current picture.
Timeline
Delivered as a team data project, 2025
Role
Infrastructure, ingestion pipeline, analysis, deployment
Samples
310k+
Schedule
Daily cron ingest
Deploy
AWS EC2

What I built

  • A Python ingestion pipeline that pulls lab data from Google Sheets daily (sketched after this list).
  • An AWS EC2 deployment running on Amazon Linux, SSH-secured, Git-managed.
  • Exploratory analysis across 310k+ samples answering 20 standing business questions (a representative query appears after the architecture overview).
  • A cron-scheduled refresh so downstream dashboards never lag more than 24 hours.
  • Team-accessible docs covering the setup, credentials model, and recovery.
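
A minimal sketch of the daily pull, assuming gspread authenticated with a service-account credential file; the file name, sheet key, and worksheet name below are illustrative stand-ins, not the production values:

import gspread
import pandas as pd

def pull_lab_data(sheet_key: str) -> pd.DataFrame:
    # Authenticate with a service-account JSON (path is hypothetical).
    gc = gspread.service_account(filename="service_account.json")
    # Read the whole worksheet as a list of dicts keyed by the header row.
    rows = gc.open_by_key(sheet_key).worksheet("lab_results").get_all_records()
    return pd.DataFrame(rows)

if __name__ == "__main__":
    df = pull_lab_data("SHEET_KEY")
    df.to_csv("lab_data_snapshot.csv", index=False)  # snapshot for downstream steps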

Architecture

Ingestion
Python · gspread · Google Sheets API · Drive API
Compute
AWS EC2 (t3) · Amazon Linux · venv · cron
Analysis
pandas · numpy · matplotlib · seaborn
Ops
SSH · Git · systemd unit · log rotation
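
For flavor, one of the standing business questions reduces to a single pandas query — every column name here is a hypothetical stand-in for the real schema:

import pandas as pd

df = pd.read_csv("lab_data_snapshot.csv", parse_dates=["tested_at"])

# Monthly sample volume and pass rate per test type (all columns hypothetical).
summary = (
    df.assign(month=df["tested_at"].dt.to_period("M"))
      .groupby(["month", "test_type"])
      .agg(samples=("sample_id", "count"), pass_rate=("passed", "mean"))
      .reset_index()
)
print(summary.head())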

Interesting decisions

Decision

EC2 over serverless

The refresh runs long-form pandas code over hundreds of thousands of rows. On a typical serverless function, cold-start latency and execution-time limits would dominate a job like this. A small always-on EC2 instance is cheaper for one long daily run, predictable, and owns its state.

Decision

Cron over an orchestrator

Airflow or Prefect would be overkill for one daily job. Cron plus structured logging keeps the stack small enough for a single engineer to own. If this ever grows to dozens of jobs, the swap is clean.
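
Illustratively, the whole schedule is a single crontab entry (the paths and script name are hypothetical), with output appended to a log that rotation keeps bounded:

# Run the daily refresh at 05:00; stdout and stderr go to the pipeline log.
0 5 * * * /home/ec2-user/pipeline/venv/bin/python /home/ec2-user/pipeline/refresh.py >> /home/ec2-user/pipeline/logs/refresh.log 2>&1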

Result

Data that used to take a full day to compile is ready by 6am each day. The analyst team reclaimed several hours per week, and the leadership reports pull from a single, trusted source.
