Product · Engineering

From iPhone to gallery in 8 seconds: the tech behind Picsaris

May 24, 2026 · 7 min read · Picsaris team

A guest taps the camera icon. Selects a photo from their library. Hits "share." Eight seconds later, the photo appears in the shared event gallery and the face-matching algorithm starts running. For guests in the moment, it feels instant. Behind the scenes, we're orchestrating 400 operations across three cloud services without blocking the UI. Here's how.

The architecture (10,000-foot view)

Frontend: React Native (iOS + Android). Local image processing, upload state management, real-time UI updates.
Backend: Next.js API (AWS Lambda). Metadata validation, queue management, orchestration.
Storage: Cloudflare R2 (S3-compatible object storage). Photos are uploaded directly from the phone using short-lived presigned URLs issued by our API.
Image processing: AWS Rekognition (face detection, embedding extraction).
Database: Supabase PostgreSQL. Event metadata, user accounts, photo records, face vectors.

The upload flow (what actually happens in those 8 seconds)

Step 1: Client-side prep (0.5 seconds)

The app receives the photo from the camera roll:

Detects format (HEIC on iPhone, JPEG on Android). If HEIC, converts to JPEG in-memory using native APIs (fast).
Resizes to max 2400×2400px (maintains quality, reduces upload size).
Generates a SHA-256 hash of the file for duplicate detection (if Aunt Karen uploads the same photo twice, we know).
Creates a presigned URL request to our backend: "I want to upload this photo, here's the hash."
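
To make the resize step concrete, here is a minimal sketch of the dimension math for fitting a photo inside the 2400×2400 box while keeping its aspect ratio. The names `Size` and `fitWithin` are illustrative assumptions, not the app's actual code, and the native HEIC conversion and hashing calls are not shown.

```typescript
// Illustrative sketch: names and structure are assumptions, not production code.
interface Size {
  width: number;
  height: number;
}

const MAX_EDGE = 2400; // from the article: max 2400×2400px

// Scale dimensions down so the longest edge is at most maxEdge.
// Aspect ratio is preserved and small images are never upscaled.
function fitWithin(original: Size, maxEdge: number = MAX_EDGE): Size {
  const longest = Math.max(original.width, original.height);
  if (longest <= maxEdge) {
    return { width: original.width, height: original.height };
  }
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(original.width * scale)),
    height: Math.max(1, Math.round(original.height * scale)),
  };
}

// A 4032×3024 photo (a common 12 MP size) comes out as 2400×1800.
const resized = fitWithin({ width: 4032, height: 3024 });
console.log(`${resized.width}x${resized.height}`);
```

Rounding happens once, after scaling, so the output stays within one pixel of the true aspect ratio.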

Step 2: Backend grants access (0.1 seconds)

Our API validates the request:

Check: is this user in this event?
Check: have we seen this hash before in this event? (duplicate prevention)
Generate a Cloudflare R2 presigned URL valid for 15 minutes.
Create a "photo record" in the database with status = "processing".
Return the presigned URL to the client.

All of this happens in a single Lambda invocation, sub-100ms.
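
The checks above can be sketched as a single function. This is a simplified illustration, not the real handler: the in-memory `Store` stands in for the database, `signUrl` is a hypothetical callback standing in for the R2 presigning call, and the object key layout is an assumption.

```typescript
// Illustrative sketch: Store, signUrl, and the key layout are assumptions.
type PhotoStatus = "processing" | "queued";

interface PhotoRecord {
  eventId: string;
  userId: string;
  sha256: string;
  status: PhotoStatus;
}

interface PresignRequest {
  eventId: string;
  userId: string;
  sha256: string;
}

type PresignResult =
  | { ok: true; uploadUrl: string; expiresInSeconds: number }
  | { ok: false; reason: "not_in_event" | "duplicate" };

// In-memory stand-in for the database tables.
interface Store {
  members: Record<string, string[]>; // eventId -> userIds
  photos: PhotoRecord[];
}

const URL_TTL_SECONDS = 15 * 60; // presigned URL valid for 15 minutes

function grantUpload(
  req: PresignRequest,
  store: Store,
  signUrl: (objectKey: string, ttlSeconds: number) => string,
): PresignResult {
  // Check: is this user in this event?
  const members = store.members[req.eventId] || [];
  if (members.indexOf(req.userId) === -1) {
    return { ok: false, reason: "not_in_event" };
  }
  // Check: have we seen this hash before in this event?
  const seen = store.photos.some(
    (p) => p.eventId === req.eventId && p.sha256 === req.sha256,
  );
  if (seen) {
    return { ok: false, reason: "duplicate" };
  }
  // Create the photo record, then hand back a short-lived upload URL.
  store.photos.push({
    eventId: req.eventId,
    userId: req.userId,
    sha256: req.sha256,
    status: "processing",
  });
  const uploadUrl = signUrl(`${req.eventId}/${req.sha256}.jpg`, URL_TTL_SECONDS);
  return { ok: true, uploadUrl, expiresInSeconds: URL_TTL_SECONDS };
}
```

Both checks run before anything is written, so a rejected request leaves no record behind.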

Step 3: Direct upload to storage (3–5 seconds, depends on network)

The app now uploads directly to Cloudflare R2 using the presigned URL:

React Native makes an HTTP PUT request directly to R2 (not through our servers).
This is the big win: the phone talks directly to cloud storage, bypassing our API. Network is the only bottleneck.
On 4G/5G, the upload completes in 3–5 seconds. On WiFi, sub-2 seconds.
The app shows a progress bar to the user (Upload: 47%).
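
A rough back-of-the-envelope sketch shows why the network dominates this step. The uplink speeds used below are assumptions chosen for illustration, not measurements, and the helper names are hypothetical. The progress label helper mirrors the "Upload: 47%" text shown to the user.

```typescript
// Illustrative sketch: helper names and uplink speeds are assumptions.

// Ideal transfer time, ignoring connection setup and protocol overhead.
function estimateUploadSeconds(fileSizeMB: number, uplinkMbps: number): number {
  if (uplinkMbps <= 0) {
    throw new Error("uplinkMbps must be positive");
  }
  return (fileSizeMB * 8) / uplinkMbps; // megabytes -> megabits
}

// Text for the progress bar, clamped to 0..100.
function progressLabel(bytesSent: number, bytesTotal: number): string {
  if (bytesTotal <= 0) {
    return "Upload: 0%";
  }
  const pct = Math.floor((bytesSent * 100) / bytesTotal);
  return `Upload: ${Math.min(100, Math.max(0, pct))}%`;
}

// A 2.1 MB photo on an assumed 4 Mbps uplink takes a little over 4 seconds.
console.log(estimateUploadSeconds(2.1, 4).toFixed(1));
console.log(progressLabel(987000, 2100000));
```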

Step 4: Notify backend (0.1 seconds)

Once upload completes, the app makes a final callback:

"Upload done. Photo hash: ABC123. File size: 2.1 MB."
Backend confirms receipt and enqueues the photo for face processing.
Status changes from "processing" to "queued".
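
One way to keep the status field honest is an explicit transition table, sketched below. Only "processing" and "queued" come from the flow above; the "matched" and "failed" states, the `confirmUpload` name, and the array standing in for the queue are assumptions added for illustration.

```typescript
// Illustrative sketch. "processing" and "queued" come from the article;
// "matched" and "failed" are hypothetical end states.
type PhotoStatus = "processing" | "queued" | "matched" | "failed";

const ALLOWED_TRANSITIONS: Record<PhotoStatus, PhotoStatus[]> = {
  processing: ["queued", "failed"],
  queued: ["matched", "failed"],
  matched: [],
  failed: [],
};

interface PhotoRecord {
  sha256: string;
  sizeBytes: number | null;
  status: PhotoStatus;
}

// Handle the "upload done" callback: record the size, enqueue, flip the status.
function confirmUpload(
  record: PhotoRecord,
  sizeBytes: number,
  queue: string[], // stand-in for the real message queue
): PhotoRecord {
  if (ALLOWED_TRANSITIONS[record.status].indexOf("queued") === -1) {
    throw new Error(`cannot queue a photo in status "${record.status}"`);
  }
  queue.push(record.sha256);
  return { sha256: record.sha256, sizeBytes: sizeBytes, status: "queued" };
}
```

Rejecting a second callback for an already queued photo keeps a retried request from enqueueing the same photo twice.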

Step 5: Asynchronous face matching (starts immediately, completes in 2–10 seconds)

Now the async work begins:

An AWS Lambda worker picks up the photo from the queue.
Downloads it from R2 into the Lambda's /tmp directory (fast, and because R2 is S3-compatible, standard S3 clients work).
Passes it to AWS Rekognition: "Detect faces in this image and extract embeddings."
Rekognition returns face bounding boxes and one 512-dimensional vector per face.
Backend stores the face vectors in the database, linked to the photo ID.
Immediately runs vector matching: "Compare these faces against all event attendees' sign-up vectors."
For each match above threshold, create a record: "This person appears in this photo."
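
The matching step boils down to comparing each detected face vector with each attendee's sign-up vector. The sketch below assumes cosine similarity as the metric and 0.8 as the threshold; neither is stated above, and the type and function names are illustrative. Short 3-number vectors stand in for the 512-number ones.

```typescript
// Illustrative sketch: the metric, threshold, and names are assumptions.
type Vector = number[];

function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector matches nothing
  }
  return dot / Math.sqrt(normA * normB);
}

interface Attendee {
  id: string;
  signupVector: Vector;
}

interface Match {
  attendeeId: string;
  faceIndex: number;
  score: number;
}

// Compare every face in the photo against every attendee; keep scores >= threshold.
function matchFaces(faces: Vector[], attendees: Attendee[], threshold: number): Match[] {
  const matches: Match[] = [];
  faces.forEach((face, faceIndex) => {
    for (const attendee of attendees) {
      const score = cosineSimilarity(face, attendee.signupVector);
      if (score >= threshold) {
        matches.push({ attendeeId: attendee.id, faceIndex: faceIndex, score: score });
      }
    }
  });
  return matches;
}

const found = matchFaces(
  [[1, 0, 0]],
  [
    { id: "alice", signupVector: [0.9, 0.1, 0] },
    { id: "bob", signupVector: [0, 1, 0] },
  ],
  0.8,
);
console.log(found.map((m) => m.attendeeId));
```

This brute-force loop costs faces × attendees distance calculations per photo, which stays small at event scale (for example, 8 faces against 400 attendees is 3,200 comparisons).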

Why it feels instant (even though it's async)

Progressive UI updates

The frontend doesn't wait for face processing. After the upload completes:

The app immediately shows the photo in the user's gallery with a "thumbnail" state.
The UI shows: "Uploaded! Finding you in photos..."
Behind the scenes, the backend is running Rekognition and vector matching.
As soon as matching completes, the photo animates into the "shared gallery" view.

Shared gallery updates in real time

We use WebSockets (via Supabase Realtime) to push updates:

The backend broadcasts: "New photo uploaded by @guest_name, matches 4 people."
Every guest's app receives the notification and refreshes their personal gallery in real time.
No polling and no stale data.
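
On the client, handling a broadcast can be a small pure function over gallery state. The payload shape (`PhotoBroadcast`) and state shape (`GalleryState`) below are assumptions for illustration, and the Supabase Realtime subscription itself is not shown.

```typescript
// Illustrative sketch: payload and state shapes are assumptions.
interface PhotoBroadcast {
  photoId: string;
  uploadedBy: string;
  matchedUserIds: string[];
}

interface GalleryState {
  shared: string[]; // photo IDs visible to everyone, newest first
  personal: string[]; // photo IDs this guest appears in, newest first
}

// Apply one broadcast to the local state without mutating the old state.
function applyBroadcast(
  state: GalleryState,
  event: PhotoBroadcast,
  myUserId: string,
): GalleryState {
  // Ignore repeats so a redelivered message does not duplicate the photo.
  if (state.shared.indexOf(event.photoId) !== -1) {
    return state;
  }
  const iAmInPhoto = event.matchedUserIds.indexOf(myUserId) !== -1;
  return {
    shared: [event.photoId, ...state.shared],
    personal: iAmInPhoto ? [event.photoId, ...state.personal] : state.personal,
  };
}
```

Because the function returns the same object when nothing changes, a React component using it can skip re-rendering on duplicate messages.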

The performance tricks

Presigned URLs (don't upload through our servers)

The naive approach: app → our API → R2. Every photo crosses the network twice, which roughly doubles transfer latency and burns our bandwidth.

Our approach: app → R2 directly. Our API only handles metadata (sub-100ms). The photo itself never touches our servers.

Format conversion on-device

iPhone captures HEIC by default. Converting to JPEG on our servers would block the upload flow and waste cloud compute. We do it on the phone using native APIs, as part of the half-second client-side prep.

One Rekognition call per photo

If a photo has 8 faces, we don't call Rekognition 8 times. One call returns one response with all 8 vectors.

Vector caching

Every sign-up vector is cached in memory on the matching Lambda. The first request warms the cache. Subsequent matches take microseconds (just distance calculations).
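
The caching pattern relies on module-level state surviving between invocations while a Lambda container stays warm. Here is a minimal sketch, assuming a hypothetical `load` callback that reads the sign-up vectors from the database. It is synchronous here to keep the example short; a real loader would be asynchronous.

```typescript
// Illustrative sketch: names and the synchronous loader are assumptions.
type Vector = number[];
type SignupVectors = Record<string, Vector>; // attendeeId -> sign-up vector

// Module scope: initialised on cold start, reused by later warm invocations.
const vectorCache: Record<string, SignupVectors> = {};

function getSignupVectors(
  eventId: string,
  load: (eventId: string) => SignupVectors,
): SignupVectors {
  if (Object.prototype.hasOwnProperty.call(vectorCache, eventId)) {
    return vectorCache[eventId]; // warm path: no database round trip
  }
  const fresh = load(eventId); // cold path: fetch once, then remember
  vectorCache[eventId] = fresh;
  return fresh;
}
```

A cache like this lives per container, so each new container pays the warm-up cost once, and any change to sign-up vectors needs an invalidation strategy that this sketch leaves out.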

Batch queue processing

If 100 photos upload simultaneously (common at events), we don't spawn 100 Lambda invocations. SQS groups them into batches of 10. Process 10 at a time, scale down when done.
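
The batching arithmetic is simple to sketch. In practice the grouping would come from the queue integration's batch size setting rather than hand-written code, so the `toBatches` helper below is only an illustration of the effect: 100 queued photos become 10 units of work.

```typescript
// Illustrative sketch: toBatches is a hypothetical helper showing the grouping.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 100 photo IDs in batches of 10.
const photoIds: string[] = [];
for (let i = 0; i < 100; i++) {
  photoIds.push(`photo-${i}`);
}
console.log(toBatches(photoIds, 10).length);
```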

The data flow (in one sentence per step)

User picks photo from camera roll.
App resizes and converts to JPEG locally.
App requests presigned URL from our API.
API validates user, generates R2 presigned URL, creates photo record.
App uploads directly to R2 using presigned URL.
App notifies our API: "Upload complete."
API enqueues photo for processing.
Lambda worker downloads photo from R2.
Lambda calls Rekognition: detect faces, extract vectors.
Lambda compares vectors against all attendee sign-up vectors (cached in memory).
Lambda writes matches to database.
Realtime event fires: photo appears in shared gallery + attendees' personal galleries.

The cost math

R2 storage: $0.015/GB per month. A 12,000-photo event = 36 GB (about 3 MB per photo). Cost: $0.54 per month.
Lambda compute (presigned URL generation): ~100ms per photo. Cost: ~$0.000001 per photo.
Rekognition (face detection + embedding): $0.001 per image. 12,000 photos = $12.
Bandwidth (realtime updates): ~1 KB per update × 400 attendees = 400 KB per photo. High but manageable.
Total cost per 12,000-photo event: ~$15. We charge $55 per event.
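
For anyone who wants to check the arithmetic, the line items above can be reproduced in a few lines. The 3 MB average photo size is inferred from 36 GB across 12,000 photos, and the function and field names are illustrative. Realtime bandwidth is reported as a volume only, since no price is given for it.

```typescript
// Illustrative sketch: rates are the ones quoted above; names are assumptions.
interface EventCost {
  storage: number; // USD, one month of R2 storage
  lambda: number; // USD, presigned URL generation
  rekognition: number; // USD, face detection + embedding
  itemizedTotal: number; // USD, sum of the three priced items
  realtimeGB: number; // volume only, not priced here
}

function estimateEventCost(photos: number, avgPhotoMB: number, attendees: number): EventCost {
  const storageGB = (photos * avgPhotoMB) / 1000;
  const storage = storageGB * 0.015; // $0.015 per GB
  const lambda = photos * 0.000001; // ~$0.000001 per photo
  const rekognition = photos * 0.001; // $0.001 per image
  const realtimeGB = (photos * attendees * 1) / 1e6; // ~1 KB per update per attendee
  return {
    storage: storage,
    lambda: lambda,
    rekognition: rekognition,
    itemizedTotal: storage + lambda + rekognition,
    realtimeGB: realtimeGB,
  };
}

const cost = estimateEventCost(12000, 3, 400);
console.log(cost.itemizedTotal.toFixed(2)); // roughly 12.55
console.log(cost.realtimeGB.toFixed(1)); // roughly 4.8 GB of realtime traffic
```

The three priced items come to about $12.55, so the ~$15 total presumably leaves headroom for costs that are not itemized, such as realtime bandwidth and database usage. Rekognition accounts for nearly all of the itemized cost.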

The hard parts (that nobody sees)

Duplicate detection (same photo uploaded multiple times from different phones).
Rekognition failures (low-light photos, extreme angles, masks). Graceful degradation is critical.
Vector version management (if we upgrade the embedding model, old vectors don't match new ones).
Rate limiting (one spammy user shouldn't block the entire event's processing queue).

See the system in action.

Try Picsaris →