
From iPhone to gallery in 8 seconds: the tech behind Picsaris

A guest taps the camera icon. Selects a photo from their library. Hits "share." Eight seconds later, the photo is in the shared event gallery and face matching is already running. For guests in the moment, it feels instant. Behind the scenes, we're orchestrating 400 operations across three cloud services without blocking the UI. Here's how.

The architecture (10,000-foot view)

The upload flow (what actually happens in those 8 seconds)
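Adding up the per-step timings listed in this section gives a quick sanity check on the headline number. All figures are the post's own. The ranges come from Step 3's network-dependent upload and Step 5's matching time.

```latex
\begin{aligned}
t_{\text{stored}}  &= 0.5 + 0.1 + [3,\,5] + 0.1 = [3.7,\,5.7]\ \text{s} \\
t_{\text{matched}} &= t_{\text{stored}} + [2,\,10] = [5.7,\,15.7]\ \text{s}
\end{aligned}
```

The synchronous path (Steps 1 to 4) fits inside eight seconds across the whole stated network range. Face matching overlaps the end of that window and, at its slowest, finishes after it. That is consistent with the frontend not waiting for face processing.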

Step 1: Client-side prep (0.5 seconds)

The app receives the photo from the camera roll, resizes it, and converts it from HEIC to JPEG on-device.
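The data flow later in this post says the app resizes the photo and converts it to JPEG locally. That work runs in native iOS code; for example, UIKit's `UIImage.jpegData(compressionQuality:)` handles JPEG encoding. The dimension math is platform-neutral, so here is a minimal Python sketch of it. `MAX_EDGE = 2048` is a hypothetical limit, since the post doesn't say what size Picsaris targets.

```python
from __future__ import annotations

# Hypothetical limit: the post doesn't state the size Picsaris resizes to.
MAX_EDGE = 2048


def target_size(width: int, height: int, max_edge: int = MAX_EDGE) -> tuple[int, int]:
    """Scale (width, height) so the longer edge is at most max_edge, keeping the aspect ratio.

    Photos already within the limit are returned unchanged (never upscale).
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # max(1, ...) keeps very thin images from collapsing to a zero-pixel edge.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 12-megapixel iPhone frame, `target_size(4032, 3024)` returns `(2048, 1536)`.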

Step 2: Backend grants access (0.1 seconds)

Our API validates the user, generates a presigned R2 upload URL, and creates the photo record.

All of this happens in a single Lambda invocation, in under 100 ms.
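Here is a minimal sketch of that single invocation under stated assumptions. The attendee lookup, the `events/<event_id>/<photo_id>.jpg` key layout, the 300-second expiry, and the `pending` status are all hypothetical. Signing is injected as a function so the sketch runs without credentials. Because R2 exposes an S3-compatible API, a production signer could wrap boto3's `generate_presigned_url("put_object", ...)` pointed at the R2 endpoint.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PhotoStore:
    """In-memory stand-in for the photos table (hypothetical schema)."""
    rows: dict[str, dict] = field(default_factory=dict)


def request_upload(
    user_id: str,
    event_id: str,
    attendees: dict[str, set[str]],           # event_id -> user_ids allowed to upload
    store: PhotoStore,
    sign_put_url: Callable[[str, int], str],  # (object_key, expires_s) -> presigned URL
    expires_s: int = 300,                     # hypothetical expiry
) -> dict:
    # 1. Validate: only attendees of this event may add photos to it.
    if user_id not in attendees.get(event_id, set()):
        raise PermissionError("user is not an attendee of this event")
    # 2. The server picks the object key, so a client can't write to another path.
    photo_id = str(uuid.uuid4())
    key = f"events/{event_id}/{photo_id}.jpg"
    # 3. Create the photo record before any bytes arrive.
    store.rows[photo_id] = {"event_id": event_id, "user_id": user_id, "key": key, "status": "pending"}
    # 4. Return a short-lived URL; the photo itself never passes through this code.
    return {"photo_id": photo_id, "upload_url": sign_put_url(key, expires_s)}
```

Generating the key server-side matters because the signature covers the object path. A guest can upload only to the path the API chose.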

Step 3: Direct upload to storage (3–5 seconds, depending on network)

The app now uploads the photo directly to Cloudflare R2 using the presigned URL.
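This sketch shows what the upload looks like on the wire, using Python's standard library so it runs anywhere; the app itself would use its platform HTTP client. It assumes an S3-style presigned PUT URL, which is what `put_object` presigning produces. The post doesn't say whether Picsaris uses PUT or a POST form. The function only builds the request, so it can be inspected without a network.

```python
import urllib.request


def build_upload_request(upload_url: str, jpeg_bytes: bytes) -> urllib.request.Request:
    """Build, but don't send, an HTTP PUT of the JPEG bytes to a presigned URL."""
    return urllib.request.Request(
        upload_url,
        data=jpeg_bytes,
        method="PUT",
        # If the URL was signed with a content type, the upload must send the same value.
        headers={"Content-Type": "image/jpeg"},
    )
```

Passing the result to `urllib.request.urlopen(..., timeout=30)` would send it. No API server sits between the request and R2.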

Step 4: Notify backend (0.1 seconds)

Once the upload completes, the app makes a final callback telling our API the upload is done, and the API enqueues the photo for processing.
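Here is a sketch of the callback handler. A `queue.Queue` stands in for the SQS queue described under "Batch queue processing"; in production this would be an SQS `send_message` call. The row fields and status names are hypothetical and don't reflect a published Picsaris schema. The status check makes the callback idempotent, so an app that retries on a flaky connection doesn't enqueue the same photo twice.

```python
from __future__ import annotations

import json
import queue


def complete_upload(photo_id: str, user_id: str, rows: dict[str, dict], jobs: queue.Queue) -> None:
    """Mark an upload as finished and enqueue it for face matching."""
    row = rows.get(photo_id)
    # Only the guest who requested the upload may confirm it.
    if row is None or row["user_id"] != user_id:
        raise PermissionError("unknown photo or wrong user")
    # Idempotent: a retried callback must not enqueue the same photo twice.
    if row["status"] != "pending":
        return
    row["status"] = "uploaded"
    jobs.put(json.dumps({"photo_id": photo_id, "key": row["key"]}))
```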

Step 5: Asynchronous face matching (starts immediately, completes in 2–10 seconds)

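Now the async work begins. A Lambda worker downloads the photo from R2, calls Rekognition to detect faces and extract vectors, compares them against every attendee's cached sign-up vectors, and writes the matches to the database.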
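Here is a minimal sketch of the comparison in step 10 of the data flow. It assumes each face is a fixed-length embedding vector and uses cosine similarity with a hypothetical threshold of 0.8. The post doesn't say which distance metric or cutoff Picsaris uses.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as matching nothing.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_faces(
    photo_faces: list[list[float]],
    signup_vectors: dict[str, list[float]],  # attendee_id -> sign-up face vector
    threshold: float = 0.8,                  # hypothetical cutoff
) -> set[str]:
    """Return the attendees whose sign-up vector best matches some face in the photo."""
    matched: set[str] = set()
    for face in photo_faces:
        best_id, best_sim = None, threshold
        for attendee_id, ref in signup_vectors.items():
            sim = cosine_similarity(face, ref)
            if sim >= best_sim:
                best_id, best_sim = attendee_id, sim
        # Each face is credited to at most one attendee: its best match above the cutoff.
        if best_id is not None:
            matched.add(best_id)
    return matched
```

Assigning each face only to its best-scoring attendee keeps one face from tagging several look-alike guests at once.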

Why it feels instant (even though it's async)

Progressive UI updates

The frontend doesn't wait for face processing. As soon as the upload completes, the photo shows up; face matches are attached when processing finishes.
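Here is one way to model this on the client, as a hypothetical sketch. The three status names are assumptions, not Picsaris's actual states. The gallery shows a photo once it is uploaded, and matching only adds information later. Ordering the statuses lets the client ignore late or duplicate updates.

```python
from enum import Enum


class PhotoStatus(Enum):
    UPLOADING = 1  # bytes still on their way to R2
    UPLOADED = 2   # stored; shown in the gallery while matching runs
    MATCHED = 3    # face matches attached


def advance(current: PhotoStatus, new: PhotoStatus) -> PhotoStatus:
    """Apply a status update; statuses only move forward, so stale events are ignored."""
    return new if new.value > current.value else current


def visible_in_gallery(status: PhotoStatus) -> bool:
    # The gallery doesn't wait for MATCHED: an uploaded photo is shown immediately.
    return status.value >= PhotoStatus.UPLOADED.value
```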

Shared gallery updates in real time

We use WebSockets (via Supabase Realtime) to push new photos into the shared gallery and each attendee's personal gallery as soon as they're ready.
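On the receiving end, each device folds change events into its local copy of the gallery. The payload shape here (`eventType`, `new`, `old`) is modeled on the database-change payloads Supabase Realtime delivers. Treat the exact field names, and the `id` and `created_at` columns, as assumptions.

```python
from __future__ import annotations


def apply_change(gallery: dict[str, dict], payload: dict) -> dict[str, dict]:
    """Fold one database-change event into the local gallery (photo id -> row)."""
    kind = payload.get("eventType")
    if kind in ("INSERT", "UPDATE"):
        row = payload["new"]
        gallery[row["id"]] = row  # overwrite by id, so applying an event twice is harmless
    elif kind == "DELETE":
        gallery.pop(payload["old"]["id"], None)
    return gallery


def newest_first(gallery: dict[str, dict]) -> list[dict]:
    # ISO 8601 timestamps in the same format and time zone sort correctly as strings.
    return sorted(gallery.values(), key=lambda row: row["created_at"], reverse=True)
```

Because INSERT and UPDATE both overwrite by id, the gallery stays consistent if the same event is applied twice, for example after a reconnect and refetch.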

The performance tricks

Presigned URLs (don't upload through our servers)

The standard naive approach: app → our API → R2. Every photo crosses the network twice, which roughly doubles upload latency and burns our bandwidth.

Our approach: app → R2 directly. Our API only handles metadata, in under 100 ms. The photo itself never touches our servers.

Format detection on-device

iPhones capture HEIC by default. Converting to JPEG on our servers would block the upload flow and waste cloud compute, so we convert on the phone using native APIs, which is effectively instant.
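Native APIs handle detection on the device; on iOS, ImageIO's `CGImageSourceGetType` reports a file's type. The hedged Python sketch below shows what that check amounts to by reading the first bytes of a file. JPEG files start with `FF D8 FF`. HEIC is an ISO base media file whose `ftyp` box names a brand such as `heic` or `mif1`. The sketch checks only the major brand, which is a simplification; a thorough check would also read the compatible brands that follow it.

```python
# Major brands used by HEIF/HEIC files (image and image-sequence variants).
HEIF_BRANDS = {b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1"}


def detect_format(header: bytes) -> str:
    """Classify an image from its first bytes as 'jpeg', 'heif', or 'unknown'."""
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # ISO BMFF layout: 4-byte box size, the box type 'ftyp', then the 4-byte major brand.
    if len(header) >= 12 and header[4:8] == b"ftyp" and header[8:12] in HEIF_BRANDS:
        return "heif"
    return "unknown"


def needs_conversion(header: bytes) -> bool:
    return detect_format(header) == "heif"
```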

One Rekognition call per photo

If a photo has 8 faces, we don't call Rekognition 8 times. One call, one response with 8 vectors. Batch processing.
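Here is an illustration of the one-call, many-faces shape. Rekognition's `DetectFaces` returns every detected face in a single `FaceDetails` list, each with a `BoundingBox` given as ratios of the image size. This sketch turns such a response into pixel boxes. The 90% confidence cutoff is hypothetical, and the post doesn't describe how Picsaris derives per-face vectors from the response. With boto3, the response would come from `rekognition.detect_faces(Image={"Bytes": jpeg_bytes})`, since the photo lives in R2 rather than S3.

```python
from __future__ import annotations


def faces_from_response(
    response: dict, img_w: int, img_h: int, min_confidence: float = 90.0  # hypothetical cutoff
) -> list[tuple[int, int, int, int]]:
    """Convert every face in one DetectFaces-style response to a pixel box (left, top, right, bottom)."""

    def clamp(v: float) -> float:
        # Boxes can extend past the image edge, so keep ratios within [0, 1].
        return min(max(v, 0.0), 1.0)

    boxes = []
    for face in response.get("FaceDetails", []):
        if face.get("Confidence", 0.0) < min_confidence:
            continue
        bb = face["BoundingBox"]  # Left/Top/Width/Height as ratios of the image size
        x0, y0 = clamp(bb["Left"]), clamp(bb["Top"])
        x1, y1 = clamp(bb["Left"] + bb["Width"]), clamp(bb["Top"] + bb["Height"])
        boxes.append((round(x0 * img_w), round(y0 * img_h), round(x1 * img_w), round(y1 * img_h)))
    return boxes
```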

Vector caching

Every sign-up vector is cached in memory on the matching Lambda. The first request to each Lambda instance warms its cache; after that, matches on that instance take microseconds, since they're just distance calculations.
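Here is a minimal sketch of that cache, assuming a hypothetical `load` function that queries the database. Module-level state in a Lambda function persists across warm invocations of the same execution environment, and every new environment starts empty. That is why each instance pays the warm-up cost once. The 60-second TTL is an added assumption so guests who sign up mid-event eventually become matchable; the post doesn't say how Picsaris handles invalidation.

```python
from __future__ import annotations

import time
from typing import Callable

TTL_S = 60.0  # hypothetical: how long a cached event stays fresh

# Module-level state: survives warm invocations in one Lambda execution
# environment; every new environment (cold start) starts with an empty cache.
_CACHE: dict[str, tuple[float, dict[str, list[float]]]] = {}


def get_signup_vectors(
    event_id: str,
    load: Callable[[str], dict[str, list[float]]],  # hypothetical DB loader
    now: Callable[[], float] = time.monotonic,
) -> dict[str, list[float]]:
    """Return sign-up vectors for an event, reloading only when the entry is missing or stale."""
    t = now()
    hit = _CACHE.get(event_id)
    if hit is not None and t - hit[0] < TTL_S:
        return hit[1]
    vectors = load(event_id)
    _CACHE[event_id] = (t, vectors)
    return vectors
```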

Batch queue processing

If 100 photos upload simultaneously (common at events), we don't spawn 100 Lambda invocations. The SQS trigger hands them to Lambda in batches of 10, so each invocation processes 10 photos, and capacity scales back down once the queue drains.
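Here is a sketch of the batch worker. Lambda's SQS trigger delivers records in `event["Records"]`, each with a `messageId` and a `body`, and the batch size (10 here) is set on the event source mapping. Returning `batchItemFailures` tells Lambda to retry only the failed messages, which requires `ReportBatchItemFailures` to be enabled on the mapping. `process_photo` is a hypothetical stand-in for the download, detect, match, and write steps. A real entry point would be `lambda_handler(event, context)` calling `handle_batch`.

```python
from __future__ import annotations

import json
from typing import Callable


def handle_batch(event: dict, process_photo: Callable[[dict], None]) -> dict:
    """Process one SQS batch delivered by Lambda and report per-message failures."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_photo(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; with ReportBatchItemFailures enabled,
            # Lambda deletes the successful ones from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message means one corrupt photo doesn't force the other nine in its batch to be reprocessed.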

The data flow (in one sentence per step)

  1. User picks photo from camera roll.
  2. App resizes and converts to JPEG locally.
  3. App requests presigned URL from our API.
  4. API validates user, generates R2 presigned URL, creates photo record.
  5. App uploads directly to R2 using presigned URL.
  6. App notifies our API: "Upload complete."
  7. API enqueues photo for processing.
  8. Lambda worker downloads photo from R2.
  9. Lambda calls Rekognition: detect faces, extract vectors.
  10. Lambda compares vectors against all attendee sign-up vectors (cached in memory).
  11. Lambda writes matches to database.
  12. Realtime event fires: photo appears in shared gallery + attendees' personal galleries.

The cost math

The hard parts (that nobody sees)

See the system in action.

Try Picsaris →