# The Robot Cookbook → a "Cook From What You Have" platform

**Working spec & strategy — written 2026-06-09.**
Audience: you (non-technical founder). I've kept the plain-English explanation up front in every section and put the engineering depth below it, so you can read the top of each section and stop, or keep going.

---

## 0. The one-sentence idea

> **Tell it what's in your kitchen → it shows you what you can cook right now (and what you're one or two items short of) → it builds the shopping list and hands it to a grocery service.** Any cookbook — ours, yours, or the whole web — can plug in as a recipe source.

The **Robot Cookbook (132 recipes) is just the first source.** The product is the pipeline around it.

---

## 1. What exists today (already built & tested)

The review work lives in `recipe-cookbook/explore/`, alongside the shared ingredient brain in `assets/` and the report data in `analysis/`. It is **standalone, never touches the live site**, and runs as plain static files (no server yet).

| File | What it is | Status |
|---|---|---|
| `assets/ingredients.js` | The **ingredient brain**: turns 747 messy ingredient spellings into 554 canonical ingredients, and parses free-text lines like "2 cloves garlic, minced" → `garlic`. | ✅ done, tested |
| `explore/app.html` | **The unified flow** (this spec's centerpiece): multi-source recipe pool, "what's in your kitchen" input, live make-now / 1–2-away ranking, smart nudges, shopping basket, **"bring your own recipes"** importer. | ✅ done, tested (0 console errors) |
| `explore/kitchen.html` | Earlier single-purpose "cook from your cart" page (superseded by `app.html`). | ✅ |
| `explore/search.html`, `galaxy.html`, `visualize.html` | Ingredient filter, and two visualizations (galaxy was deprioritized). | ✅ |
| `analysis/` | Overlap/pantry report + cluster data. | ✅ |

**Proof the platform idea works:** in testing, a recipe pasted in **schema.org format** ("Garlic Pasta", with `recipeIngredient: ["2 cloves garlic", "1 lb spaghetti", "salt"]`) was ingested, its ingredients normalized to `garlic, spaghetti, salt`, indexed, and became fully searchable/matchable **in the same engine as the Robot Cookbook**. That's the whole thesis in one test: *new source → same pipeline → one experience.*

---

## 2. Is this already a thing? (Honest competitive read)

**Yes — the core loop is well-trodden. The *combination* is not.** Be clear-eyed about this.

| App | "Cook from what you have" | Recipe import / aggregation | → grocery cart | Money |
|---|---|---|---|---|
| **SuperCook** (since 2009) | ★★★★★ core; "1–2 away" mechanic; voice/barcode | Indexes millions of recipes from 2,000+ sites; **links out, doesn't host** | Weak | Free; ads + affiliate |
| **Cooklist** | ★★★★★ auto-pantry via **loyalty-card sync** | Aggregates + your own | Strong; pivoting to "agentic grocery commerce" | Freemium / B2B |
| **Samsung Food** (ex-Whisk) | ★★★☆ pantry behind Premium | **Save from any site**; 240k+ recipes | One-click to **23 retailers** | Food+ $6.99/mo |
| **Mealime / Paprika / AnyList / Bring!** | ★★ (list/plan-first) | Paprika/AnyList = great web clippers | Most route to Instacart/Walmart/Kroger | $2–6/mo or one-time |
| **NYT Cooking / Cookpad** | ★ walled gardens | Own recipes only | weak | Subscription |
| **Yummly** | — | — | — | **SHUT DOWN Dec 2024** (Whirlpool) |

**Takeaways that matter for us:**
- **The idea is not novel.** SuperCook and Cooklist already do ingredient-based discovery + missing-items lists. Don't pitch this as "no one's done it."
- **No one cleanly does all four of our pillars at once:** (1) flexible input *including photo*, (2) "1–2 away" matching, (3) cross-source aggregation (built-in + user upload + open web import) in one clean reader, (4) fast one-click cart.
- **Real, narrow gaps to exploit:** photo/receipt input is rare and immature; SuperCook *links out* instead of giving a clean unified reader; the walled gardens trap you; most are ad-heavy.
- **A cautionary fact:** **Yummly died despite a $100M acquisition; Cookpad revenue is shrinking.** Standalone *recipe content* is a weak business. The value is in the **commerce loop** (the cart), which reinforces our cart-centric thesis — but warns us not to bet on "recipes alone."

**Bottom line:** we're entering a real category with beatable (not dominant) incumbents — SuperCook gets only ~600K visits/month. Our edge has to be **execution + breadth of input modes + a clean unified experience**, not a secret idea.

---

## 3. What's actually defensible (the moat question)

Blunt answer: **there is no strong "moat" at the idea level.**
- Recipes are largely **not copyrightable** (US law: a bare ingredient list + functional steps isn't protected — only the prose/photos/headnotes are). So no one, including us, "owns" recipe data.
- Instacart is a **shared rail**: any competitor can integrate it through the same developer API and approval process we would use.
- Pantry-matching algorithms are easy to replicate.

What you *can* build that's durable, in priority order:
1. **Per-user data & habit** — once someone's pantry, preferences, saved recipes, and household live in your app, *their* switching cost is high. (This is Cooklist's bet.) **Design to accumulate this from day one.**
2. **A trusted, clean, ad-free curated corpus** — the Robot Cookbook is the seed; a consistent, well-formatted reading experience is a UX moat SuperCook lacks.
3. **Breadth + smoothness of input** (type / tap / photo) — the freshest wedge, cheap to build now that vision models are cheap.
4. **Speed-to-cart** — most apps stop at a list; a tight 2-tap checkout is an execution moat.

These are **execution moats** (you win by out-shipping), not lock-in moats. That's normal for consumer apps and totally viable — just don't expect the idea itself to protect you.

---

## 4. Architecture: how "plug anything in" actually works

The key design decision — and it's already implemented — is **one common recipe shape, with a thin "adapter" per source.** Everything downstream is source-agnostic.

```
  SOURCES ───────────────►  [adapter] ──► COMMON RECIPE SCHEMA ──► normalize ──► MATCH ENGINE ──► UI flow ──► CART
  • Robot Cookbook (our JSON)     │            { title,                 (ingredients.js)   (coverage,    (browse/      (Instacart
  • User paste / .json upload     │              ingredients[],                            ranking,      kitchen/      payload)
  • Recipe URL (schema.org)       │              steps[], url, … }                         nudges)       basket)
  • TheMealDB / Spoonacular  ─────┘
  • Photo → vision model
```

### 4.1 The common recipe schema
Every source is mapped to exactly this object (it's what `app.html` already uses):
```jsonc
{
  "id": "robot:42",            // <sourceId>:<index>
  "title": "Butter Chicken",
  "creator": "Future Canoe",
  "ingredients": [ { "item": "2 cloves garlic, minced", "quantity": "", "notes": "" }, … ],
  "steps": ["…"],
  "url": "https://…",
  "tags": ["indian","curry"],
  "feasibility": "moderate",   // optional
  "_src": "robot", "_srcName": "The Robot Cookbook"
}
```
`item` is allowed to be **raw text** ("2 cloves garlic, minced"). The engine cleans it at index time — so adapters can be dumb, which is what makes new sources cheap.

### 4.2 Adapters (each is ~10–30 lines)
- **Our JSON** — pass-through.
- **schema.org `Recipe` JSON-LD** — `name→title`, `author.name→creator`, `recipeIngredient[]→ingredients`, `recipeInstructions→steps`. *(built)*
- **Plain text** — first line = title, remaining lines = ingredients. *(built)*
- **TheMealDB** — map `strIngredientN`/`strMeasureN` pairs. *(Phase 2, trivial)*
- **Photo** — vision model returns our schema directly. *(Phase 2)*

> **Why this matters:** adding "import from AllRecipes" or "import TheMealDB" later is *writing one adapter function*, not re-architecting. That's the platform.
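
To make "one adapter function" concrete, here is a minimal sketch of two adapters from the list above, assuming the common schema in 4.1. The function names `fromSchemaOrg` and `fromMealDb` are hypothetical (not the ones in `app.html`); the input field names come from schema.org/Recipe and from TheMealDB's `strIngredientN`/`strMeasureN` JSON.

```javascript
// Hypothetical adapters: each maps one source's shape onto the common recipe schema.
// Ingredient text stays raw; the ingredient brain cleans it at index time.

// schema.org Recipe JSON-LD → common schema.
// recipeInstructions may be a string, an array of strings, or HowToStep objects ({ text }).
// HowToSection nesting is not unpacked in this sketch.
function fromSchemaOrg(ld, src, index) {
  const author = Array.isArray(ld.author) ? ld.author[0] : ld.author;
  const rawSteps = ld.recipeInstructions ?? [];
  const steps = (Array.isArray(rawSteps) ? rawSteps : [rawSteps])
    .map((s) => (typeof s === "string" ? s : s?.text ?? ""))
    .map((s) => s.trim())
    .filter(Boolean);
  return {
    id: `${src.id}:${index}`,
    title: ld.name ?? "Untitled",
    creator: typeof author === "string" ? author : author?.name ?? "",
    ingredients: (ld.recipeIngredient ?? []).map((item) => ({ item, quantity: "", notes: "" })),
    steps,
    url: ld.url ?? "",
    tags: [],
    _src: src.id,
    _srcName: src.name,
  };
}

// TheMealDB meal → common schema. Ingredients live in strIngredient1..20 / strMeasure1..20.
function fromMealDb(meal, src, index) {
  const ingredients = [];
  for (let n = 1; n <= 20; n++) {
    const name = (meal[`strIngredient${n}`] ?? "").trim();
    if (!name) continue; // unused slots are "" or null
    const measure = (meal[`strMeasure${n}`] ?? "").trim();
    ingredients.push({ item: name, quantity: measure, notes: "" });
  }
  return {
    id: `${src.id}:${index}`,
    title: meal.strMeal,
    creator: "",
    ingredients,
    steps: (meal.strInstructions ?? "").split(/\r?\n/).map((s) => s.trim()).filter(Boolean),
    url: meal.strSource || "",
    tags: (meal.strTags ?? "").split(",").map((t) => t.trim().toLowerCase()).filter(Boolean),
    _src: src.id,
    _srcName: src.name,
  };
}
```

Both return the same object shape, so nothing downstream needs to know which source a recipe came from.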

---

## 5. The ingredient brain (`ingredients.js`)

**Plain English:** recipes spell the same thing 10 different ways ("dark brown sugar", "light brown sugar", "brown sugar, packed"). If we don't unify them, matching is garbage. So we squash every spelling down to one canonical name.

**How it works (pipeline per ingredient string):**
1. lowercase; strip parentheticals `(…)`, comma-tails, `/`-alternatives, trailing "for drizzling".
2. strip leading amounts/units ("2 cloves", "1½ cups", "1 lb") — but never the last word, so the spice **"cloves"** survives while "2 cloves garlic" → `garlic`.
3. drop descriptor words (fresh, chopped, large…) and filler.
4. singularize; apply a hand-tuned **synonym map** (mayo→mayonnaise, scallion→green onion, kosher salt→salt, …).
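
A condensed sketch of those four steps, assuming tiny illustrative tables. It is not the real `ingredients.js` (which has far larger, hand-tuned tables and more rules); `UNITS`, `DESCRIPTORS`, `SYNONYMS`, `INVARIANT`, and `canonicalize` are hypothetical names.

```javascript
// Illustrative tables only; the real ones are much larger.
const UNITS = new Set(["cup", "cups", "tbsp", "tsp", "lb", "lbs", "oz", "g", "clove", "cloves", "can", "pinch"]);
const DESCRIPTORS = new Set(["fresh", "chopped", "minced", "large", "small", "diced", "of"]);
const SYNONYMS = new Map([["mayo", "mayonnaise"], ["scallion", "green onion"], ["kosher salt", "salt"]]);
const INVARIANT = new Set(["cloves", "molasses", "asparagus", "couscous"]); // plural-looking, leave alone

// Naive English singular: tomatoes → tomato, eggs → egg; words ending in "ss" are kept.
function singularize(word) {
  if (INVARIANT.has(word)) return word;
  if (word.endsWith("oes")) return word.slice(0, -2);
  if (word.endsWith("s") && !word.endsWith("ss")) return word.slice(0, -1);
  return word;
}

function canonicalize(line) {
  // 1. lowercase; drop (parentheticals), comma tails, word/word alternatives, "for drizzling" tails
  const s = line
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ")
    .split(",")[0]
    .replace(/([a-z])\s*\/\s*[a-z].*$/, "$1") // "butter/margarine" → "butter"; "1/2" untouched
    .replace(/\bfor \w+ing\b.*$/, " ");
  let words = s.split(/\s+/).filter(Boolean);

  // 2. strip leading amounts/units, but never the last word, so the spice "cloves" survives
  const isAmount = (w) => /^[\d½¼¾⅓⅔.\/-]+$/.test(w);
  while (words.length > 1 && (isAmount(words[0]) || UNITS.has(words[0]))) words.shift();

  // 3. drop descriptor/filler words, again keeping at least one word
  const kept = words.filter((w) => !DESCRIPTORS.has(w));
  if (kept.length) words = kept;

  // 4. singularize each word, then map the whole phrase through the synonym table
  const phrase = words.map(singularize).join(" ");
  return SYNONYMS.get(phrase) ?? phrase;
}
```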

**Result today:** 747 raw strings → **554 canonical ingredients**; the big merges (salt 54, garlic 33, egg 33) are correct.

**Limits & how it scales (important honesty):**
- It's a **heuristic + hand-tuned map**, not AI. At 132 recipes that's ideal (fast, deterministic, inspectable). At 100k recipes the long tail of weird ingredients grows and the hand map can't keep up.
- **Scaling path:** (a) grow the synonym map from real data (log unmatched items, promote frequent ones); (b) back it with a canonical **ingredient database** (e.g. USDA FoodData Central / Open Food Facts) for synonyms & categories; (c) for the stubborn tail, an **embedding model** maps "san marzano tomatoes" near "tomato" by meaning. Keep the cheap deterministic path for the 95% and only reach for embeddings on misses.
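
Step (a) of that path can start as a plain counter of strings the normalizer could not match. A sketch, assuming a `known` list of canonical names and a review threshold of our choosing; `MissLog` and `promotable` are hypothetical names.

```javascript
// Counts normalized strings that matched no known canonical ingredient, so the most
// frequent misses can be reviewed and promoted into the synonym map.
class MissLog {
  constructor(knownCanonicals) {
    this.known = new Set(knownCanonicals);
    this.counts = new Map(); // unmatched string → times seen
  }

  // Returns true if the string was a miss (and was counted).
  record(canonical) {
    if (this.known.has(canonical)) return false;
    this.counts.set(canonical, (this.counts.get(canonical) ?? 0) + 1);
    return true;
  }

  // Misses seen at least minCount times, most frequent first: candidates for a human to
  // add as a new canonical ingredient or map onto an existing one.
  promotable(minCount = 5) {
    return [...this.counts]
      .filter(([, n]) => n >= minCount)
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .map(([name, n]) => ({ name, n }));
  }
}
```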

---

## 6. The matching engine + Big-O (does it scale?)

**Plain English:** "what can I cook" = for each recipe, how many of its ingredients do you have? This is cheap at our size and stays cheap with one well-known trick at large size.

Let **R** = #recipes, **Ī** = avg ingredients/recipe (~10), **P** = pantry size, **I** = #distinct ingredients.

### 6.1 What we do now (naive, and fine for a long time)
- **Build index** (once per source change): for each recipe, compute its set of canonical ingredients → `O(R · Ī)` time and memory. *(This is `buildIndex`.)*
- **Recompute coverage** (every time the pantry changes): loop all recipes, check each ingredient against the pantry `Set` (O(1) lookups) → **`O(R · Ī)`**.
  - At R=132: ~1,300 operations per keystroke — **instant**.
  - This stays smooth in the browser up to roughly **R ≈ 30,000–50,000** recipes.
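
The naive pass above, as a sketch. It assumes recipes already carry canonical ingredient names, and `buildRecipeSets`/`coverage` are hypothetical names rather than the app's actual `buildIndex` code.

```javascript
// Build once per source change: one Set of canonical ingredients per recipe. O(R · Ī).
function buildRecipeSets(recipes) {
  return recipes.map((r) => ({ id: r.id, need: new Set(r.ingredients) }));
}

// Recompute on every pantry change: what each recipe still lacks. O(R · Ī),
// since each Set.has lookup is O(1) on average.
function coverage(recipeSets, pantry) {
  const have = new Set(pantry);
  return recipeSets.map(({ id, need }) => {
    const missing = [...need].filter((g) => !have.has(g));
    return { id, have: need.size - missing.length, total: need.size, missing };
  });
}
```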

### 6.2 The trick for scale (incremental inverted index)
When R gets large, don't re-scan everything on each change. Keep:
- an **inverted index** `ingredient → [recipes that use it]`,
- a per-recipe counter `missingCount[recipe]` (how many of its ingredients you still lack).

When you **add** ingredient *g*: for each recipe in `index[g]`, do `missingCount[r] -= 1`; if it hits 0, that recipe just became **makeable**. Work done = number of recipes that use *g* (its "postings"), **not** all recipes. Removing an ingredient is the mirror image. This makes a pantry edit cost **proportional to how popular that one ingredient is**, typically tiny — so it scales to **millions of recipes**.
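
A minimal sketch of that incremental scheme, assuming canonical ingredient names as input. `IncrementalMatcher` and its methods are hypothetical names, not existing app code.

```javascript
// A pantry edit touches only the recipes that use that ingredient (its "postings").
class IncrementalMatcher {
  constructor(recipes) {
    this.index = new Map();        // ingredient → ids of recipes that use it
    this.missingCount = new Map(); // recipe id → how many of its ingredients we still lack
    this.pantry = new Set();
    for (const { id, ingredients } of recipes) {
      const unique = new Set(ingredients);
      this.missingCount.set(id, unique.size);
      for (const g of unique) {
        if (!this.index.has(g)) this.index.set(g, []);
        this.index.get(g).push(id);
      }
    }
  }

  // Add ingredient g; returns recipes that just became makeable. Cost: O(postings of g).
  add(g) {
    if (this.pantry.has(g)) return [];
    this.pantry.add(g);
    const unlocked = [];
    for (const id of this.index.get(g) ?? []) {
      const left = this.missingCount.get(id) - 1;
      this.missingCount.set(id, left);
      if (left === 0) unlocked.push(id);
    }
    return unlocked;
  }

  // Mirror image: returns recipes that just stopped being makeable.
  remove(g) {
    if (!this.pantry.delete(g)) return [];
    const lost = [];
    for (const id of this.index.get(g) ?? []) {
      const before = this.missingCount.get(id);
      if (before === 0) lost.push(id);
      this.missingCount.set(id, before + 1);
    }
    return lost;
  }

  // Recipes exactly n items away (0 = make now). O(R) scan, fine for rendering;
  // a larger version would keep per-count buckets instead.
  away(n) {
    return [...this.missingCount].filter(([, k]) => k === n).map(([id]) => id);
  }
}
```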

- **Search**: substring scan is `O(R · textlen)` (fine now). At scale, use a real text index — client-side prebuilt index for tens of thousands, or **server-side full-text** (Cloudflare D1's SQLite FTS5, or a search service) beyond that.
- **Clustering** (the analysis report): all-pairs Jaccard is `O(R²·Ī)` — fine for 132, **quadratic and bad** for large R. It's an **offline batch job**, and at scale you'd switch to approximate methods (MinHash/LSH). Not on the hot path, so it never blocks the app.
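
To show where the `O(R²·Ī)` comes from, here is a sketch of the all-pairs Jaccard pass, assuming canonical ingredient lists and an illustrative similarity threshold; `jaccard` and `similarPairs` are hypothetical names, not the analysis script itself.

```javascript
// Jaccard similarity of two ingredient sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let inter = 0;
  for (const g of a) if (b.has(g)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// All pairs: R·(R−1)/2 comparisons, each O(Ī), so O(R²·Ī) overall. Offline batch job only.
function similarPairs(recipes, threshold = 0.5) {
  const sets = recipes.map((r) => new Set(r.ingredients));
  const pairs = [];
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      const s = jaccard(sets[i], sets[j]);
      if (s >= threshold) pairs.push({ a: recipes[i].id, b: recipes[j].id, s });
    }
  }
  return pairs.sort((x, y) => y.s - x.s);
}
```

At R = 132 that is 8,646 pairs; at R = 100,000 it is about 5 billion, which is why MinHash/LSH replaces the exact pass at scale.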

### 6.3 When to move matching off the browser
- **< ~50k recipes:** do everything client-side (where we are). Zero backend cost, instant.
- **> ~50k, or shared/account data:** move the index + matching server-side (a Worker holding the inverted index, or D1 queries), return only the page of results the user sees. The algorithm is identical — it just runs on the server.

**Verdict:** the engine is not a scaling risk. The naive version covers you to tens of thousands of recipes for free, and the incremental-index upgrade is a known, bounded piece of work when you need it.

---

## 7. The UX flow (one understandable journey)

The whole app is **one page that reads like a story** — no separate "tools."

```
┌─────────────────────────────────────────────────────────┐
│  🤖 What can I cook?         [ search… ]      📚 Sources  │
├─────────────────────────────────────────────────────────┤
│  1 · What's in your kitchen?                              │
│    [ paste a receipt / list ]  [Add]  [Sample] [📷 Photo] │
│    tap staples: (salt)(eggs)(garlic)(rice)(soy sauce)…    │
│    have: ⟨spinach⟩ ⟨chicken⟩ ⟨garlic⟩ …   clear           │
├─────────────────────────────────────────────────────────┤
│  [12 make now] [18 one-two away] [10 in kitchen]          │
│  💡 Grab mirin and you unlock 4 more recipes  [+ add]     │
│                                                           │
│  ✅ You can make these now (12)                            │
│   [card][card][card] …  (✅ you have everything)          │
│  🛒 One or two items away (18)                            │
│   [card · need: mirin]  [card · need: 2]  …  [+ add to list]│
├─────────────────────────────────────────────────────────┤
│  🧺 9 to buy   soy sauce, mirin…   [View list][→ Instacart]│ ← sticky
└─────────────────────────────────────────────────────────┘
```

**Each capability is a step, not a separate app:**
- **Browse** = the default state (no pantry yet) — just your recipes.
- **Search** = the box; filters the same pool live.
- **"What can I cook"** = happens the instant you add ingredients; cards re-sort into make-now / 1–2-away.
- **The analysis** (staples, unlock counts) = surfaces as the **💡 nudge** in context, not a separate report.
- **Shopping** = the sticky basket that fills as you tap "add to list"; **Instacart/Amazon** at the bottom.
- **Sources** = one modal: toggle cookbooks on/off, **add your own** (paste or upload today; URL and photo import in Phase 2).
- **Persistence**: your pantry and your added cookbooks are saved in the browser (`localStorage`) so they're there when you come back. (Becomes a real account in Phase 2.)
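
One way the make-now / 1–2-away sections and the 💡 nudge could fall out of per-recipe missing lists. A sketch, assuming each recipe arrives as `{ id, missing: [...] }` with canonical names; `bucketize` and `bestNudge` are hypothetical names.

```javascript
// Split recipes into the page's sections, given each recipe's missing ingredients.
function bucketize(results) {
  const makeNow = results.filter((r) => r.missing.length === 0);
  const nearlyThere = results
    .filter((r) => r.missing.length >= 1 && r.missing.length <= 2)
    .sort((a, b) => a.missing.length - b.missing.length); // one-away before two-away
  return { makeNow, nearlyThere };
}

// The 💡 nudge: which single purchase turns the most recipes into "make now"?
// Only recipes missing exactly one item can be completed by one purchase.
function bestNudge(results) {
  const unlocks = new Map(); // ingredient → number of recipes it would complete
  for (const r of results) {
    if (r.missing.length !== 1) continue;
    const g = r.missing[0];
    unlocks.set(g, (unlocks.get(g) ?? 0) + 1);
  }
  let best = null;
  for (const [ingredient, count] of unlocks) {
    if (!best || count > best.count) best = { ingredient, count };
  }
  return best; // shape: { ingredient, count }, or null when nothing is one item away
}
```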

**Edge cases handled:** receipt junk (codes, prices, "KS/ORG", units) stripped; unrecognized lines listed honestly ("Didn't recognise: paper towels"); non-food ignored; empty states guide the user; messy imported ingredient strings normalized at match time.
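
A sketch of the receipt cleanup described above. The noise patterns (prices, long item codes, "KS"/"ORG"-style labels, units) are assumptions modeled on the examples in the text, `known` stands in for the ingredient brain's vocabulary, and `cleanReceiptLine`/`parseReceipt` are hypothetical names; the rules in `app.html` may differ.

```javascript
// Store labels, units and totals commonly printed on receipts; illustrative, not exhaustive.
const RECEIPT_NOISE = new Set(["ks", "org", "lb", "ea", "oz", "ct", "pk", "subtotal", "total", "tax"]);

function cleanReceiptLine(line) {
  return line
    .toLowerCase()
    .replace(/\$?\d+\.\d{2}\b.*$/, " ") // price and anything after it ("3.49 F")
    .replace(/\b\d{4,}\b/g, " ")        // long item/UPC codes
    .replace(/[^a-z\s]/g, " ")          // leftover digits and punctuation
    .split(/\s+/)
    .filter((w) => w && !RECEIPT_NOISE.has(w))
    .join(" ");
}

// Split a pasted receipt into recognized ingredients and honest "didn't recognise" lines.
function parseReceipt(text, known) {
  const recognized = new Set();
  const unrecognized = [];
  for (const raw of text.split(/\r?\n/)) {
    const cleaned = cleanReceiptLine(raw);
    if (!cleaned) continue; // nothing left (totals, codes): drop silently
    const hit = [...known].find(
      (k) => cleaned === k || cleaned.endsWith(" " + k) || cleaned.startsWith(k + " "),
    );
    if (hit) recognized.add(hit);
    else unrecognized.push(cleaned);
  }
  return { recognized: [...recognized], unrecognized };
}
```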

---

## 8. Phase 2 — the backend (one small thing unlocks four big features)

**Plain English:** a web page can't keep secret API keys (anything sent to the browser can be read by the user), so it can't safely call paid AI or recipe services on its own; and browser security rules (CORS) stop it from fetching most other websites' pages directly. So the magic features need a tiny "kitchen out back": a small serverless function. **One little backend unlocks all four:** photo parsing, URL import, paid recipe APIs, and the real Instacart cart.

### 8.1 Where it runs
You're already on **Cloudflare Pages**. Cloudflare **Pages Functions** (built on Workers) are the natural fit: files in a `/functions/api/` folder deploy automatically with the site. Free tier: **100,000 requests/day**; the paid plan starts at **$5/mo**, with usage beyond its included allowance billed on top.

**Storage** (when you add accounts/saved data):
- **D1** (Cloudflare's SQLite) — users, saved recipes, pantries. Free tier ~5M row-reads/day.
- **R2** (object storage) — raw uploaded photos. Cheap, **free egress**.
- **KV** — cache of parsed recipes keyed by URL, so you never parse the same page twice.

### 8.2 The endpoints (skeletons are in `explore/backend/`)

| Endpoint | Does | Needs | Cost |
|---|---|---|---|
| `POST /api/parse-photo` | Photo of pantry/receipt/cookbook page → structured items/recipe via **Claude vision** | Anthropic API key | **~$0.008 / photo** (Haiku 4.5) |
| `POST /api/parse-text` | A **sloppy cookbook text dump** (many recipes, prose) → clean structured recipes via Claude | Anthropic API key | **< 1¢ / page** (text only) |
| `POST /api/import-youtube` | A **YouTube cooking video** → recipe (pull transcript+description → Claude) — the original Robot Cookbook pipeline. ⚠️ transcript fetch is fragile server-side | Anthropic API key | ~1–2¢ / video |
| `POST /api/import-url` | Fetch a recipe URL server-side, parse **schema.org JSON-LD** → common schema | — (just fetch) | ~free (compute only) |
| `POST /api/instacart` | Send the missing-items list to Instacart's **"Create shopping list page"** API → returns a pre-filled cart URL | Instacart dev key | API free; you *earn* affiliate $ |
| `GET /api/recipes?have=…` | (optional) live lookups to TheMealDB/Spoonacular | TheMealDB free / Spoonacular key | TheMealDB free |

The app already calls `parse-text`, `import-youtube`, and `instacart` and degrades
gracefully ("needs Phase 2") until keys are set — so turning them on is *only* adding
keys + deploying, no front-end work. A free no-LLM **blank-line cookbook paste** and the
**live TheMealDB** source already work today with no backend at all.
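
A sketch of the core of `parse-photo`, assuming Anthropic's Messages API (`POST https://api.anthropic.com/v1/messages` with `x-api-key` and `anthropic-version` headers and a base64 image content block). The prompt, the JSON reply format we ask for, and the binding names `ANTHROPIC_API_KEY`/`CLAUDE_MODEL` are assumptions, not the skeleton in `explore/backend/`; the model id is read from configuration (a Haiku model, per the cost column) rather than hard-coded.

```javascript
// Builds the Messages API request for one pantry/receipt photo.
function buildVisionRequest({ apiKey, model, imageBase64, mediaType = "image/jpeg" }) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model,
        max_tokens: 1024,
        messages: [{
          role: "user",
          content: [
            { type: "image", source: { type: "base64", media_type: mediaType, data: imageBase64 } },
            { type: "text", text: 'List the food items in this photo as JSON: {"items": ["..."]}. Reply with JSON only.' },
          ],
        }],
      }),
    },
  };
}

// Pulls the items array out of a Messages API response body ({ content: [{ type: "text", text }] }).
function itemsFromResponse(body) {
  const text = (body.content ?? []).filter((b) => b.type === "text").map((b) => b.text).join("");
  const json = text.slice(text.indexOf("{"), text.lastIndexOf("}") + 1);
  try {
    const items = JSON.parse(json).items;
    return Array.isArray(items) ? items.map(String) : [];
  } catch {
    return []; // unparseable reply: the UI can ask the user to type items instead
  }
}

// In functions/api/parse-photo.js this would be exported as onRequestPost.
async function onRequestPost({ request, env }) {
  const { imageBase64, mediaType } = await request.json();
  const { url, init } = buildVisionRequest({
    apiKey: env.ANTHROPIC_API_KEY, model: env.CLAUDE_MODEL, imageBase64, mediaType,
  });
  const res = await fetch(url, init);
  if (!res.ok) return Response.json({ error: `upstream ${res.status}` }, { status: 502 });
  return Response.json({ items: itemsFromResponse(await res.json()) });
}
```

Keeping the request builder and response parser as pure functions means they can be tested without a key or network access.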

### 8.3 Two terms-of-service facts that shape the design
1. **Spoonacular & Edamam forbid storing their recipes** (cache ≤1 hour, delete on access loss). So treat them as **live lookups only** — never your database. Your *persistent* content must come from **(a) user URL imports, (b) user photo parses, (c) TheMealDB (permissive), (d) our own cookbook.** This is a real architectural constraint, not a nicety.
2. **Recipe import is legally low-risk if done right:** ingredient lists + steps aren't copyrightable, but prose/photos are, and site ToS still apply. So: store **only structured fields, attribute + link back, and import only when the user gives you the URL** (one at a time, like Paprika/AnyList) — *not* mass server-side crawling (SuperCook's riskier model).
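
A sketch of the `import-url` parsing step under those rules: find `<script type="application/ld+json">` blocks, locate an object whose `@type` is (or includes) `Recipe`, including inside an `@graph`, and keep only an allowlist of structured fields plus a link back. The regex extraction is a simplification (a production version might use Workers' `HTMLRewriter` instead), and `findRecipeJsonLd`, `KEEP`, and `pickStructuredFields` are hypothetical names.

```javascript
// Returns the first schema.org Recipe object found in a page's JSON-LD, or null.
function findRecipeJsonLd(html) {
  const blocks = html.matchAll(/<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi);
  for (const [, raw] of blocks) {
    let data;
    try { data = JSON.parse(raw); } catch { continue; } // skip malformed blocks
    // JSON-LD may be one object, an array, or an object with an "@graph" array.
    const queue = Array.isArray(data) ? [...data] : [data];
    while (queue.length) {
      const node = queue.shift();
      if (!node || typeof node !== "object") continue;
      const type = node["@type"];
      if (type === "Recipe" || (Array.isArray(type) && type.includes("Recipe"))) return node;
      if (Array.isArray(node["@graph"])) queue.push(...node["@graph"]);
    }
  }
  return null;
}

// Store only structured fields plus attribution; description prose, images and reviews are dropped.
const KEEP = ["name", "author", "recipeIngredient", "recipeInstructions", "recipeYield", "totalTime"];
function pickStructuredFields(ld, sourceUrl) {
  const kept = Object.fromEntries(KEEP.filter((k) => k in ld).map((k) => [k, ld[k]]));
  return { ...kept, url: sourceUrl, importedAt: new Date().toISOString() };
}
```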

### 8.4 The Instacart specifics
- `POST https://connect.instacart.com/idp/v1/products/products_link` with `{ title, link_type:"shopping_list", line_items:[{name}] }` → returns `products_link_url` (the pre-filled cart). *(Our app already builds this exact payload.)*
- Auth: Bearer dev key. **Dev environment works immediately; production access ~30–40 days** via their approval.
- Affiliate via **Impact**: published rates are **up to $10 per new-customer order (CPA)** and **up to 15%** (influencer); Developer-Platform partner rates are negotiated. Amazon's grocery affiliate is only **1%** — Instacart is the better earner.
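
A sketch of the `instacart` endpoint that uses only the fields named above (`title`, `link_type`, `line_items[].name`, `products_link_url`). The binding names `INSTACART_API_KEY`/`INSTACART_BASE_URL` and the dedupe step are assumptions; the base URL is configurable because development and production access may use different hosts, so confirm that and the full field list (quantities, units) in Instacart's docs.

```javascript
// Builds the "create shopping list page" payload from the basket's missing items.
function buildShoppingListPayload(items, title = "What can I cook: shopping list") {
  const seen = new Set();
  const line_items = [];
  for (const raw of items) {
    const name = raw.trim();
    const key = name.toLowerCase();
    if (!name || seen.has(key)) continue; // drop blanks and case-insensitive duplicates
    seen.add(key);
    line_items.push({ name });
  }
  return { title, link_type: "shopping_list", line_items };
}

// In functions/api/instacart.js this would be exported as onRequestPost.
async function onRequestPost({ request, env }) {
  const { items = [], title } = await request.json();
  const payload = buildShoppingListPayload(items, title);
  if (payload.line_items.length === 0) {
    return Response.json({ error: "empty list" }, { status: 400 });
  }
  const base = env.INSTACART_BASE_URL ?? "https://connect.instacart.com";
  const res = await fetch(`${base}/idp/v1/products/products_link`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.INSTACART_API_KEY}`,
      "Content-Type": "application/json",
      Accept: "application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) return Response.json({ error: `instacart ${res.status}` }, { status: 502 });
  const { products_link_url } = await res.json();
  return Response.json({ url: products_link_url }); // the front end opens this pre-filled cart
}
```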

---

## 9. Costs & unit economics (rough, verified ranges)

Assume each active user/month ≈ 10 photo parses + ~30 URL imports + light browsing; content from URL-import + TheMealDB (so **no** Spoonacular line in the base case).

| | 100 users | 1,000 users | 10,000 users |
|---|---|---|---|
| Cloudflare (compute + storage) | $0 (free) | ~$5/mo | ~$8–13/mo |
| **Claude vision** (≈$0.008 × 10/user) | ~$8 | ~$80 | ~$800 |
| Recipe data (URL + TheMealDB) | $0 | $0 (+£10 once for MealDB v2) | $0 |
| Instacart API | $0 (earns affiliate) | $0 | $0 |
| **≈ Monthly total** | **~$8** | **~$85** | **~$810** |

**The cost driver is photo parsing**, and it scales linearly with photos. **Levers:** cache parses (never re-parse the same image/URL), use the **Batch API** (−50%) for non-urgent parses, keep the cheap **Haiku** model. Everything else is roughly free until real scale.

**Revenue sketch:** at 10k users, infra ≈ $810/mo. To break even you'd need, e.g., ~81 new-Instacart-customer orders/mo at the $10 CPA — plausible but not guaranteed; affiliate income only really compounds at scale. **Don't expect affiliate to pay the bills early.** A small **Premium tier** (unlimited photo scans + imports + meal plan, ~$3–5/mo, benchmarked against Mealime/Samsung Food) is the realistic early revenue, with affiliate as upside and B2B/retail-media as the long-game ceiling.
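
The table's arithmetic and the break-even line, as a small model. The inputs are the assumptions stated above (10 photos per user per month at ~$0.008 each, the Cloudflare line taken from the table, a $10 CPA); `monthlyCost`, `breakEvenOrders`, and `visionCostWithBatch` are illustrative names.

```javascript
// Monthly cost model: photo parsing scales linearly with users; the Cloudflare line is passed in.
function monthlyCost({ users, photosPerUser = 10, costPerPhoto = 0.008, cloudflare = 0 }) {
  const vision = users * photosPerUser * costPerPhoto;
  const total = vision + cloudflare;
  // Round to cents so floating-point noise can't leak into later math.
  return { vision: Math.round(vision * 100) / 100, total: Math.round(total * 100) / 100 };
}

// Orders per month needed for affiliate CPA revenue to cover the bill.
function breakEvenOrders(monthlyTotal, cpaPerOrder = 10) {
  return Math.ceil(monthlyTotal / cpaPerOrder);
}

// Batch API lever: non-urgent parses at half price; batchedShare is the fraction sent via batch.
function visionCostWithBatch(visionCost, batchedShare) {
  return visionCost * (1 - 0.5 * batchedShare);
}
```

With 10,000 users and a $10 Cloudflare line this reproduces the ~$810 total and the 81-order break-even; batching half of all parses would cut the $800 vision line to $600.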

---

## 10. Risks (and how we blunt them)

1. **Instacart dependency (highest risk).** It's a negotiated, revocable rail — and platforms *do* close APIs (Quizlet stopped issuing keys; Netflix killed its public API in 2014). **Mitigation:** abstract the cart layer (we already output a generic list); support **multiple exits** (Walmart, Kroger, Amazon Fresh, plain copy-paste) so Instacart is never the *only* door. Our basket already degrades gracefully to "Copy list."
2. **Scraping / ToS.** Import **only user-supplied URLs**, store structured fields, attribute + link, respect robots.txt. Avoid bulk crawling.
3. **No data moat.** Accept it; compete on UX, input breadth, speed-to-cart, and **accumulated per-user data**. Design for that data from day one.
4. **Category is hard to monetize** (Yummly died, Cookpad shrinking). **Mitigation:** lead with the **commerce loop**, keep costs near-zero until traction, don't over-invest in content.
5. **Vision-parse accuracy** (messy receipts/handwriting). **Mitigation:** Haiku first, escalate to Sonnet only on low confidence; always let the user correct parsed items (the app already shows recognized vs. unrecognized).
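
Two of these mitigations, sketched: a cart layer that takes one generic list and offers several exits, so losing one API never empties the menu, and a cheap-first parse that escalates only on low confidence. The exit list, the `confidence` field, the 0.8 threshold, and the injected `parse(image, tier)` function (which the caller would map to Haiku or Sonnet) are all assumptions.

```javascript
// One generic list in, several exits out; Instacart is just one of them.
const EXITS = [
  { id: "instacart", needsBackend: true,
    build: (items) => ({ kind: "api", line_items: items.map((name) => ({ name })) }) },
  { id: "amazon", needsBackend: false,
    build: (items) => ({ kind: "links", urls: items.map((i) => "https://www.amazon.com/s?k=" + encodeURIComponent(i)) }) },
  { id: "copy", needsBackend: false,
    build: (items) => ({ kind: "text", text: items.join("\n") }) },
];

// Every exit that can run right now, in preference order. "copy" needs nothing,
// so the list is never empty even if an API is switched off or revoked.
function availableExits(items, { backendUp }) {
  return EXITS.filter((e) => backendUp || !e.needsBackend).map((e) => ({ id: e.id, ...e.build(items) }));
}

// Cheap model first; re-run with the stronger model only when the first pass is unsure.
async function parseWithEscalation(image, parse, threshold = 0.8) {
  const cheap = await parse(image, "cheap");
  if (cheap.confidence >= threshold) return { ...cheap, escalated: false };
  const strong = await parse(image, "strong");
  return { ...strong, escalated: true };
}
```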

---

## 11. Roadmap

**Phase 1 — DONE (static, no backend, no cost):**
the unified `app.html` flow + multi-source engine + "bring your own recipes" (paste/upload) + pantry persistence + copy/Instacart-payload/Amazon-search. Proves the platform with the Robot Cookbook as source #1.

**Phase 2 — the one small backend (unlocks the magic):**
a Cloudflare Pages Functions layer (skeletons in `explore/backend/`) →
- 📷 **photo → items** (Claude vision) — the headline feature,
- 🔗 **URL import** (schema.org) — "add any recipe on the web,"
- 🛒 **real Instacart cart** (one click),
- (optional) **TheMealDB** live source for instant breadth.
Plus: a real **account + saved pantry/recipes** (D1) so data follows the user across devices.

**Phase 3 — grow & differentiate:**
multi-retailer cart (Walmart/Kroger), household sharing, meal-planning, a **Premium** tier, smarter normalization (embeddings on the tail), and — only at audience scale — CPG/retail-media B2B.

**What I'd do next:** stand up Phase 2's **photo-parse + Instacart** endpoints first (the two "wow" features), behind your free dev keys, since the rest of the app is already built to receive them.

---

## Appendix A — file map
```
recipe-cookbook/
  index.html                      ← LIVE site (untouched)
  recipes.json                    ← source #1 data (132 recipes)
  assets/ingredients.js           ← the ingredient brain (shared)
  analysis/                       ← overlap report + clusters
  explore/                        ← all review work (not deployed)
    app.html                      ← ★ the unified flow (centerpiece)
    kitchen.html / search.html / galaxy.html / visualize.html / index.html
    backend/                      ← Phase-2 serverless skeletons + README
    PLATFORM.md                   ← this document
```

## Appendix B — sources (researched 2026-06)
Cloudflare [Workers pricing](https://developers.cloudflare.com/workers/platform/pricing/) · [D1](https://developers.cloudflare.com/d1/platform/pricing/) · [R2](https://developers.cloudflare.com/r2/pricing/) · [KV](https://developers.cloudflare.com/kv/platform/pricing/) — Claude [API pricing](https://platform.claude.com/docs/en/about-claude/pricing) · [vision](https://platform.claude.com/docs/en/build-with-claude/vision) — [Spoonacular pricing](https://spoonacular.com/food-api/pricing) · [Edamam](https://developer.edamam.com/edamam-recipe-api) · [TheMealDB](https://www.themealdb.com/api.php) — [recipe-scrapers](https://docs.recipe-scrapers.com/) · [schema.org/Recipe](https://schema.org/Recipe) — Instacart [create shopping list page](https://docs.instacart.com/developer_platform_api/api/products/create_shopping_list_page/) · [affiliate](https://company.instacart.com/affiliate) · [approval](https://docs.instacart.com/developer_platform_api/guide/concepts/launch_activities/approval_process/) — competitors: [SuperCook](https://www.supercook.com/) · [Cooklist](https://cooklist.com/) · [Samsung Food](https://samsungfood.com/) · [Yummly shutdown](https://thespoon.tech/whirlpool-lays-off-entire-team-for-cooking-and-recipe-app-yummly/) — recipe [copyright (Copyright Alliance)](https://copyrightalliance.org/are-recipes-cookbooks-protected-by-copyright/) — API precedent: [Quizlet](https://github.com/joequery/quizlet/issues/4) · [Netflix](https://techcrunch.com/2014/06/13/netflix-api-shutdown/)
