# Processing field notes & comments

Reference for an AI assistant (or rian) working through unprocessed `field_notes`
and any comments added since the last processing pass. Goal: convert raw
captures into structured data (comments, plantings, plants, areas, species,
varieties, supplies, stations) **without** creating duplicate noise.

Mark a `field_note` as `processed` only after every action it implies has been
executed (or explicitly skipped with reasoning).

---

## Two-pass model

### Pass 1 — note-by-note draft
Walk each unprocessed note in `captured_at` order (oldest first). For each note,
draft a list of candidate actions using the checklist below. **Do not write to
the DB yet** — keep drafts in working memory / a scratch list.

A draft entry should record:
- source note id(s)
- action type (see catalog below)
- target entity (existing id or "new")
- proposed payload
- confidence (high / needs-clarification)

If a note is ambiguous (which area? which species?), flag it as
`needs-clarification` and ask before proceeding to pass 2.
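
A draft entry with the fields listed above could be held in a small record type like the one below. This is only a sketch: the class name, field names, and action-type strings are illustrative assumptions, not part of the app.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DraftAction:
    """One candidate action from pass 1 (hypothetical shape)."""
    note_ids: list[int]              # source note id(s)
    action_type: str                 # e.g. "add_comment", "add_planting" (assumed names)
    target_id: Optional[int]         # existing id, or None for "new"
    payload: dict = field(default_factory=dict)
    confidence: str = "high"         # "high" or "needs-clarification"
    question: str = ""               # what to ask when clarification is needed

def blocking_questions(drafts: list[DraftAction]) -> list[str]:
    """Collect the questions that must be answered before pass 2 starts."""
    return [d.question for d in drafts if d.confidence == "needs-clarification"]

drafts = [
    DraftAction([2], "add_comment", None, {"kind": "done"}),
    DraftAction([3], "add_planting", None, {}, "needs-clarification",
                "Note #3 says 'back patch': which parent area?"),
]
```

Keeping the question text on the draft itself means the list of open questions can be produced directly from the scratch list.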

### Pass 2 — global consolidation
Re-read all draft actions together, looking for:
- **Duplicate captures** of the same event (e.g. two notes about the same
  potato planting on the same day) → merge into one action. Cite all source
  note ids in the resulting comment/planting body.
- **Same-day same-area additions** that should become a single plant group or
  a single planting instead of N separate ones.
- **Cascading edits** — e.g. a new planting may also imply an area-overview
  update or a station-coverage change.
- **Comment rollup** — if three notes all say "raspberries are leafing out,"
  one comment dated to the latest is enough; don't create three.

Output a final consolidated action list, then execute.

---

## Action catalog

For every action, include the source `field_note` id(s) at the end of any free
text body, in the form `(from note #N)` or, when several notes feed one action,
`(from notes #N, #M)`. This makes it possible to trace an entry back to its
source later.

### Comments

- **When**: anything observational, status-related, or worth remembering that
  doesn't itself change the structured data. Default action — most notes
  produce at least one comment.
- **Date**: use the note's `captured_at` (date portion). Do not back-date or
  forward-date.
- **Body**: write in first person, as if rian wrote it himself. Tighten the
  language but preserve voice and any specific numbers/varieties/dates the
  note mentions.
- **Kind**: pick from `observation / question / done / issue / idea / note`.
  Default to `observation`. Use `done` for completed tasks (e.g. "planted X"),
  `issue` for problems (pests, disease, damage), `idea` for future plans.
- **Targets**: every comment has **one primary target** + any number of
  **secondary targets** via `comment_targets`. Pick the primary as the most
  specific entity the comment is *about* (a plant group > a planting > an
  area). Add as secondaries every other entity the comment incidentally
  references (e.g. species, variety, station, sibling area).
- **Photos**: if the note has photos, mention this in the comment body
  (e.g. "see note #N for photo"). Photos stay on the `field_notes` row.

### Plantings

- **Definition**: a planting is a *project* — one set of plants in roughly
  one area, on roughly one lifecycle (perennial vs annual), planted roughly
  at the same time. E.g. "Spring 2026 Potatoes" or "Raspberry Patch (perennial,
  2025)".
- **Add Planting** when a note describes the start of such a project AND no
  existing planting matches (same area + same lifecycle + same approximate
  date window — call it ~2 weeks).
- **Edit Planting** when an existing planting is affected — overview update,
  status change (idea → planned → planted → skipped), name correction, year
  re-link.
- **Archive Planting** when all of its plant groups have been removed
  (`status = 'removed'`, or `quantity = 0` with `area_id` set to NULL).
  Currently no `archived` status exists on plantings, so flag this in the
  action list and ask before adding the status; don't invent it silently.
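
The Add Planting match test (same area, same lifecycle, dates within about two weeks) can be sketched as follows. The `Planting` record and its field names are assumptions made for illustration; the real `plantings` columns may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Planting:
    # Hypothetical in-memory view of a plantings row.
    id: int
    area_id: int
    lifecycle: str        # "annual" or "perennial"
    planted_on: date

MATCH_WINDOW = timedelta(days=14)   # the "~2 weeks" window from the rule

def find_matching_planting(existing, area_id, lifecycle, planted_on):
    """Return an existing planting that matches, or None (meaning: Add Planting)."""
    for p in existing:
        if (p.area_id == area_id
                and p.lifecycle == lifecycle
                and abs(p.planted_on - planted_on) <= MATCH_WINDOW):
            return p
    return None

existing = [Planting(1, 10, "annual", date(2026, 5, 1))]
```

If a match comes back, the note most likely calls for Edit Planting or a new plant group under that planting rather than a new project.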

### Plant groups (rows in `plants` table)

A plant group = one species/variety in one area at one quantity, attached to a
planting.

- **Add a plant group** when a note mentions planting some of an existing
  species/variety in an area that's already part of a related planting (or
  could be reasonably folded in). Prefer this over creating a new planting
  for small additions.
- **Edit a plant group** for quantity changes, status changes, notes
  updates, area moves that don't split.
- **Move an entire plant group** to a new area: just change `area_id`.
- **Move part of a plant group**: split first (use the existing
  `/plants/{id}/move` route which handles the split), then move.

### Areas

- **Add area**: pick the right parent (look at the area's name, the
  surrounding areas mentioned in nearby notes, and the existing tree). New
  areas inherit the default `sketch_rotation` and still need their dimensions
  filled in.
- **Edit area**: overview text, dimensions, structure features, sunlight,
  rotation (use 315° for SW–NE alignment unless the note says otherwise).
- **Remove area**: only if the note explicitly says it's gone, AND it has
  no plant groups attached (plants get unlinked, not deleted, but check).
- **Move area** to another parent when the note implies a structural
  reorganization.

### Catalog (species / varieties)

- **Add species** when a note names a plant that isn't in the catalog yet.
  Use the canonical common name (capitalize first letter). Don't invent
  scientific names; leave description blank if unknown.
- **Add variety** under the right species when the note specifies one
  (e.g. "Red Russian kale"). The variety name is the part that distinguishes
  it from other varieties of the same species ("Red Russian", not
  "Red Russian Kale").
- **Edit species/variety**: overview update only when the note adds genuine
  new information (taste, disease resistance, performance in this garden).

### Supplies

- **Add supply** when a note mentions acquiring a new item (fertilizer,
  amendment, mulch, tool, seeds).
- **Edit supply**: change quantity-on-hand, mark needed/not-needed,
  re-categorize. Watch for "ran out of X" → set quantity_on_hand to 0 and
  needed = 1.

### Watering stations

- **Add station** if a note mentions installing new irrigation hardware.
- **Edit station**: assign to additional areas (use the explicit-cascade
  picker — checking a parent auto-checks descendants), update schedule,
  update notes.

---

## Accuracy rules — **read these before drafting**

These rules came out of mistakes made in the first processing runs
(2026-05-04 and 2026-05-05). They exist because the cost of a false create
(duplicate planting, missed existing entity, wrong area) is high: it pollutes
the dataset and the user has to clean it up by hand.

### 1. Search before create — for every entity type

Before drafting "add a new X," exhaustively search existing data for matches.
Recency-ordered listings are NOT enough — old entities are easy to miss.

For each candidate type:

- **Plantings** — search by `source` (case-insensitive substring), then by
  `(year_id, species_id)`. A note like "from Raincoast Farms" almost always
  means an existing planting whose `source` field already contains that
  supplier name (possibly mis-typed). A photo-only note about a known
  supplier is almost certainly an *attachment* to an existing planting,
  not a new one.
- **Varieties** — query by exact name across **all** species, not just
  recent ones. A common failure: assume a variety doesn't exist because
  it's not in the last-30 list. Use `SELECT * FROM varieties WHERE
  LOWER(name) = LOWER(?)`.
- **Plant groups** — query by `(species_id, variety_id, area_id)` tuple
  before adding a new row. If a group already exists for that combo,
  edit the quantity instead.
- **Species** — query by exact name AND obvious mis-spellings (Whisper
  often substitutes phonetically: "Brocolli" / "Broccoli", "Karmen" /
  "Carmen", "Lareault" / "Lareau", "Raincost" / "Raincoast").
- **Areas** — query by exact name. Multiple areas with the same name are
  legal (e.g., two "Southeast Back Patch" exist with different parents).
  Always disambiguate by `parent_id`.
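
A minimal sketch of these lookups using Python's `sqlite3` and `difflib`. The helper names, the fixture rows, and the reduced schema are assumptions for illustration; only the table and column names quoted in the rules above come from this document.

```python
import difflib
import sqlite3

def find_varieties_exact(conn, name):
    """Case-insensitive exact match across ALL species (the query from the rule)."""
    return conn.execute(
        "SELECT id, species_id, name FROM varieties WHERE LOWER(name) = LOWER(?)",
        (name,),
    ).fetchall()

def find_species_candidates(conn, name, cutoff=0.8):
    """Exact match first; otherwise near-spellings such as 'Brocolli'."""
    rows = conn.execute("SELECT id, name FROM species").fetchall()
    by_lower = {n.lower(): (i, n) for i, n in rows}
    key = name.strip().lower()
    if key in by_lower:
        return [by_lower[key]]
    close = difflib.get_close_matches(key, list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[c] for c in close]

def find_plantings_by_source(conn, supplier):
    """Case-insensitive substring match on plantings.source."""
    return conn.execute(
        "SELECT id, name, source FROM plantings "
        "WHERE LOWER(source) LIKE '%' || LOWER(?) || '%'",
        (supplier,),
    ).fetchall()

# Tiny in-memory fixture with an assumed schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE varieties (id INTEGER PRIMARY KEY, species_id INTEGER, name TEXT);
CREATE TABLE plantings (id INTEGER PRIMARY KEY, name TEXT, source TEXT);
INSERT INTO species VALUES (1, 'Broccoli'), (2, 'Kale');
INSERT INTO varieties VALUES (1, 2, 'Red Russian');
INSERT INTO plantings VALUES (1, 'Strawberries 2026', 'Raincoast Farms');
""")
```

A substring search will not catch a mis-typed supplier such as "Raincost", so a fuzzy pass like the one used for species is worth running on `source` values as well before concluding that nothing matches.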

### 2. Match note wording against existing overviews

When a note has language like "wasn't planning to use this space," "needs
repair," "currently fallow," "shaded by trees" — search every area's
`notes`, `structure_features`, `sunlight`, and `soil_environment` for
matching phrases. An area whose overview already says "currently fallow"
is a much stronger match for "wasn't planning to use this space" than a
geometric-distance guess.

### 3. Multi-source lookup for plant references

A note saying "the apple tree" / "the cherry tree" is ambiguous when
there are multiple. Don't grep one column. Query in this order:

1. All plant groups (rows in `plants`) of that species → returns every area
   where the species is currently planted, plus the planting each group
   belongs to.
2. All `plantings` of that species → returns every project the species
   is part of, with names and sources.
3. All `areas` mentioning the species in any text field.

Then pick by note context: "view from the bedroom window" → which area
is near the bedroom? Raspberry Patch (#32) is right beside Rae's Window
(#15). That kind of contextual disambiguation beats text matching.

### 4. Auto-fix typos as part of every run

Fix typos inline, don't defer. Patterns to watch for:

- Species/variety names that look like phonetic misspellings ("Brocolli",
  "Seascrape", "Raincost").
- Doubled or missing letters.
- Transcription substitutions in audio notes (Whisper and Gemini both
  do this for proper nouns and garden terms).

List every typo fix in the action plan with the old → new spelling and
the row id, but don't ask permission for them — they're cleanups.

### 5. Merge rule for redundant notes

Merge two notes into one comment IF:
- They reference the same target entity AND
- The second note adds no new factual information (it just restates
  what the first said in different words).

Use the latest note's `captured_at` date. In the comment body, cite both
note ids: `(from notes #N, #M)`. Even if the notes are days apart, if
nothing new was learned in the interval, one comment is enough.
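
One way to sketch this merge test is shown below. It assumes each note has already been reduced to a target plus a set of normalized facts; that extraction step, the `NoteDraft` type, and its field names are hypothetical. A later note that brings something new, such as a status change, stays separate.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class NoteDraft:
    note_id: int
    target: tuple         # (target_type, target_id)
    captured_on: date
    facts: frozenset      # normalized facts pulled from the note (hypothetical step)

def merge_redundant(drafts):
    """Fold a later note into an earlier comment when it adds no new facts."""
    merged = []
    for d in sorted(drafts, key=lambda d: d.captured_on):
        for m in merged:
            if m["target"] == d.target and d.facts <= m["facts"]:
                m["note_ids"].append(d.note_id)
                m["date"] = d.captured_on      # the latest note's date wins
                break
        else:
            # New target or new information: this becomes its own comment.
            merged.append({"target": d.target, "facts": set(d.facts),
                           "note_ids": [d.note_id], "date": d.captured_on})
    return merged

drafts = [
    NoteDraft(2, ("plant_group", 7), date(2026, 5, 1), frozenset({"leafing out"})),
    NoteDraft(8, ("plant_group", 7), date(2026, 5, 4), frozenset({"leafing out"})),
    NoteDraft(9, ("plant_group", 7), date(2026, 5, 5), frozenset({"planted"})),
]
result = merge_redundant(drafts)
```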

If the second note adds a status change (e.g., "selected potatoes" then
"planted potatoes"), keep both — those are different events.

### 6. Treat photo-only / photo+caption notes as attachments

A note with only a photo and a brief caption (e.g., "here's the new
strawberries") is almost always an *attachment* to existing data, not a
new planting/group. Default action: locate the existing entity (planting
matching the supplier + year, or area matching the photo subject) and
attach. Only create new data if the caption explicitly describes a new
event ("planted today," "first time growing," "new variety").

### 7. Transcription quirks — flag, don't trust

Audio transcription via Whisper/Gemini regularly mishears single
proper nouns and garden terms. Examples encountered: "Mitch cherry" →
"mixed cherry"; "Raincost" → "Raincoast"; "Karmen" → "Carmen". When a
single-word term doesn't resolve cleanly against the catalog or against
the surrounding context, flag it as a transcription-quirk question
rather than acting on the literal transcribed word.

### 8. Comment relating — use the full target graph, not just one

The legacy "primary + secondaries via comment_targets" system supports
arbitrary multi-target. Use it. For every comment, populate as many
targets as are genuinely referenced:

- The **planting** (most specific project the comment is about)
- The **plant group(s)** (specific qty/area combos)
- The **species** mentioned
- The **variety** mentioned
- The **area(s)** mentioned, plus the **parent area** if the comment is
  contextually relevant to it
- Any **watering station** that serves the referenced area(s) (a comment
  about wilt or irrigation is implicitly about the station too)
- The **year** the note belongs to (always — every comment is dated, and
  the year-target lets the year-detail page surface seasonal recaps)

Pick the **primary** as the most specific entity the comment is *about*
(plant group > planting > variety > species > area > station > year).
Everything else relevant goes in secondaries. A comment with three
entities mentioned should have three targets, not one.
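
The priority order can be applied mechanically once the referenced entities are known. In this sketch the `target_type` strings are assumed spellings, and ranking is by type only; choosing between several areas is a separate judgment.

```python
# Most specific first, matching the order given above.
PRIORITY = ["plant_group", "planting", "variety", "species", "area", "station", "year"]

def split_targets(targets):
    """targets: iterable of (target_type, target_id).
    Returns (primary, secondaries), with secondaries in priority order."""
    ordered = sorted(set(targets), key=lambda t: (PRIORITY.index(t[0]), t[1]))
    return ordered[0], ordered[1:]

primary, secondaries = split_targets(
    [("year", 2026), ("area", 32), ("plant_group", 5), ("species", 3)]
)
```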

### 9. Walk the area tree before adding plant groups

Before adding a new plant group for "the X tree in area Y," check:

1. Are there already plant groups for that species in **descendant areas
   of Y**? E.g., a note about "the apple tree in the Raspberry Patch"
   may resolve to a plant group already filed under `Section 1` (#107),
   which is a grandchild of `Raspberry Patch` (#32). The tree IS in the
   Raspberry Patch, just at a more specific location in the tree.
2. Are there already plant groups for that species in **ancestor areas
   of Y**?

Only create a new plant group if neither walk surfaces the tree. When
relating a comment to such a tree, the primary should be the **plant group
itself** (the most specific entity), with the leaf area, parent area(s),
and planting all in secondaries, so that the comment surfaces on every
relevant view.
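
Both walks can be sketched over a simple `area id -> parent id` map. The helper names are illustrative. The ids 32, 37, and 108 follow the Raspberry Patch chain described in rule 12, while the plant group rows and species ids are invented for the example.

```python
def descendants(parent_of, area_id):
    """All areas below area_id. parent_of maps area id -> parent id (None at root)."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    found, stack = [], [area_id]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found

def ancestors(parent_of, area_id):
    """Parent, grandparent, ... of area_id, nearest first."""
    chain = []
    current = parent_of.get(area_id)
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain

def existing_groups_near(parent_of, groups, species_id, area_id):
    """groups: list of (group_id, species_id, area_id) tuples."""
    scope = {area_id, *descendants(parent_of, area_id), *ancestors(parent_of, area_id)}
    return [g for g in groups if g[1] == species_id and g[2] in scope]

parent_of = {32: None, 37: 32, 108: 37, 15: None}
groups = [(1, 9, 108), (2, 9, 15), (3, 4, 37)]   # made-up rows
```

An empty result from `existing_groups_near` is the signal that a new plant group may be justified.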

### 10. Typo fix → UNIQUE conflict means the canonical entity already exists

If renaming an entity to fix a typo triggers a `UNIQUE constraint failed`
error, that almost always means **someone already created the correctly
spelled entity** independently and you have a duplicate, not just a typo.
Don't try to work around the constraint. Instead:

1. Identify the canonical (correctly-spelled) row.
2. Reassign all dependents from the typo row to the canonical row:
   - For a duplicate **species**: move all `varieties.species_id` and
     `plants.species_id` references.
   - For a duplicate **variety**: move all `plants.variety_id` references.
   - For a duplicate **area**: move all `plants.area_id`,
     `area_stations.area_id`, and child `areas.parent_id` references.
   - For a duplicate **planting**: move all `plants.planting_id` references.
   - For a duplicate **station**: move all `area_stations.station_id`
     references.
   - For **any** of the entity types above: also re-point any
     `comments.target_id` and `comment_targets.target_id` rows whose
     `target_type` matches the merged entity's type.
3. Delete the typo row.
4. Report this in the action plan as a *merge*, not just a *fix* —
   the dataset got structurally cleaner, not just cosmetically.

Originated from the 2026-05-04 run where `Brocolli` (#19) and
`Broccoli` (#52) both existed; #19 had variety Gypsy and one plant
group, #52 had a different plant group. A script merged #19 into #52.
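
For the species case, the merge steps might look like the sketch below, run as a single transaction so a failure leaves nothing half-moved. This is not the script used in that run: the reduced schema and the `'species'` value for `target_type` are assumptions.

```python
import sqlite3

def merge_species(conn, typo_id, canonical_id):
    """Re-point dependents to the canonical row, then delete the typo row."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE varieties SET species_id = ? WHERE species_id = ?",
                     (canonical_id, typo_id))
        conn.execute("UPDATE plants SET species_id = ? WHERE species_id = ?",
                     (canonical_id, typo_id))
        for table in ("comments", "comment_targets"):
            conn.execute(
                f"UPDATE {table} SET target_id = ? "
                "WHERE target_type = 'species' AND target_id = ?",
                (canonical_id, typo_id),
            )
        conn.execute("DELETE FROM species WHERE id = ?", (typo_id,))

# In-memory fixture mirroring the Brocolli (#19) / Broccoli (#52) case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE varieties (id INTEGER PRIMARY KEY, species_id INTEGER, name TEXT);
CREATE TABLE plants (id INTEGER PRIMARY KEY, species_id INTEGER, variety_id INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, target_type TEXT, target_id INTEGER);
CREATE TABLE comment_targets (comment_id INTEGER, target_type TEXT, target_id INTEGER);
INSERT INTO species VALUES (19, 'Brocolli'), (52, 'Broccoli');
INSERT INTO varieties VALUES (1, 19, 'Gypsy');
INSERT INTO plants VALUES (1, 19, 1), (2, 52, NULL);
""")
merge_species(conn, typo_id=19, canonical_id=52)
```

If `comment_targets` carries its own uniqueness constraint, re-pointing could collide when a comment already targets the canonical row; in that case the colliding typo-row link should be deleted rather than updated.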

### 11. Tighten auto-inference scope when comments are posted via the form

When a comment is posted from a per-entity page (the new capture-widget on
each detail page), `add_inferred_targets` runs and currently adds *every*
plant in the area subtree as a secondary. For broad parent areas, this is
noisy: a comment about one specific issue ends up linked to every species
and variety in the patch, including ones the comment never references.

When processing such comments:
- Read the body and identify the **specific** species, varieties, and plant
  groups it actually references.
- **Remove** auto-inferred secondaries that aren't in the body's
  reference set.
- **Add** any references the inference missed (e.g., if the body says
  "jostaberry and gooseberry" but inference only added one of those).

Longer term, narrow the inference rule itself: by default infer only
year + station + the parent-area chain. Add species/variety/plant-group
secondaries only when explicitly mentioned in the body. (Tracked in
PLANNING.md as a follow-up.)
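
Once the body's actual references have been identified by reading it, reconciling them against the inferred secondaries is a set operation. In this sketch the function name and the `target_type` strings are assumptions, and keeping year, station, and area targets regardless of the body follows the longer-term default described above.

```python
def reconcile_secondaries(inferred, referenced,
                          always_keep=("year", "station", "area")):
    """inferred / referenced: sets of (target_type, target_id).
    Returns (final_targets, removed, added)."""
    kept = {t for t in inferred if t in referenced or t[0] in always_keep}
    removed = inferred - kept
    added = referenced - inferred
    return kept | added, removed, added

inferred = {("species", 1), ("species", 2), ("variety", 5), ("year", 2026)}
referenced = {("species", 1), ("species", 3)}
final, removed, added = reconcile_secondaries(inferred, referenced)
```

Listing `removed` and `added` in the action plan keeps the cleanup visible to the user.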

### 12. Narrow the primary area to the most specific section that contains all affected items

When a comment is posted from a parent area (because that's the page the
user happened to be viewing) but the body describes plants that all live
in a known sub-area, **move the primary target down to that sub-area**.
Add the original parent and grandparent as secondaries so the comment
still surfaces at higher levels of the tree.

Example from 2026-05-05 run: a sawfly comment was posted from
`Raspberry Patch (#32)` but the affected plants (jostaberry +
gooseberries) all live in `Row 3 Southeast > Section 2 (#108)`. Primary
moved to `#108`; `#37` (Row 3 SE) and `#32` (Raspberry Patch) kept as
secondaries.
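
Finding the most specific area that contains every affected plant group is a lowest-common-ancestor lookup on the area tree. A sketch, assuming a plain `area id -> parent id` map; the helper names are illustrative and area 109 is a hypothetical sibling section added only to show the multi-area case.

```python
def ancestor_chain(parent_of, area_id):
    """area_id followed by its ancestors, nearest first."""
    chain = [area_id]
    while parent_of.get(chain[-1]) is not None:
        chain.append(parent_of[chain[-1]])
    return chain

def narrowest_common_area(parent_of, plant_area_ids):
    """Deepest area containing every affected plant group, or None."""
    if not plant_area_ids:
        return None
    chains = [ancestor_chain(parent_of, a) for a in plant_area_ids]
    common = set(chains[0]).intersection(*chains[1:])
    for area in chains[0]:        # nearest first, so the first hit is the deepest
        if area in common:
            return area
    return None

def retarget(parent_of, posted_from, plant_area_ids):
    """Return (primary_area, secondary_areas) for the comment."""
    primary = narrowest_common_area(parent_of, plant_area_ids)
    if primary is None:
        return posted_from, []
    return primary, ancestor_chain(parent_of, primary)[1:]

# 32 = Raspberry Patch, 37 = Row 3 SE, 108 = Section 2; 109 is hypothetical.
parent_of = {32: None, 37: 32, 108: 37, 109: 37}
```

With the sawfly example, both plant groups sit in 108, so the primary becomes 108 with 37 and 32 as secondaries.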

### 13. Verify sub-area paths exist before acting on a user-typed reference

When the user references a sub-area by path (e.g., "Row 2 Southeast >
Section 2"), search the area tree to confirm that exact path exists
before assuming it. If it doesn't, look for the closest match — typically
a typo'd row number or a section that lives under a similar-looking
parent — and confirm with the user. The plant data is the tiebreaker:
the area whose plant groups match the body is almost certainly the
intended one.

Example: 2026-05-05 — user said "Row 2 Southeast > Section 2", but Row 2
has no NW/SE split. The matching path was `Row 3 Southeast > Section 2`
(#108), confirmed by the Jostaberry + Gooseberry plant groups living
exactly there.
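
A path check along these lines can confirm an exact match and, failing that, suggest near misses to put to the user. This is a sketch: the helper names are illustrative, the `areas` mapping is an assumed in-memory view of the area tree, and the id used for Row 2 is made up.

```python
import difflib

def resolve_path(areas, path):
    """areas: dict id -> (name, parent_id). path: 'Parent > Child'.
    Returns ids of areas whose trailing ancestry matches the path exactly."""
    names = [p.strip().lower() for p in path.split(">")]
    matches = []
    for area_id in areas:
        current, ok = area_id, True
        for wanted in reversed(names):
            if current is None or areas[current][0].lower() != wanted:
                ok = False
                break
            current = areas[current][1]
        if ok:
            matches.append(area_id)
    return matches

def full_path(areas, area_id, depth):
    """Render the last `depth` levels of an area's path."""
    parts, current = [], area_id
    while current is not None and len(parts) < depth:
        parts.append(areas[current][0])
        current = areas[current][1]
    return " > ".join(reversed(parts))

def closest_paths(areas, path, n=3):
    """Near-miss candidates to confirm with the user, best match first."""
    depth = path.count(">") + 1
    by_path = {}
    for a in areas:
        by_path.setdefault(full_path(areas, a, depth), []).append(a)
    close = difflib.get_close_matches(path, list(by_path), n=n, cutoff=0.6)
    return [(p, by_path[p]) for p in close]

areas = {
    32: ("Raspberry Patch", None),
    36: ("Row 2", 32),                 # hypothetical id
    37: ("Row 3 Southeast", 32),
    108: ("Section 2", 37),
}
```

String similarity only proposes candidates; the plant groups living in each candidate remain the tiebreaker.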

### 14. Update existing data when a note adds detail

When a note adds detail that an existing entity should have but doesn't
(e.g., note #1 mentioned "dwarf cherry trees in the back gate planters"
— but Back Gate Planters' `structure_features` doesn't currently
mention them), include an **edit** action for that entity in the action
plan. This is how the dataset gets enriched over time, instead of just
accumulating comments.

### 15. Revise stale overviews when a note changes the area's state

Rule 14 covers *adding* detail. This rule covers *correcting* detail
that is now wrong because the situation changed.

When a note implies that an area's overview text (`notes`,
`structure_features`, `sunlight`, `soil_environment`) describes a state
that no longer applies — for example, a `notes` field saying "currently
fallow, cardboard and mulch since 2023" on an area where the same note
just added a planting of kale/broccoli/brussels sprouts — include an
**edit area** action in the plan that revises the overview to the
current state. Don't just leave the contradiction in the dataset; the
overview is the entity's "what should a new gardener / future me know"
field, and stale state-claims are worse than missing detail.

How to revise:
- Replace the obsolete state claim with the current one.
- Preserve historical context where useful (e.g., "Was fallow under
  cardboard since 2023; opened up in spring 2026 for Uncle Mike's
  collapsible containers — currently growing kale, broccoli, brussels
  sprouts").
- Use the area's overview-history snapshot mechanism
  (`snapshot_overview()` runs automatically on updates) so the prior
  text isn't lost — it's available via `/overview-history/area/{id}`.

Triggers to watch for:
- Notes adding plantings/plant_groups to an area whose overview says
  "fallow," "empty," "not in use," "needs work," "cardboard and
  mulch," etc.
- Notes describing a structural change (e.g., "new container added,"
  "fence replaced," "trellis built") that the existing
  `structure_features` doesn't reflect.
- Notes about sun changes (e.g., "tree came down — now full sun"
  contradicting an existing "shaded by tall trees" sunlight note).

Originated from the 2026-05-05 run: area #33 (Southeast Back Patch in
Side Garden) had `notes` saying "Currently fallow. Cardboard and mulch
since 2023" but a note added Uncle Mike's collapsible containers + a
fresh planting of brassicas. The overview should have been revised in
the same processing pass.

### 16. When the user references existing data, search globally — don't assume scope

When the user mentions an entity casually (e.g., "I have kale, broccoli,
and brussels sprouts in the southeast back patch"), search the **whole
database** for matching plant_groups / plantings / areas / etc. — don't
limit the search to the subtree you happened to already be looking at.

Concrete failure mode from 2026-05-05: I was inventorying the **fenced
garden** subtree for planning, the user said brassicas were "in the
southeast back patch," and I (a) didn't widen my query to include other
areas, (b) didn't notice there were two areas named "Southeast Back
Patch" in different parents, and (c) reported "no kale/broccoli/brussels
sprouts in the database" when they were sitting in `area #33` (Side
Garden) the whole time as planting #50 "New Vegi Patch".

Recipe to avoid this:
- Before saying "X doesn't exist in the database," run an exact-name
  search across the **full** `species`, `varieties`, `plantings`, and
  `plants` tables, not the filtered set you've been working with.
- When a name is non-unique (multiple "Southeast Back Patch", multiple
  "Section 1"), enumerate all matches with their parent context and
  ask which one the user meant.
- Tell the user where you DID look and where you DIDN'T, so they can
  correct the scope.

---

## Conventions & gotchas

- **Dates**: always absolute (YYYY-MM-DD). If a note says "yesterday" or
  "last week," resolve relative to `captured_at`.
- **Areas with ambiguous names** (multiple "Section 1", multiple "Back
  Patch"): always disambiguate by parent. Never guess silently — if a note
  says "back patch" without a clear parent in context, ask.
- **Quantities**: prefer the number in the note. If absent, leave blank
  rather than guessing.
- **Audio-only notes**: if `text` and `transcript` are both empty but
  `audio_path` is set, the note still needs transcription. Either ask the
  user to provide it or treat the photo (if any) as the only signal and
  call it out as low-confidence.
- **Photos as primary signal**: notes with no text but a photo still
  encode information (what got planted, what's flowering, damage). Comment
  on what's visible; don't fabricate detail.
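
For the date convention, relative phrases resolve against the note's `captured_at`, never against the day of processing. A small sketch, assuming `captured_at` is stored as an ISO 8601 string; the function name is illustrative, and treating "last week" as exactly seven days back is a simplification.

```python
from datetime import datetime, timedelta

# Days to subtract from captured_at. "last week" = 7 days is an assumption.
RELATIVE_OFFSETS = {"today": 0, "yesterday": 1, "last week": 7}

def resolve_relative_date(phrase, captured_at):
    """Return an absolute YYYY-MM-DD string, or None for unknown phrases."""
    captured = datetime.fromisoformat(captured_at).date()
    days_back = RELATIVE_OFFSETS.get(phrase.strip().lower())
    if days_back is None:
        return None       # flag for clarification instead of guessing
    return (captured - timedelta(days=days_back)).isoformat()
```

Returning `None` for anything outside the table keeps vague phrases such as "a while ago" in the questions list.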

---

## Confirming before write

After pass 2 consolidation and before writing anything to the DB, present
the consolidated action list to the user in a compact format:

```
COMMENTS (4):
  1. [observation, 2026-05-04] Planted 45 potatoes in 3 rows. → primary: Section 1 (Northwest Back Patch), secondaries: Potato. (notes #2, #8)
  ...

PLANTINGS (1 new, 0 edited, 0 archived):
  + "Spring 2026 Potatoes" — area: Northwest Back Patch, year: 2026, status: planted

PLANT GROUPS (3 new):
  + 45× Potato (no variety) → planting: Spring 2026 Potatoes, area: Northwest Back Patch
  ...

CATALOG:
  + Species: Potato, Onion (Red)

QUESTIONS:
  - Note #3 says "back patch" — Northwest Back Patch or Southeast Back Patch?
  - Notes #1/#4/#5/#6 are audio-only (no transcript). Transcribe how?
```

Wait for confirmation. Then execute, marking each note `processed` as the
last of its actions completes.
