
Lili & Evrin

Two autonomous digital octopuses that live on a webpage and learn by reinforcement — a 10-year experiment in emergent behavior.

Solo: concept, research, architecture, code · 2026 · Open source (MIT) · live in production
Vanilla JS · Canvas 2D · Q(λ)-Learning · Deep Q-Network · FABRIK IK · Steering Behaviors · Web Audio · Zero dependencies

She runs live on this site. Keep an eye out — she may be drifting across your screen right now. These numbers are her real, current state:

live state: decisions learned · age · life phase · mood

Lili is an autonomous digital organism — an animal with elements of intelligence that lives on a web page as if it were her habitat. She doesn't perform; she simply is. Everything about how she moves, reacts and looks emerges from reinforcement learning, never from a pre-recorded animation.

Evrin is her sibling: a second agent that learns alongside her using a different algorithm, so the two can be compared head to head over time. They share one canvas, age in real time, and quietly coexist with whatever you're reading.

If you're on a desktop browser, she's somewhere on this page right now — and on the homepage, where she lives full-time.

2 agents (Q-learning + DQN)
10 yr planned lifespan
0 dependencies, no build step
100% in-browser, local learning

an organism, not an app

Most digital companions are state machines: a Tamagotchi reacts the same on day 1 and day 1000, a chatbot needs the cloud, and a web animation is authored frame by frame. Lili is the opposite of all three. She runs entirely in the browser with no network calls, she ages on a real timeline from chaotic Hatchling to contemplative Elder, and none of her behavior is scripted — it is learned.

The design goal was ambient coexistence: a living presence that doesn't demand care, doesn't use language, and actively learns not to get in the way of what you're doing.

how she learns

Lili's brain is tabular Q(λ)-learning with eligibility traces — no neural network. Nine sensors (cursor proximity, scroll, DOM density, whitespace, time of day, age, and more) collapse into a discrete state space of about 38,880 states, and she learns which mood to adopt in each. There is no "be nice" rule anywhere in the code; good manners fall out of the reward function.
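
To make the update rule concrete, here is a minimal sketch of tabular Q(λ) with eligibility traces, where each action is a mood. It assumes Watkins's variant (traces are cut after an exploratory action) and replacing traces; the source does not say which variant Lili uses. The function names, the sparse trace map, and every hyperparameter value are illustrative assumptions, not Lili's actual code.

```javascript
// Tabular Watkins's Q(λ): a sketch under stated assumptions, not Lili's implementation.
// createAgent, alpha, gamma, lambda and their default values are hypothetical.
function createAgent({ numStates, numActions, alpha = 0.1, gamma = 0.95, lambda = 0.8 }) {
  const q = new Float64Array(numStates * numActions); // Q-table, row-major by state
  const traces = new Map();                           // sparse eligibility traces: index -> value
  const idx = (s, a) => s * numActions + a;

  // Action (mood) with the highest Q-value in state s; ties go to the lowest index.
  function greedyAction(s) {
    let best = 0;
    for (let a = 1; a < numActions; a++) {
      if (q[idx(s, a)] > q[idx(s, best)]) best = a;
    }
    return best;
  }

  // One learning step: action a was taken in state s, yielding reward r and state s2.
  // nextAction is the action the agent will actually take in s2.
  function update(s, a, r, s2, nextAction) {
    const aStar = greedyAction(s2);
    const delta = r + gamma * q[idx(s2, aStar)] - q[idx(s, a)]; // TD error
    traces.set(idx(s, a), 1);                                   // replacing trace (assumption)
    const nextIsGreedy = q[idx(s2, nextAction)] === q[idx(s2, aStar)];
    for (const [i, e] of traces) {
      q[i] += alpha * delta * e;
      // Watkins's rule: decay traces after a greedy action, cut them after an exploratory one.
      const decayed = nextIsGreedy ? gamma * lambda * e : 0;
      if (decayed < 1e-4) traces.delete(i);
      else traces.set(i, decayed);
    }
  }

  return { q, idx, greedyAction, update };
}
```

With the state count quoted above, a call such as `createAgent({ numStates: 38880, numActions: 5 })` would allocate a table of 194,400 values; the five-mood action count is a placeholder, since the list of moods in the text is open-ended.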

$reward shaping (Lili A)
in whitespace, user is reading   +1.0   // correct coexistence
fled an approaching cursor       +0.8
explored the DOM, low stress     +0.5
covered a DOM element            -1.0
sat over text, blocking reading  -2.0   // the worst case
idle for too long                -0.5
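
The table above could be expressed as a small rule list. This is a sketch only: the reward values are copied from the table, but the observation field names (`inWhitespace`, `overText`, and so on) and the decision to sum every matching rule are assumptions made for illustration.

```javascript
// Reward values come from the table above. Field names and the
// "sum all matching rules" policy are assumptions, not Lili's actual code.
const REWARD_RULES = [
  { when: (o) => o.inWhitespace && o.userReading, value: +1.0 }, // correct coexistence
  { when: (o) => o.fledCursor, value: +0.8 },
  { when: (o) => o.exploredDom && o.lowStress, value: +0.5 },
  { when: (o) => o.coveredElement, value: -1.0 },
  { when: (o) => o.overText, value: -2.0 },                      // the worst case
  { when: (o) => o.idleTooLong, value: -0.5 },
];

// Sum the value of every rule whose condition holds for this observation.
function computeReward(observation) {
  let total = 0;
  for (const rule of REWARD_RULES) {
    if (rule.when(observation)) total += rule.value;
  }
  return total;
}
```

Keeping the rules as data makes the shaping easy to audit: there is no "be nice" branch to find, only signed numbers attached to observable situations.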

the octopus is the architecture

A real octopus keeps most of its neurons in its arms. Lili copies that. Intelligence is distributed, not central:

  • Brain — the hormonal system. Q-learning picks a mood (curious, playful, shy, calm, alert…); it sets tendencies, not direct actions.
  • Eight tentacles — local neurons. Each is an independent FABRIK inverse-kinematics chain with its own stress, curiosity and reflexes, reacting to touch without waiting for the brain.
  • Body — Craig Reynolds steering behaviors (wander, seek, flee, avoid) weighted by the current mood.
  • Skin — HSL chromatophores are the only communication channel: hue drifts with age, saturation spikes with stress, lightness follows the time of day. A stressed Lili simply looks stressed.
  • Behavior is the emergent sum of all of these — which is exactly how the real animal works.
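
For readers unfamiliar with FABRIK, the sketch below shows the standard algorithm on a single 2D chain, the kind of solver each tentacle would run independently. It is a generic textbook version: the function name, the `{x, y}` joint format, and the tolerance and iteration defaults are assumptions, and none of Lili's per-tentacle stress, curiosity or reflex logic is modeled.

```javascript
// FABRIK (Forward And Backward Reaching Inverse Kinematics) for one 2D chain.
// joints: array of {x, y}; lengths[i] is the distance between joints[i] and joints[i + 1].
// Returns a new array of joint positions; the input is not mutated.
function solveFabrik(joints, lengths, target, { tolerance = 0.01, maxIterations = 10 } = {}) {
  const pts = joints.map((p) => ({ x: p.x, y: p.y }));
  const base = { x: pts[0].x, y: pts[0].y };
  const n = pts.length;
  const dist = (a, b) => Math.hypot(b.x - a.x, b.y - a.y);
  const totalLength = lengths.reduce((sum, l) => sum + l, 0);

  // Move `moving` along the line from `fixed` so it sits exactly `len` away.
  function place(fixed, moving, len) {
    const t = len / (dist(fixed, moving) || 1e-9); // guard against division by zero
    moving.x = fixed.x + (moving.x - fixed.x) * t;
    moving.y = fixed.y + (moving.y - fixed.y) * t;
  }

  if (dist(base, target) >= totalLength) {
    // Target out of reach: stretch the chain in a straight line toward it.
    for (let i = 0; i < n - 1; i++) {
      const t = lengths[i] / (dist(pts[i], target) || 1e-9);
      pts[i + 1].x = (1 - t) * pts[i].x + t * target.x;
      pts[i + 1].y = (1 - t) * pts[i].y + t * target.y;
    }
    return pts;
  }

  for (let iter = 0; iter < maxIterations && dist(pts[n - 1], target) > tolerance; iter++) {
    // Backward pass: pin the tip to the target and work back toward the base.
    pts[n - 1].x = target.x;
    pts[n - 1].y = target.y;
    for (let i = n - 2; i >= 0; i--) place(pts[i + 1], pts[i], lengths[i]);
    // Forward pass: pin the base back in place and work out toward the tip.
    pts[0].x = base.x;
    pts[0].y = base.y;
    for (let i = 0; i < n - 1; i++) place(pts[i], pts[i + 1], lengths[i]);
  }
  return pts;
}
```

FABRIK needs only distance calculations and point interpolation, with no trigonometry or matrix inversion, which makes it a plausible fit for eight chains updated every frame.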

Lili vs Evrin — a controlled experiment

Evrin runs a Deep Q-Network instead of a table. DQN is notoriously unstable (it can diverge within minutes of unattended training), yet this project commits to a decade. So Evrin ships with a seven-stage stabilization suite: a replay buffer, a target network, anchor rollback on weight explosions, learning-rate decay, periodic exploration rejuvenation, gradient clipping, and a loss-spike detector.
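
Three of those stages are simple enough to sketch in a few lines: gradient clipping, a loss-spike detector, and an anchor that can roll weights back. These are generic versions of the techniques, not Evrin's code. The function names, the flat-array weight format, the running-average spike test, and the `factor` and `smoothing` defaults are all assumptions.

```javascript
// Gradient clipping by global L2 norm: rescale in place if the norm exceeds maxNorm.
// Returns the norm measured before clipping.
function clipGradients(grads, maxNorm) {
  let sumSq = 0;
  for (const g of grads) sumSq += g * g;
  const norm = Math.sqrt(sumSq);
  if (norm > maxNorm) {
    const scale = maxNorm / norm;
    for (let i = 0; i < grads.length; i++) grads[i] *= scale;
  }
  return norm;
}

// Loss-spike detector: flags a loss above `factor` times its running average,
// or any non-finite loss. Thresholds here are hypothetical.
function createSpikeDetector({ factor = 5, smoothing = 0.99 } = {}) {
  let average = null;
  return function isSpike(loss) {
    if (!Number.isFinite(loss)) return true;
    if (average === null) { average = loss; return false; }
    const spike = loss > factor * average;
    // Only fold normal losses into the average so a spike cannot raise the bar.
    if (!spike) average = smoothing * average + (1 - smoothing) * loss;
    return spike;
  };
}

// Anchor rollback: keep a known-good copy of the weights and restore it on demand.
function createAnchor(weights) {
  let snapshot = Float32Array.from(weights);
  return {
    save(w) { snapshot = Float32Array.from(w); },
    restore(w) { w.set(snapshot); },
  };
}
```

One plausible way to wire them together is to clip on every training step, check the loss with the detector, and call `restore` whenever a spike or a non-finite value appears, saving a fresh anchor only after a long run of healthy steps.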

Running two different learners side by side turns the piece into a real experiment. Lili can also be compared against four baseline policies — random, frozen, myopic and a hand-coded heuristic — so the emergent behavior can be told apart from luck or from the reward function alone. Every decision is written to a behavioral journal and exported as CSV for analysis.
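
As an illustration of the journal export, the sketch below turns an array of decision records into CSV text. The column names in the usage example are hypothetical; the source does not list the journal's actual fields.

```javascript
// Serialize journal entries to CSV. `columns` fixes both the header and the field order.
// Values containing commas, quotes or line breaks are quoted, with inner quotes doubled.
function journalToCsv(entries, columns) {
  const escapeCell = (value) => {
    const text = String(value ?? "");
    return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = [columns.join(",")];
  for (const entry of entries) {
    rows.push(columns.map((c) => escapeCell(entry[c])).join(","));
  }
  return rows.join("\n");
}

// Usage with made-up fields:
// journalToCsv([{ t: 1, agent: "lili", mood: "shy", reward: 0.8 }], ["t", "agent", "mood", "reward"])
```

Passing the column list explicitly keeps the output stable even if individual records gain or lose fields over a ten-year run.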

living on a real web page

Lili ships as a single <script defer> with zero dependencies and no build step, and she has to be a good tenant. She runs at 60 fps on no more than a few percent of CPU, with no allocations in the hot path. She remembers who she is across visits via localStorage, with a JSON export and a GitHub-backed cloud sync as lifeboats in case browser storage is wiped. And at midnight every day she gently animates every DOM element she touched back to its original place, leaving no permanent mark, like an organism whose traces fade with the day.
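
A minimal sketch of the persistence layer might look like the following. It relies only on the standard Web Storage methods `getItem` and `setItem`; the storage key, the version field, and the envelope shape are assumptions. Storage is passed in as a parameter so the same code works with `window.localStorage` in the browser or a stand-in object elsewhere.

```javascript
const STORAGE_KEY = "lili-state"; // hypothetical key name

// Write the agent state as versioned JSON. Returns false if storage is full or unavailable.
function saveState(storage, state) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify({ version: 1, savedAt: Date.now(), state }));
    return true;
  } catch (err) {
    return false;
  }
}

// Read the state back. Returns null if nothing is stored, the JSON is corrupt,
// or the version is not one this code understands.
function loadState(storage) {
  try {
    const raw = storage.getItem(STORAGE_KEY);
    if (raw === null) return null;
    const parsed = JSON.parse(raw);
    return parsed && parsed.version === 1 ? parsed.state : null;
  } catch (err) {
    return null;
  }
}
```

In the browser this would be called as `saveState(window.localStorage, state)`. Because both functions fail soft, a wiped or blocked store simply looks like a first visit, which is where an export or sync fallback would take over.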

a ten-year experiment

Lili's genesis was in March 2026; the plan runs to 2036. Software almost never commits to a decade of autonomous operation, which is the whole point — long enough to watch a genuine ontogeny, from a chaotic Hatchling to an Elder, with checkpoints along the way. She is open source under MIT and in production right now — the live numbers on this page come straight from her current state.