Sequential Decision Analytics — Applied to Life

The Life Objective Function

What are you actually maximizing?

Joy, happiness, and regret are different quantities. One is the experienced level. One is the gap between reality and expectations. One is the gap between the policy you ran and the policy that was available.

QTY 01: Joy (maximize)
$J = A(S_t) + \beta \cdot H_t$
The experienced level. Absolute value state plus happiness scaled by expectation sensitivity.

QTY 02: Happiness (feel)
$H_t = \sum_i \lambda_i(i_t - E_t^i)$
The expectation gap. Reality minus expectations, dimension-weighted. Drives action when negative.

QTY 03: Regret (minimize)
$R = \int (J^* - J)\, dt$
The counterfactual gap between the policy you ran and the best policy available.

The Life Metaproblem

Before the framework can maximize anything, you have to choose what you are maximizing. That choice is called the contribution function $C$. The framework optimizes given $C$ — it does not pick $C$ for you. Different choices of $C$ produce entirely different lives, even with identical machinery underneath.

The standard choice for joy. If you are maximizing joy weighted by survival, then $C(S_t, X_t^\pi) = \omega_t \cdot J(S_t)$. The rest of this page assumes this choice.
Other choices are real. If you are maximizing Olympic gold, $C$ pays out a one-time bonus when $h_t$ at age 25 exceeds the threshold. If you are maximizing legacy, $C$ depends on the states of other people, summed across them. If you are maximizing wealth, $C$ collapses to $\lambda_w \cdot w_t$ alone. Each is a different game.
The metaproblem. Choosing $C$ is Step 1 of the Six Steps, applied to life itself. Most people never make this choice consciously. They inherit one — from culture, family, peers — and run optimization inside an objective they never selected. That is how someone wins at life by every conventional measure and still feels hollow. They optimized $V^C$ beautifully. $C$ was just not theirs.
$$V_t^C(S_t) \;=\; \max_{\pi = (f \in \mathcal{F},\, \theta \in \Theta)}\; \mathbb{E}\!\left[\,\sum_{t'=t}^{T} C\!\left(S_{t'},\, X_{t'}^\pi(S_{t'} \mid \theta)\right)\right]$$
$V_t^C(S_t)$ is the value of being in state $S_t$ at time $t$, conditional on having chosen objective $C$. The agent searches over policies $\pi = (f \in \mathcal{F},\, \theta \in \Theta)$: a policy class $f$ and its parameters $\theta$. $\theta$ includes the expectation profile $E^*$ as a distinguished sub-parameter, $\theta = (\theta_{-E},\, E^*)$. It is called out because expectations enter the contribution in two places (the happiness gap $H_t$ and the action elasticity), but it is still just a parameter. Under the standard choice, the contribution function is $C = \omega_t \cdot J(S_t, E_t)$, where $J = A(S_t) + \beta H_t$.

The State Vector

$$S_t \;=\; \bigl(\,h_t,\; w_t,\; r_t,\; p_t,\; \omega_t,\; E_t^h,\; E_t^w,\; E_t^r,\; E_t^p\,\bigr)$$
Five real-state variables (health, wealth, relationships, purpose, survival) plus one expectation variable for each of the first four. Survival $\omega_t$ carries no expectation term. Expectations are partly inherited from starting state, partly shaped by policy.
$h_t$ (health): drives survival directly through the biological pathway
$w_t$ (wealth): compounds; influences other variables indirectly
$r_t$ (relationships): densest hub in the system
$p_t$ (purpose): shapes behavior across all other variables
$\omega_t$ (survival): endogenous; determined by $h_t$, $r_t$, and $p_t$
$E_t^i$ (expectations): dimension-specific; one each for $h$, $w$, $r$, $p$

Joy

$$J(S_t, E_t) \;=\; A(S_t) \;+\; \beta \cdot H_t$$
Joy is Absolute Value State plus Happiness scaled by Expectation Sensitivity $\beta \geq 0$. $\beta$ controls how strongly the gap between reality and expectations amplifies or crushes the absolute experience. High $\beta$: the gap dominates. Low $\beta$: you mostly feel what you have.
Two people, two levels of joy. A healthy, well-connected multi-millionaire who just moved in next to Bezos. High $A$ across every dimension — but $H_t$ is deeply negative because his new peer group rewrote his expectations overnight. Now compare a lead foreman at the local factory who has spent twenty years rising through the ranks, consistently exceeding what he thought he would achieve. Modest $A$, but $H_t$ is strongly positive. With high enough $\beta$, the foreman's joy is higher. That is not a paradox. That is the formula.
Stoic practice is deliberate $\beta$ reduction. Not the elimination of happiness or the cultivation of low expectations. The decoupling of lived experience from the gap. A low-$\beta$ person feels what they have.
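The two-person comparison can be checked numerically. The sketch below is illustrative only: the equal weights, the 0 to 10 state scale, and every number assigned to the two people are assumptions made up for the example, not values from the framework.

```python
# Illustrative numbers only: weights and state values are assumptions.
WEIGHTS = {"h": 0.25, "w": 0.25, "r": 0.25, "p": 0.25}

def absolute_value(state, weights=WEIGHTS):
    """A(S_t): weighted sum of the real state variables."""
    return sum(weights[k] * state[k] for k in weights)

def happiness(state, expectations, weights=WEIGHTS):
    """H_t: weighted sum of (reality - expectation) per dimension."""
    return sum(weights[k] * (state[k] - expectations[k]) for k in weights)

def joy(state, expectations, beta, weights=WEIGHTS):
    """J = A + beta * H."""
    return absolute_value(state, weights) + beta * happiness(state, expectations, weights)

def crossover_beta(a, a_E, b, b_E):
    """Beta at which person b's joy overtakes person a's (requires H_b > H_a)."""
    return (absolute_value(a) - absolute_value(b)) / (happiness(b, b_E) - happiness(a, a_E))

# High state everywhere, but the wealth expectation was reset by the new neighbors.
millionaire = {"h": 8, "w": 9, "r": 8, "p": 7}
millionaire_E = {"h": 8, "w": 17, "r": 8, "p": 7}

# Modest state, with reality ahead of expectations on every dimension.
foreman = {"h": 6, "w": 5, "r": 6, "p": 7}
foreman_E = {"h": 5, "w": 3, "r": 5, "p": 5}

for beta in (0.2, 1.0):
    print(beta, joy(millionaire, millionaire_E, beta), joy(foreman, foreman_E, beta))
print("crossover beta:", crossover_beta(millionaire, millionaire_E, foreman, foreman_E))
```

With these made-up numbers the millionaire has the higher joy at low $\beta$ and the foreman overtakes him once $\beta$ passes the crossover value. Lowering $\beta$ shrinks the influence of the gap without touching either $A$ or the expectations themselves.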

Absolute Value State

$$A(S_t) \;=\; \sum_{i \in \{h, w, r, p\}} \lambda_i(t) \cdot i_t$$
The weighted sum of your actual state variables — health, wealth, relationships, purpose — independent of what you expected to have. This is what you have, full stop. The $\lambda$s drift with age: kids weight stuff and immediate experience; adults weight relationships and purpose; elders weight legacy and meaning.

Happiness

$$H_t \;=\; \sum_{i \in \{h, w, r, p\}} \lambda_i(t) \cdot \bigl(i_t - E_t^i\bigr)$$
Happiness is the gap between reality and expectations, dimension-weighted. It can be positive or negative. When $H_t > 0$, reality is beating expectations. When $H_t < 0$, expectations exceed reality — dissatisfaction.
The cancer-patient inversion. A recovering patient has positive $H_t$ because reality exceeds their lowered expectations. A healthy person who expected to feel athletic has negative $H_t$ despite a much higher $h_t$. The level of the variable does not determine happiness — the gap does.
Unhappiness drives action. $H_t < 0$ is dissatisfaction. Through the elasticity $\eta_i$, that dissatisfaction converts into policy adjustments that raise $i_t$. You do not want to eliminate unhappiness. You want it calibrated: enough to drive the policy, not so much — scaled by $\beta$ — that it destroys the experience.

The Contribution Function

$$C\!\left(S_t,\, X_t^\pi\right) \;=\; \omega_t \cdot J(S_t, E_t)$$
Each year contributes joy weighted by the probability of being alive to experience it. Joy depends on both the real state and the expectation profile $E_t$. $E_t$ is already a component of $S_t$; it is written as a separate argument only to make the dependence explicit.

Survival Transition

$$\omega_{t+1} \;=\; \omega_t \cdot \bigl(1 - \mu(h_t,\, r_t,\, p_t,\, t)\bigr)$$
The mortality rate $\mu$ depends on $h_t$ (biological pathway), $r_t$ (loneliness mortality is robust and not fully mediated by health behavior), and $p_t$ (will-to-live, ikigai, and post-retirement mortality effects). $w_t$ does not enter directly — it routes through $h_t$.
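A minimal sketch of the survival recursion, and of how it weights joy in the contribution function. The functional form of `mortality` is hypothetical: the text only says that $\mu$ depends on $h_t$, $r_t$, $p_t$, and $t$, so the Gompertz-like age term and the protection coefficients below are assumptions chosen for illustration. Wealth is deliberately not an argument.

```python
def mortality(h, r, p, age):
    """Hypothetical mu(h, r, p, t): rises with age, falls with h, r, and p (0-10 scale)."""
    base = 0.001 * 1.09 ** (age - 30)                    # illustrative age hazard
    protection = 1.0 - 0.04 * h - 0.03 * r - 0.02 * p    # illustrative coefficients
    return min(1.0, max(0.0, base * protection))

def survival_path(h, r, p, start_age=30, end_age=100):
    """omega_{t+1} = omega_t * (1 - mu), starting from omega = 1 at start_age."""
    omega = 1.0
    path = [omega]
    for age in range(start_age, end_age):
        omega *= 1.0 - mortality(h, r, p, age)
        path.append(omega)
    return path

def survival_weighted_joy(joy_per_year, path):
    """Sum of C_t = omega_t * J over the path, holding J constant for simplicity."""
    return sum(omega * joy_per_year for omega in path)

# Same health, different relationships and purpose.
connected = survival_path(h=7, r=8, p=7)
isolated = survival_path(h=7, r=2, p=2)
print(connected[-1], isolated[-1])
print(survival_weighted_joy(6.0, connected), survival_weighted_joy(6.0, isolated))
```

Holding states and joy constant over a lifetime is a simplification. The point of the sketch is only that $r_t$ and $p_t$ change the weights $\omega_t$ applied to every later year, even when health is identical.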

Expectation Transition

$$E_{t+1}^i \;=\; g_i\!\bigl(E_t^i,\; i_t,\; X_t^\pi,\; W_{t+1}\bigr)$$
Expectations update each period. $g_i$ governs how: they adapt toward recent reality ($i_t$), respond to deliberate policy choices ($X_t^\pi$ — Stoic practice, reflection, goal-setting), and absorb exogenous shocks ($W_{t+1}$ — a diagnosis, a peer group change, a windfall that rewrites your sense of what is normal). The fan-of-forecasts in The Lives is $g_i$ made visible over a lifetime.
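One possible shape for $g_i$, sketched under stated assumptions. The text names the three inputs (recent reality, deliberate policy, exogenous shocks) but not the functional form, so the additive rule and the adaptation rate `alpha` below are hypothetical.

```python
def update_expectation(E, i, practice=0.0, shock=0.0, alpha=0.3):
    """Hypothetical g_i: partial adaptation toward reality, plus a deliberate
    shift from practice (X_t) and an exogenous shift from a shock (W_{t+1})."""
    return E + alpha * (i - E) + practice + shock

def gap_path(E0, reality, steps, alpha=0.3):
    """Track the gap (i_t - E_t) while reality stays fixed and expectations adapt."""
    E, gaps = E0, []
    for _ in range(steps):
        gaps.append(reality - E)
        E = update_expectation(E, reality, alpha=alpha)
    return gaps

# A windfall lifts wealth from 5 to 9; expectations start at the old level.
gaps = gap_path(E0=5.0, reality=9.0, steps=11)
print([round(g, 2) for g in gaps])

# A peer-group change rewrites expectations in one step, with no change in reality.
print(update_expectation(E=5.0, i=5.0, shock=3.0))
```

Under this rule a positive gap decays geometrically as expectations catch up with the new reality, while a shock can open a negative gap instantly.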

The Coupling Structure

The transition function $S^M$ encodes how each state variable feeds the others. Strong couplings are robust and unimodal. Mild couplings are smaller. Bimodal couplings, marked with *, can flip sign depending on the quality or type of the source variable, not just its level.

| From → To | $h_t$ | $w_t$ | $r_t$ | $p_t$ | $\omega_t$ |
|---|---|---|---|---|---|
| $h_t$ | | mild | mild | mild | STRONG |
| $w_t$ | mild | self-loop | mild | mild | |
| $r_t$ | mild | STRONG | | STRONG | STRONG |
| $p_t$ | mild | mild | mild | | mild |

* marks a bimodal coupling. The sign depends on the quality of the source variable, not its level.
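The coupling structure can be held as plain data and queried. The mapping below assumes the table rows omit the diagonal (except the wealth self-loop) and that wealth has no direct arrow to survival, which agrees with the survival transition. That reading of the flattened table is an assumption, and bimodal markers are not encoded.

```python
# Source variable -> {target variable: coupling strength}, as read from the table.
COUPLING = {
    "h": {"w": "mild", "r": "mild", "p": "mild", "omega": "strong"},
    "w": {"h": "mild", "w": "self-loop", "r": "mild", "p": "mild"},
    "r": {"h": "mild", "w": "strong", "p": "strong", "omega": "strong"},
    "p": {"h": "mild", "w": "mild", "r": "mild", "omega": "mild"},
}

def strong_targets(source):
    """Variables that receive a strong arrow from the given source."""
    return sorted(t for t, s in COUPLING[source].items() if s == "strong")

def densest_hub():
    """Source with the most strong outgoing couplings."""
    return max(COUPLING, key=lambda s: len(strong_targets(s)))

print(densest_hub(), strong_targets(densest_hub()))
```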

The Policy Search

$$\pi^* \;=\; \underset{f \in \mathcal{F},\; \theta \in \Theta}{\arg\max}\; \mathbb{E}\!\left[\sum_{t=0}^{T} C\!\left(S_t,\, X_t^{f,\theta}(S_t)\right)\right], \qquad \theta = (\theta_{-E},\, E^*)$$
Two-level search. (1) Choose a policy class $f \in \mathcal{F}$ — the structure of the decision function. (2) Tune its parameters $\theta$ within that class. The expectation profile $E^*$ is part of $\theta$ — not a separate search variable, but a distinguished sub-parameter worth naming explicitly because it enters the objective in two places: through the happiness gap $H_t = \sum_i \lambda_i(i_t - E_t^i)$, and through the elasticity of action $\eta_i = \partial X_t^\pi / \partial E_t^i$.
What policies actually look like. A policy for maintaining health might be: brush your teeth every day — the parameter is once, twice, or three times. Work out regularly — the parameters are frequency, intensity, and type. These are simple rules. A complex policy is identifying and pursuing a marriage partner — or deciding not to. The parameters include what you value, what you are willing to compromise on, and how long you search. In practice, your full life policy is a large hybrid: simple daily rules layered with long-horizon structural bets, financial projections, and deeply embedded cultural and societal norms you may never have consciously examined.
The explore-exploit trade-off. Searching over policies raises an immediate problem: you cannot evaluate a policy without running it, and running it costs time. Do you stick with what is working (exploit), or try something new that might work better (explore)? The practical answer: you try policies, observe outcomes, and keep what works. You trade off exploration and exploitation across long horizons. Major unexpected life events — a job loss, a diagnosis, a move — often force a new round of policy exploration. That is not a failure. It is the system working as designed. New information about your state should update your policy.
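The two-level search can be sketched on a toy problem. Everything in the simulator is an assumption made for illustration: a single health state, two hypothetical policy classes (`fixed_routine` and `threshold_rule`), made-up dynamics, and a contribution of health minus an effort cost. It covers the search over $(f, \theta)$ only, not the explore-exploit trade-off.

```python
import random

def simulate(policy, theta, seed, horizon=40):
    """Run one life path under a policy and return the summed contribution."""
    rng = random.Random(seed)
    h, total = 5.0, 0.0
    for _ in range(horizon):
        effort = policy(h, theta)                 # decision X_t produced by the policy
        shock = rng.gauss(0.0, 0.5)               # exogenous information W_{t+1}
        h = min(10.0, max(0.0, h + 0.4 * effort - 0.5 + shock))
        total += h - 0.3 * effort ** 2            # toy contribution: health minus effort cost
    return total

def fixed_routine(h, theta):
    """Policy class 1: the same effort every period; theta is the effort level."""
    return theta

def threshold_rule(h, theta):
    """Policy class 2: work hard only when health falls below theta."""
    return 2.0 if h < theta else 0.0

POLICY_CLASSES = {
    "fixed_routine": (fixed_routine, [0.0, 1.0, 2.0, 3.0]),
    "threshold_rule": (threshold_rule, [2.0, 4.0, 6.0, 8.0]),
}

def expected_value(policy, theta, n_paths=200):
    """Monte Carlo estimate of the objective; the same seeds are reused for every policy."""
    return sum(simulate(policy, theta, seed) for seed in range(n_paths)) / n_paths

def policy_search():
    """Level 1: loop over classes f. Level 2: loop over theta within each class."""
    scored = [
        (expected_value(f, theta), name, theta)
        for name, (f, grid) in POLICY_CLASSES.items()
        for theta in grid
    ]
    return max(scored)

best_value, best_name, best_theta = policy_search()
print(best_name, best_theta, round(best_value, 2))
```

Note that the search never compares individual decisions. Each candidate is a rule, scored by the whole trajectory it produces across many simulated worlds.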

Regret

$$R_{\text{post}} \;=\; \int_0^T \Bigl(J\!\bigl(S_t^* \mid W\bigr) \;-\; J(S_t)\Bigr)\, dt$$
Ex post regret. $S_t^* \mid W$ is the state trajectory under the optimal policy given the same starting state $S_0$ and the same realized world $W$. $S_t$ is the trajectory under the policy actually run. Ex post regret holds the realized world fixed and asks: given everything that actually happened, how far did the policy you ran fall short of the best one? This is hindsight regret. It is real, but it slides easily into hindsight bias.
$$R_{\text{ante}} \;=\; \int_0^T \Bigl(J\!\bigl(S_t^* \mid \mathcal{F}_t\bigr) \;-\; J(S_t)\Bigr)\, dt$$
Ex ante regret. $S_t^* \mid \mathcal{F}_t$ is the state trajectory under the best policy given only the information available at time $t$, the filtration $\mathcal{F}_t$ (an information set, distinct from the policy-class set $\mathcal{F}$ used in the policy search). The benchmark is limited to what was knowable at each decision point, not to what the outcome later revealed. No one should regret not buying Bitcoin in 2010. That was not in $\mathcal{F}_{2010}$. Ex ante regret is the fairer and more actionable quantity for a life framework.
Two properties hold for both versions. Regret is policy-level, not decision-level: you do not regret a single bad call, you regret a policy that produced bad calls systematically. And regret holds $W$ fixed in both versions: bad luck is not regret.
The risk metric is part of the policy. Minimizing $\mathbb{E}[R_{\text{ante}}]$ produces one policy. Minimizing $\text{CVaR}_{95}(R_{\text{ante}})$ produces a different one. CVaR$_{95}$ — Conditional Value at Risk at the 95th percentile — is the average regret in the worst 5% of possible life outcomes. It asks: if things go badly, how badly do they go? Optimizing for CVaR$_{95}$ means building a life robust to bad luck, even at the cost of some expected value. Most people implicitly do this and call it being responsible.
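The difference between the two risk metrics shows up in a small computation. The regret samples below are invented for the example: a "bold" policy with low regret in most worlds and a heavy tail, and a "robust" policy with moderate regret everywhere. The `cvar` helper uses the simple empirical definition (mean of the worst tail of the samples), which is one common estimator among several.

```python
def mean(samples):
    return sum(samples) / len(samples)

def cvar(samples, level=0.95):
    """Empirical CVaR: average of the worst (1 - level) share of samples.
    Larger regret is worse, so the tail is the top of the sorted list."""
    tail_size = max(1, round((1.0 - level) * len(samples)))
    worst = sorted(samples)[-tail_size:]
    return sum(worst) / len(worst)

# Hypothetical regret across 100 simulated worlds for two policies.
bold = [1.0] * 95 + [40.0] * 5      # usually fine, occasionally disastrous
robust = [4.0] * 100                # never great, never terrible

candidates = {"bold": bold, "robust": robust}
by_mean = min(candidates, key=lambda name: mean(candidates[name]))
by_cvar = min(candidates, key=lambda name: cvar(candidates[name]))
print("min expected regret:", by_mean)
print("min CVaR_95 regret:", by_cvar)
```

The same two policies rank in opposite order under the two metrics, which is why the choice of metric belongs to the policy rather than sitting outside it.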

Expectation Elasticity

Expectations do two jobs at once. They are a reference point for happiness: $H_t = \sum_i \lambda_i(i_t - E_t^i)$ is the gap, and happiness is what you feel. And they are an input to the policy: high expectations drive action, low expectations keep you still. The two roles pull net welfare in opposite directions. Raising $E_t^i$ widens the gap and lowers $H_t$ today, but it can also raise $i_t$ later through action.

$$\eta_i \;=\; \frac{\partial X_t^\pi}{\partial E_t^i}$$
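$\eta_i$ is the elasticity of action with respect to expectation on dimension $i$. As written it is a plain partial derivative, a sensitivity rather than a unit-free elasticity in the strict sense; "elasticity" is used loosely here. High $\eta_i$ means raising expectations actually changes what you do. Low $\eta_i$ means it just makes you miserable without changing behavior.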
Per-dimension pattern. $\eta_w$ is high: expectations shape career choices, risk appetite, and negotiation. $\eta_h$ is moderate: expectations shape lifestyle, but above a ceiling they produce anxiety, not action. $\eta_r$ is high but bimodal: raised expectations can drive vulnerability and repair, or trigger withdrawal, depending on attachment style. $\eta_p$ is very high: people who expect their work to matter find ways to make it matter.
The dilemma. Optimal expectation level depends on $\eta_i$. High-$\eta$ dimensions reward bold expectations because the action gain dominates the gap cost. Low-$\eta$ dimensions reward calibrated expectations because there is no compensating action benefit. This kills two pieces of generic advice at once. "Have low expectations to be happy" optimizes for gap-cost while ignoring action-loss. "Shoot for the stars" applies high expectations uniformly without checking whether the dimension actually has the elasticity to convert ambition into outcome.
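The dilemma can be made concrete with a one-dimension toy model. The assumption, not stated in the text, is that action converts into outcome one-for-one and linearly, so that $i(E) = i_0 + \eta\,(E - i_0)$. Then $J(E) = \lambda\, i(E) + \beta \lambda\,(i(E) - E)$ and $\partial J / \partial E = \lambda\,(\eta(1+\beta) - \beta)$, which is positive only when $\eta > \beta / (1 + \beta)$.

```python
def joy_on_dimension(E, eta, beta, i0=5.0, lam=1.0):
    """Joy from one dimension when the outcome responds to expectations.
    Assumption: outcome = i0 + eta * (E - i0), a linear toy response."""
    outcome = i0 + eta * (E - i0)
    return lam * outcome + beta * lam * (outcome - E)

def marginal_joy_of_expectation(eta, beta, lam=1.0):
    """dJ/dE under the linear toy response."""
    return lam * (eta * (1.0 + beta) - beta)

def break_even_elasticity(beta):
    """Elasticity above which raising expectations raises joy in this toy model."""
    return beta / (1.0 + beta)

beta = 1.0
print("break-even eta:", break_even_elasticity(beta))
for eta in (0.2, 0.8):
    calibrated = joy_on_dimension(E=5.0, eta=eta, beta=beta)
    bold = joy_on_dimension(E=7.0, eta=eta, beta=beta)
    print(eta, calibrated, bold)
```

In this linear sketch the answer is always a corner: push expectations up when $\eta$ clears the break-even, hold them down when it does not. A response with a ceiling, as described for health, would instead give an interior optimum. The break-even also rises with $\beta$, so a more gap-sensitive person needs more elasticity before bold expectations pay.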

Expectations Are Partially Controllable

The model treats $E^*$ as a choice variable. In practice, expectations are not free parameters — they are inherited, biological, environmental, and only partially adjustable. The agent has limited but real authority over $E^*$, and that authority varies by dimension and by individual.

What sets your expectations regardless of your wishes. Peer group and social environment. Childhood imprinting (especially for $r$ and $p$). Biological temperament. Cultural narrative absorption. These forces install your starting expectations and drag you toward your environment's mean.
What lets you adjust expectations with effort. Deliberate practice (Stoicism, reflection, meditation, gratitude work). Environmental choice (who you live among, what you consume). Explicit goal-setting. Cognitive reframing in the moment. With sustained effort over time, expectations on a given dimension do shift — but not all the way, and not instantly.
The exploit and its cost. If $E^*$ were fully controllable, there would be a degenerate solution: set $E_t^i$ slightly below $i_t$ on every dimension, harvest positive $H_t$ automatically, never try hard at anything. This is real. People do it on purpose. Monks. Secular contented underachievers. The cost is twofold: it kills elasticity-driven action on every dimension simultaneously, and it requires constant effort to hold expectations against environmental drag. The exploit exists. It is not free.
Implication for the search. $E^*$ in the master objective should be read as the expectation profile achievable given your starting point and willingness to do the work, not any expectation profile you imagine. The feasible set of expectation profiles is bounded. The search happens inside those bounds.

What This Tells You

01
Joy is absolute value plus happiness, scaled by how sensitive you are to the gap. $J = A + \beta H$. A healthy, well-connected multi-millionaire who just moved in next to Bezos has high $A$ but negative $H_t$ — his new peer group rewrote his expectations overnight. A factory foreman who has spent twenty years exceeding his own expectations has modest $A$ but strongly positive $H_t$. With high enough $\beta$, the foreman's joy is higher. Both absolute conditions and the expectation gap matter. $\beta$ determines which dominates.
02
Happiness is what you feel. Joy is what you have. Like temperature — you do not feel absolute degrees, you feel hot or cold relative to your internal baseline. $H_t$ is that baseline gap. You can have low joy and high happiness (the cancer recovery — conditions are bad but improving fast), or high joy and low happiness (the depressed millionaire — conditions are great but expectations keep outrunning them). Maximize joy. Monitor happiness as feedback. A sustained negative $H_t$ is both a signal and an engine — it drives the policy changes that raise $i_t$.
03
Survival is endogenous. $\omega_t$ depends on $h_t$, $r_t$, and $p_t$. The lonely die younger. The purposeless die younger. The unhealthy die younger. Neglect compounds twice: once in joy, once in survival.
04
$r_t$ is the densest hub. Three strong outgoing arrows — to wealth, purpose, and survival. Standard cultural scripts treat relationships as a consumption good — something to enjoy once other things are sorted. The system says they are a production good.
05
The policy is the lever, not the decision. $X_t^\pi$ is well-defined — it is what the policy produces at time $t$ given the current state. But optimizing over individual decisions is myopic: it ignores how today's decision shapes $S_{t+1}$, $S_{t+2}$, and every state that follows. The policy search reasons over the whole trajectory. You design the policy. The policy produces the decisions.
06
Expectations are not free. They cost you in disappointment when reality misses, but they pay you in action when reality is shaped by belief. The optimal expectation level is dimension-specific, governed by $\eta_i$. Set bold expectations on $w$ and $p$ where elasticity is high. Set calibrated expectations on $h$ where elasticity has a ceiling. Set context-dependent expectations on $r$ where the elasticity sign depends on you. The blanket prescriptions — "manage expectations" or "dream big" — both fail because they treat a per-dimension question as a global one.
07
The choice of $C$ is upstream of everything. The framework optimizes given $C$. It does not pick $C$ for you. Most people inherit a $C$ from their culture and run optimization beautifully inside an objective they never selected. That is how someone wins at life by every conventional measure and feels hollow. Choose your $C$ deliberately. It is the metaproblem.
08
You can adjust your expectations, but not arbitrarily. The Stoic exploit — "set $E$ low, get free joy" — is real but bounded. Expectations are inherited from environment and biology, adjustable with sustained effort, dragged toward your peer group's mean. The exploit costs willpower to maintain and kills elasticity-driven action across every dimension at once. Use the lever. Know its limits.