Domain Adaptation

Video (best)

Yannic Kilcher — “Domain Adaptation / Transfer Learning overview” [NOT VERIFIED: the link below is an arXiv paper, not a video]
Link: https://arxiv.org/abs/2010.03978
Why: No single Yannic Kilcher or 3Blue1Brown video squarely covers sim-to-real / domain adaptation as a unified topic. Stanford CS231N guest lectures touch on domain adaptation but are fragmented across years.
Level: intermediate

Note: The closest verified option is a Stanford CS231N lecture segment, but no single canonical YouTube explainer from the preferred educators cleanly covers sim-to-real + domain adaptation together.

Blog / Written explainer (best)

Lilian Weng — “Domain Randomization for Sim-to-Real Transfer”
Link: https://lilianweng.github.io/posts/2019-05-05-domain-randomization/
Why: Lilian Weng’s blog is consistently the gold standard for structured ML topic overviews. This post covers domain randomization, sim-to-real gap, system identification, and progressive transfer with clear diagrams and paper citations — directly mapping to all related concepts listed for this topic.
Level: intermediate/advanced

Deep dive

Lilian Weng — “Meta-Learning: Learning to Learn Fast” + domain adaptation survey literature
Link: https://lilianweng.github.io/posts/2018-11-30-meta-learning/

Better candidate: The survey paper “A Survey on Transfer Learning” (Pan & Yang, 2010) and OpenAI’s technical blog on domain randomization are the most comprehensive references, but no single deep-dive post from the preferred authors clearly unifies sim-to-real and cross-embodiment transfer.

Link: https://openai.com/index/learning-dexterity/ [VERIFY — OpenAI Dactyl blog post covering sim-to-real in depth]
Why: OpenAI’s Dexterous In-Hand Manipulation (Dactyl) write-up is one of the most thorough public technical references for sim-to-real transfer, domain randomization, and system identification applied at scale. It bridges theory and practice concretely.
Level: advanced

Original paper

Tobin et al. (2017) — “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World”
Link: https://arxiv.org/abs/1703.06907
Why: This is the seminal, highly readable paper that introduced domain randomization as a principled approach to closing the sim-to-real gap. It is widely cited, clearly written, and directly foundational to all related concepts (reality gap, sim-to-real transfer, generalization). Accessible to readers without deep robotics background.
Level: intermediate

Code walkthrough

None identified
Why: No single well-maintained, pedagogically structured code walkthrough from a trusted source (fast.ai, Hugging Face, PyTorch tutorials) specifically covers sim-to-real domain adaptation end-to-end. Isaac Gym / Isaac Lab examples exist but lack narrative explanation.

Closest option: NVIDIA Isaac Lab tutorials cover sim-to-real workflows in code, but are documentation-style rather than pedagogical walkthroughs. [NOT VERIFIED]

Coverage notes

Strong: Written/blog coverage (Lilian Weng), seminal paper (Tobin et al. 2017), OpenAI technical posts on domain randomization
Weak: Video explainers — no preferred educator has produced a focused, high-quality video on sim-to-real or domain adaptation as a unified concept
Gap: Cross-embodiment transfer specifically is very underserved across all resource types; progressive transfer and system identification as sub-topics lack standalone pedagogical resources. Code walkthroughs with narrative explanation are essentially absent for this topic.

Cross-validation

This topic appears in 2 courses: intro-to-multimodal, intro-to-physical-ai

For intro-to-physical-ai: sim-to-real gap, domain randomization, and system identification are core — Tobin et al. + Weng blog are strong anchors
For intro-to-multimodal: domain adaptation in the sense of cross-modal/cross-domain generalization is less well served by these robotics-focused resources; a separate resource targeting distribution shift in vision-language models would be needed

Additional Resources for Tutor Depth

9 sources — papers, official docs, working code, benchmarks, and deep explainers that give the AI tutor precision on this topic.

📄 Domain Adaptation Generalization Bound (A-distance + labeling disagreement)

Paper · source

Upper bound on target risk comprising (i) source empirical risk, (ii) domain discrepancy via \(d_{\mathcal H}\) (A-distance), and (iii) a conditional/labeling-function disagreement term \(\lambda\).

Key content

Setup (Sec. 2): Representation \(R:X\to Z\). Induced feature distributions: \(\Pr_{\tilde D}[B]=\Pr_D[R^{-1}(B)]\). Induced labeling: \(\tilde f(z)=\mathbb E_D[f(x)\mid R(x)=z]\).
Predictor \(h:Z\to\{0,1\}\). Errors:
\[ \epsilon_S(h)=\mathbb E_{z\sim \tilde D_S}\big[|\tilde f(z)-h(z)|\big],\quad \epsilon_T(h)=\mathbb E_{z\sim \tilde D_T}\big[|\tilde f(z)-h(z)|\big]. \]
A-distance / \(\mathcal H\)-divergence (Sec. 3.1–3.2): For a family of subsets \(\mathcal A\),
\[ d_{\mathcal A}(D,D')=2\sup_{A\in\mathcal A}\big|\Pr_D[A]-\Pr_{D'}[A]\big|. \] For a hypothesis class \(\mathcal H\), take \(\mathcal A=\{Z_h:h\in\mathcal H\}\) with \(Z_h=\{z:h(z)=1\}\), and write \(d_{\mathcal H}(\tilde D_S,\tilde D_T)\).
Labeling disagreement / conditional shift term (Sec. 3.1): \(\tilde f\) is \(\lambda\)-close to \(\mathcal H\) if
\[ \inf_{h\in\mathcal H}\big(\epsilon_S(h)+\epsilon_T(h)\big)\le \lambda. \]
Main bound (Thm. 1, Sec. 3.2): For VC dimension \(d\) and labeled source sample size \(m\), with probability at least \(1-\delta\), for all \(h\in\mathcal H\):
\[ \epsilon_T(h)\le \hat\epsilon_S(h)+\sqrt{\frac{4}{m}\Big(d\log\frac{2em}{d}+\log\frac{4}{\delta}\Big)}+d_{\mathcal H}(\tilde D_S,\tilde D_T)+\lambda. \]
Computable version with unlabeled samples (Thm. 2): With unlabeled samples \(\tilde U_S,\tilde U_T\), each of size \(m'\):
\[ \epsilon_T(h)\le \hat\epsilon_S(h)+4\sqrt{\frac{d\log\frac{2em}{d}+\log\frac{4}{\delta}}{m}}+\lambda+d_{\mathcal H}(\tilde U_S,\tilde U_T)+4\sqrt{\frac{d\log(2m')+\log\frac{4}{\delta}}{m'}}. \]
Estimating \(d_{\mathcal H}\) via domain classification (Sec. 4): Given \(\tilde U_S,\tilde U_T\) (each of size \(m'\)), define the domain-discrimination error
\[ \mathrm{err}(h)=\frac{1}{2m'}\sum_{i=1}^{2m'}\big|h(z_i)-\mathbf 1[z_i\in \tilde U_S]\big|. \] Then
\[ d_{\mathcal A}(\tilde U_S,\tilde U_T)=2\Big(1-2\min_{h'\in\mathcal H}\mathrm{err}(h')\Big). \] For linear separators, exact minimization is NP-hard, so the authors approximate it with a convex surrogate (modified Huber loss) optimized by SGD.
Empirical numbers (Sec. 5.3, Fig. 2b; PoS WSJ→MEDLINE, projection dimension 200):
- Identity: Huber loss 0.003, A-distance 1.796, target error 0.253
- Random Proj: Huber loss 0.254, A-distance 0.223, target error 0.561
- SCL: Huber loss 0.07, A-distance 0.211, target error 0.216
  Data: 100 labeled WSJ sentences (~2500 words); 1M unlabeled words (500k/domain) to estimate A-distance.
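The domain-classification estimate of \(d_{\mathcal H}\) and the right-hand side of Thm. 1 can be sketched in NumPy. This is a minimal illustration, not the authors' code. It swaps their modified-Huber SGD for full-batch gradient descent on logistic loss, and it measures the error on the same samples it was fit to, as the formula does. A held-out error is the more common, less optimistic choice. The names `proxy_a_distance` and `thm1_bound` are hypothetical.

```python
import numpy as np


def proxy_a_distance(zs, zt, lr=0.1, steps=2000):
    """Estimate d_A(U_S, U_T) = 2 * (1 - 2 * min_h err(h)) from unlabeled features.

    zs, zt: arrays of shape (m', k). A linear domain classifier is fit with a
    logistic-loss surrogate (a swap-in for the paper's modified Huber + SGD).
    """
    z = np.vstack([zs, zt])
    y = np.concatenate([np.ones(len(zs)), np.zeros(len(zt))])  # 1[z_i in U_S]
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)           # standardize features
    x = np.hstack([z, np.ones((len(z), 1))])                     # bias column
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(x @ w, -30, 30)))
        w -= lr * x.T @ (p - y) / len(y)                         # logistic-loss gradient step
    h = (x @ w > 0).astype(float)
    err = np.mean(np.abs(h - y))  # (1 / 2m') * sum_i |h(z_i) - 1[z_i in U_S]|
    err = min(err, 1.0 - err)     # linear separators are closed under negation
    return 2.0 * (1.0 - 2.0 * err)


def thm1_bound(src_err, m, d, delta, d_h, lam):
    """Right-hand side of Thm. 1: empirical source error + VC term + d_H + lambda."""
    vc_term = np.sqrt(4.0 / m * (d * np.log(2 * np.e * m / d) + np.log(4.0 / delta)))
    return src_err + vc_term + d_h + lam
```

Values near 2 mean the two domains are almost perfectly separable in the representation (compare the Identity row's 1.796). Values near 0 mean a linear classifier cannot tell them apart.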

📄 Humanoid-Gym zero-shot sim-to-real recipe (Isaac Gym → MuJoCo → real)

Paper · source

Practical sim-to-real pipeline for humanoid locomotion with domain randomization + sim2sim validation; reported zero-shot transfer on two real humanoids.

Key content

Problem framing (Sec. III-A): Real deployment is a POMDP (partial observations), while training can use privileged information via an asymmetric actor-critic trained with PPO and GAE.
- Policy objective (Eq. 1): PPO clipped surrogate loss (standard PPO form) using advantage estimates.
- Value/advantage (Eq. 2): GAE advantage estimates, computed from the current (privileged-input) value function.
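The two PPO ingredients above can be sketched in NumPy, using the discount 0.994 and GAE λ 0.95 from Table II as defaults. In the asymmetric setup, `values` would come from the privileged critic. The clip range ε = 0.2 is an assumed common default because it is not listed in this summary. `gae` and `ppo_clip_loss` are illustrative names, not Humanoid-Gym's code.

```python
import numpy as np


def gae(rewards, values, dones, gamma=0.994, lam=0.95):
    """Generalized Advantage Estimation.

    rewards, dones: shape (T,); values: shape (T + 1,), including the bootstrap V(s_T).
    Returns (advantages, value targets).
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]  # TD error
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv, adv + values[:-1]


def ppo_clip_loss(ratio, adv, clip_eps=0.2):
    """Negative PPO clipped surrogate (to be minimized); ratio = pi_new / pi_old."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```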
Observations/actions (Sec. III-B, Table I):
- Action: target joint positions for a PD controller.
- Policy inputs (single frame, 2+3+12+12+3+3+12 = 47 dims): clock (2), commands (3), joint pos (12), joint vel (12), angular vel (3), Euler angles (3), last actions (12).
- Privileged (state-only, critic) examples: periodic stance mask (2), feet contact detection (2), friction (1), body mass (1), base linear vel (3), push force (2), push torques (3), tracking difference (12).
- Gait design: sinusoidal reference motion + periodic stance/contact mask synchronized to double-support (DS) / single-support (SS) phases.
Control rates: policy at 50 Hz; internal PD at 500 Hz.
Training setup (Appendix Table II): 8192 environments, episode length 2400 steps, discount 0.994, GAE λ 0.95, entropy coef 0.001, learning rate 1e-5, frame stack obs 15, privileged stack 3, single obs dim 47, privileged dim 73.
Domain randomization (Sec. IV-A, Appendix Table III): joint pos noise ±0.05 rad (Gaussian), joint vel ±0.5 rad/s (Gaussian), ang vel ±0.1 rad/s (Gaussian), Euler ±0.03 rad (Gaussian), system delay 0–10 ms (Uniform), friction 0.1–2.0 (Uniform), motor strength 95–105% (Gaussian scaling), payload ±5 kg (Gaussian additive).
Reward components (Sec. III-C, Appendix Table IV scales): lin vel track 1.2, ang vel track 1.0, orientation 1.0, base height 0.5, contact pattern 1.0, joint pos tracking 1.5, default joint 0.2, energy -1e-4, action smoothness -0.01, large contact -0.01; “velocity mismatch” term present (scale 0.5) but commands set to zero to encourage stable walking.
Workflow/results (Sec. IV): Train in Isaac Gym (GPU, fast) → validate robustness in MuJoCo (more accurate) (“sim2sim”) → zero-shot sim-to-real on XBot-S (1.2 m) and XBot-L (1.65 m); policies traverse flat and uneven terrains. MuJoCo calibrated to match real joint swing trajectories/phase portraits more closely than Isaac Gym.
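The per-episode and per-step randomization in Appendix Table III, together with the 15-frame stack of 47-D observations from Table II, might look like the sketch below. Several details are assumptions: the observation slice order, reading "±x (Gaussian)" as a standard deviation of x, and clipping the per-episode Gaussian terms to their stated range. All names are hypothetical, not Humanoid-Gym's API.

```python
import numpy as np
from collections import deque
from dataclasses import dataclass

OBS_DIM, STACK = 47, 15  # single-frame policy obs dim and frame-stack length (Table II)

# Hypothetical slice layout of the 47-D policy observation (order is an assumption).
SLICES = {"clock": (0, 2), "cmd": (2, 5), "q": (5, 17), "dq": (17, 29),
          "ang_vel": (29, 32), "euler": (32, 35), "last_act": (35, 47)}
# Per-step Gaussian noise std (rad, rad/s), read from the "±x (Gaussian)" entries.
NOISE_STD = {"q": 0.05, "dq": 0.5, "ang_vel": 0.1, "euler": 0.03}


@dataclass
class EpisodeParams:
    friction: float
    delay_s: float
    motor_strength: float  # multiplicative scale on motor torque
    payload_kg: float      # additive payload mass


def sample_episode_params(rng):
    """Draw one episode's physics parameters (ranges from Appendix Table III)."""
    return EpisodeParams(
        friction=rng.uniform(0.1, 2.0),
        delay_s=rng.uniform(0.0, 0.010),
        # Assumption: Gaussian terms use the half-range as std, clipped to the range.
        motor_strength=float(np.clip(rng.normal(1.0, 0.05), 0.95, 1.05)),
        payload_kg=float(np.clip(rng.normal(0.0, 5.0), -5.0, 5.0)),
    )


def noisy_obs(obs, rng):
    """Add per-step observation noise to a clean 47-D observation (returns a copy)."""
    out = np.array(obs, dtype=float)
    for key, std in NOISE_STD.items():
        lo, hi = SLICES[key]
        out[lo:hi] += rng.normal(0.0, std, hi - lo)
    return out


class FrameStack:
    """Keep the last STACK observations; older slots start zero-filled."""

    def __init__(self):
        self.frames = deque([np.zeros(OBS_DIM) for _ in range(STACK)], maxlen=STACK)

    def push(self, obs):
        self.frames.append(obs)
        return np.concatenate(list(self.frames))  # shape (STACK * OBS_DIM,) = (705,)
```

Physics parameters are resampled once per episode, while observation noise is redrawn at every policy step. The randomized policy therefore sees both slowly varying and fast-varying mismatch.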

📄 Invariant representations can fail in Domain Adaptation

Paper · source

Formal bounds/counterexamples for invariant representation learning; sufficient/necessary conditions via conditional shift + label-marginal mismatch.

Key content

DA setup/notation (Sec. 2): Source domain ⟨D_S, f_S⟩, target ⟨D_T, f_T⟩ with deterministic labels Y=f(X). Hypothesis h: X→{0,1}. Risk: ε_S(h,f):=E_{x~D_S}[|h(x)-f(x)|]; ε_S(h):=ε_S(h,f_S) (similarly ε_T).
H-divergence (Def. 2.1): A_H:={h^{-1}(1) | h∈H}.
d_H(D,D′):= sup_{A∈A_H} |Pr_D(A) − Pr_{D′}(A)|.
Classic DA bound (Thm 2.1, Eq. (1)): For VC-dim d, w.p. ≥1−δ, ∀h∈H:
ε_T(h) ≤ \hat ε_S(h) + ½ d_{HΔH}(\hat D_S,\hat D_T) + λ* + O(√((d log n + log(1/δ))/n)),
where h*:=argmin_{h∈H} ε_S(h)+ε_T(h), λ*:=ε_S(h*)+ε_T(h*).
Counterexample (Sec. 4.1, Fig. 1): X=Z=R.
D_S=U(−1,0), f_S(x)=0 if x≤−½ else 1.
D_T=U(1,2), f_T(x)=0 if x≥3/2 else 1.
The hypothesis h*(x)=1 iff x∈(−½,3/2) has 0 error on both domains.
But with g(x)=I_{x≤0}(x+1)+I_{x>0}(x−1): induced D_ZS=D_ZT=U(0,1) (perfectly invariant), yet ∀h: ε_S(h∘g)+ε_T(h∘g)=1 (smaller source error ⇒ larger target error). Here λ*_g=1.
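The counterexample can be checked numerically. This Monte Carlo sketch uses arbitrarily chosen sample sizes and thresholds. It confirms that h* is perfect on X, while every threshold classifier on the invariant feature z = g(x) has joint error close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
XS = rng.uniform(-1.0, 0.0, 200_000)  # D_S = U(-1, 0)
XT = rng.uniform(1.0, 2.0, 200_000)   # D_T = U(1, 2)


def f_s(x):
    return (x > -0.5).astype(float)   # f_S(x) = 0 if x <= -1/2 else 1


def f_t(x):
    return (x < 1.5).astype(float)    # f_T(x) = 0 if x >= 3/2 else 1


def g(x):
    return np.where(x <= 0, x + 1, x - 1)  # maps both domains onto U(0, 1)


def joint_error(h):
    """eps_S(h) + eps_T(h) for a classifier h defined on the input line X."""
    return np.mean(np.abs(h(XS) - f_s(XS))) + np.mean(np.abs(h(XT) - f_t(XT)))


def h_star(x):
    return ((x > -0.5) & (x < 1.5)).astype(float)


def threshold_after_g(t):
    """Joint error of the composition h(g(x)) with h(z) = 1[z > t]."""
    return joint_error(lambda x: (g(x) > t).astype(float))


print(joint_error(h_star))          # 0.0: perfect on both domains in X
for t in (0.2, 0.5, 0.8):
    print(t, threshold_after_g(t))  # approximately 1.0 for every threshold
```

Analytically, h(z) = 1[z > t] gives ε_S = |t − ½| and ε_T = 1 − |t − ½|, so the sum is exactly 1 for every t.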
Sufficient-condition bound without λ* (Thm 4.1): For H⊆[0,1]^X, ∀h∈H:
ε_T(h) ≤ ε_S(h) + d_{\tilde H}(D_S,D_T) + min{E_{D_S}|f_S−f_T|, E_{D_T}|f_S−f_T|},
where \tilde H := { sgn(|h(x)−h′(x)|−t) : h,h′∈H, t∈[0,1] }.
Note: E_{D_S}|f_S−f_T|=ε_S(f_T), E_{D_T}|f_S−f_T|=ε_T(f_S) (cross-domain errors).
Info-theoretic lower bound (Sec. 4.3): With Markov chain X→^g Z→^h Ŷ and JS distance d_JS:
Lemma 4.8: d_JS(D_{Y_S},D_{Y_T}) ≤ d_JS(D_{Z_S},D_{Z_T}) + √ε_S(h∘g)+√ε_T(h∘g).
Thm 4.3: if d_JS(D_{Y_S},D_{Y_T}) ≥ d_JS(D_{Z_S},D_{Z_T}), then
ε_S(h∘g)+ε_T(h∘g) ≥ ½ ( d_JS(D_{Y_S},D_{Y_T}) − d_JS(D_{Z_S},D_{Z_T}) )^2.
⇒ If label marginals differ, forcing invariance (small d_JS(D_ZS,D_ZT)) can force large joint error.
Empirical pipeline (Sec. 5): DANN on MNIST/USPS/SVHN (10 classes). Preprocess to grayscale 16×16. Classifier: 2 conv layers (5×5 kernels; 10 then 20 channels) → FC 1280→100 → softmax(10). Discriminator: conv features → FC 500→100 → 1-unit domain output. Observation: target accuracy rises quickly (<10 iters) then decreases with continued training (over-training hurts when label distributions differ).

📄 SimOpt / Bayesian Domain Randomization Loop

Paper · source

Closed-loop procedure to update simulator parameter distributions from a few real rollouts (KL-constrained Bayesian-style update) interleaved with RL policy training.

Key content

Domain randomization objective (Eq. 1, Sec. III-A): sample simulator parameters \(\xi\sim p_\phi(\xi)\), which induce dynamics \(P_{\xi\sim p_\phi}\). Train the policy \(\pi_\theta(a\mid s)\):
\[ \max_\theta\ \mathbb{E}_{\xi\sim p_\phi}\big[\mathbb{E}_{\pi_\theta}[R(\tau)]\big] \] where \(\tau=(s_0,a_0,\dots,s_T,a_T)\) and \(R(\tau)=\sum_{t=0}^T \gamma^t R(s_t,a_t)\).
Sim-to-real matching objective (Eq. 2, Sec. III-B): minimize the expected discrepancy between real and simulated observation trajectories:
\[ \min_\phi\ \mathbb{E}_{\xi\sim p_\phi}\big[\mathbb{E}_{\pi_{\theta,p_\phi}}[D(\tau^{ob}_\xi,\tau^{ob}_{real})]\big] \] Policy inputs and discrepancy observations need not match; only partial real observations are required.
Iterative KL-trust-region update (Eq. 3 + Alg. 1 “SimOpt”):
\[ \min_{\phi_{i+1}}\ \mathbb{E}_{\xi_{i+1}\sim p_{\phi_{i+1}}}\big[\mathbb{E}_{\pi_{\theta,p_{\phi_i}}}[D(\tau^{ob}_{\xi_{i+1}},\tau^{ob}_{real})]\big]\quad \text{s.t.}\quad D_{KL}(p_{\phi_{i+1}}\,\|\,p_{\phi_i})\le \epsilon \] Algorithm 1 steps: initialize \(p_{\phi_0}\); loop: train RL in simulation with \(p_{\phi_i}\); collect real rollouts; sample \(\xi\sim p_{\phi_i}\) and run sim rollouts; compute cost \(c(\xi)=D(\cdot)\); update to \(p_{\phi_{i+1}}\) with KL step \(\epsilon\).
Discrepancy function (Eq. 4): weighted \(\ell_1+\ell_2\) over time with per-dimension weights \(W\):
\[ D = w_{\ell_1}\sum_{i=0}^T \big\|W(o_{i,\xi}-o_{i,real})\big\|_1 + w_{\ell_2}\sum_{i=0}^T \big\|W(o_{i,\xi}-o_{i,real})\big\|_2^2 \] Gaussian temporal smoothing is applied (std 5 timesteps, truncation 4).
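Eq. 4 can be sketched in NumPy, with a diagonal \(W\) given as a per-dimension weight vector and the Gaussian temporal smoothing (std 5, truncation 4). Three details are assumptions: the edge padding mode, the default weights w_l1 = w_l2 = 1, and smoothing both trajectories before differencing. The function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(traj, sigma=5.0, truncate=4.0):
    """Smooth each observation dimension over time; traj has shape (T, k).

    Edge handling (repeat the boundary value) is an assumption.
    """
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # normalized, so constants are preserved
    padded = np.pad(traj, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, j], kernel, mode="valid") for j in range(traj.shape[1])],
        axis=1,
    )  # output length T: (T + 2r) - (2r + 1) + 1


def simopt_discrepancy(obs_sim, obs_real, weights, w_l1=1.0, w_l2=1.0):
    """Eq. 4 with diagonal W: w_l1 * sum_t ||W d_t||_1 + w_l2 * sum_t ||W d_t||_2^2."""
    diff = (gaussian_smooth(obs_sim) - gaussian_smooth(obs_real)) * weights  # (T, k)
    return w_l1 * np.abs(diff).sum() + w_l2 * (diff ** 2).sum()
```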
Implementation defaults: \(p_\phi(\xi)=\mathcal{N}(\mu,\Sigma)\) (full covariance), updated with REPS (Relative Entropy Policy Search), a gradient-free method that treats the simulator as a black box.
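The KL-constrained step of Alg. 1 can be approximated with samples. The sketch below is a simplified REPS-style update, not the paper's exact REPS dual:

- Each sampled \(\xi\) is weighted by \(\exp(-c(\xi)/\eta)\).
- The temperature \(\eta\) is chosen by bisection so that the sample-based KL of the weights against the current distribution (uniform over the samples) stays within \(\epsilon\).
- A new full-covariance Gaussian is fit by weighted maximum likelihood.

The function name is hypothetical.

```python
import numpy as np


def kl_constrained_gaussian_update(xi, costs, eps=1.0, iters=60):
    """One sample-based, KL-limited update of a Gaussian over simulator parameters.

    xi: (N, k) parameters sampled from the current p_phi_i; costs: (N,) values of
    c(xi) = D(sim rollout, real rollout). Returns new mean, full covariance, weights.
    """
    n = len(costs)
    c = costs - costs.min()  # constant shift cancels after normalization

    def weights(eta):
        w = np.exp(-c / eta)
        return w / w.sum()

    def sample_kl(w):
        # KL(weighted samples || uniform over samples); 0 * log(0) evaluates to 0
        return float(np.sum(w * np.log(n * w + 1e-300)))

    lo, hi = 1e-6, 1e6  # temperature bracket (hi assumed to satisfy the constraint)
    for _ in range(iters):
        eta = np.sqrt(lo * hi)  # bisection in log space
        if sample_kl(weights(eta)) > eps:
            lo = eta  # step too greedy: raise the temperature
        else:
            hi = eta  # feasible: try a greedier step
    w = weights(hi)
    mu = w @ xi
    centered = xi - mu
    sigma = (w[:, None] * centered).T @ centered + 1e-9 * np.eye(xi.shape[1])
    return mu, sigma, w
```

A small \(\epsilon\) keeps \(p_{\phi_{i+1}}\) close to \(p_{\phi_i}\). This is the same reason the results below favor gradually shifting the distribution over jumping to a very wide one.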
Empirical results (key numbers):
- Swing-peg-in-hole (real ABB Yumi): per SimOpt iter: 100 RL iters (~7 min), 3 real rollouts, 3 REPS updates, 9600 sim samples/update, 453 timesteps/sample. After 2 SimOpt iterations, 90% success over 20 trials.
- Drawer opening (real Franka Panda): per iter: 200 RL iters (~22 min), 3 real rollouts, 20 REPS updates, 9600 samples/update. After 1 SimOpt update, 20/20 successful openings.
- Why not “very wide” randomization: wide distributions can include infeasible instances (e.g., peg too large / rope too short) and hinder learning. In the simulated drawer task, the policy opened the drawer only when the cabinet-position std was 2 cm; larger stds (up to 10 cm) led to conservative or failing policies. SimOpt handled target offsets of 15 cm (3 iterations) and 22 cm (5 iterations) by progressively shifting the distribution.

📄 i-S2R (Iterative Sim-to-Real) for Human-Robot Table Tennis

Paper · source

Iterative sim↔real RL pipeline that updates a human behavior model from real interaction data to reduce sim-to-real gap in tight HRI; includes transfer/generalization results.

Key content

Objective / MDP (Section 3): MDP \((S,A,R,p)\), policy \(\pi_\theta:S\to A\), maximize the expected return
\[ \mathbb{E}\Big[\sum_{t=1}^{N} r(s_t,\pi_\theta(s_t))\Big] \] Episodes are simplified: each starts with a hit (no serve) and consists of a single ball throw plus the robot's return. The reward encourages cooperative returns, which lengthen rallies.
Iterative-Sim-to-Real procedure (Section 4, Fig. 2):
1. Collect initial human-only data \(D_0\) (player hits across the table, no robot).
2. Fit the initial human ball-distribution model \(M_0\).
3. Train the policy in sim on balls sampled from \(M_k\) → \(\theta_k^S\).
4. Deploy and fine-tune on the real robot with a human in the loop → \(\theta_k^R\); collect interaction hits → update dataset \(D_{k+1}\).
5. Update the human model \(M_{k+1}\) from \(D_{k+1}\); continue sim training from \(\theta_k^R\). Repeat until model deltas shrink (they found ~3 iterations sufficient).
Human behavior model (Section 4): uniform distribution defined by 16 numbers: min/max of initial ball position (6), velocity (6), and landing (x,y) on robot side (4). Fit per-trajectory initial pos/vel by Nelder–Mead minimizing Euclidean trajectory error; remove outliers via DBSCAN; take per-dimension min/max.
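The 16-number uniform model can be sketched as a bounding box over per-throw parameters. This sketch assumes the Nelder–Mead fits are already done. To stay dependency-free, it swaps DBSCAN for a simple z-score outlier filter. The names are illustrative.

```python
import numpy as np


def fit_human_box(params, z_thresh=3.0):
    """Fit the 16-number uniform human model from per-throw parameters.

    params: (N, 8) rows of [init_pos(3), init_vel(3), landing_xy(2)], e.g. the
    output of per-trajectory fits. Outliers are dropped with a z-score rule,
    a stand-in for the paper's DBSCAN step.
    """
    z = np.abs(params - params.mean(axis=0)) / (params.std(axis=0) + 1e-12)
    inliers = params[np.all(z < z_thresh, axis=1)]
    return inliers.min(axis=0), inliers.max(axis=0)  # 8 mins + 8 maxes = 16 numbers


def sample_throws(lo, hi, n, rng):
    """Draw n simulated throws uniformly from the fitted box."""
    return rng.uniform(lo, hi, size=(n, lo.shape[0]))
```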
RL optimizer (Section 3): Blackbox Gradient Sensing (BGS), an ES method that optimizes the smoothed objective (Eq. 1):
\[ F_\sigma(\theta)=\mathbb{E}_{\delta\sim\mathcal{N}(0,I)}[F(\theta+\sigma\delta)] \] It uses orthogonal perturbation ensembles and “elite-choice” sample selection; other RL methods (PPO, SAC, QT-Opt) did not transfer as well.
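BGS is described here only at a high level, so the sketch below is an ARS-style stand-in rather than the paper's algorithm. It computes antithetic finite-difference estimates of \(\nabla F_\sigma\) along orthogonal perturbation directions. As a rough analogue of "elite-choice", it keeps the top fraction of directions ranked by their better return. With all \(k\) orthogonal directions and a quadratic \(F\), the estimate equals the true gradient exactly.

```python
import numpy as np


def orthogonal_directions(k, n, rng):
    """n <= k mutually orthogonal directions, rescaled to Gaussian-like norm sqrt(k)."""
    q, _ = np.linalg.qr(rng.normal(size=(k, k)))
    return q[:, :n].T * np.sqrt(k)


def es_gradient(F, theta, rng, sigma=0.05, n_dirs=None, elite_frac=1.0):
    """Antithetic ES estimate of grad F_sigma(theta) using the top-b "elite" directions."""
    k = theta.size
    n = n_dirs or k
    deltas = orthogonal_directions(k, n, rng)
    f_plus = np.array([F(theta + sigma * d) for d in deltas])
    f_minus = np.array([F(theta - sigma * d) for d in deltas])
    b = max(1, int(round(elite_frac * n)))
    elite = np.argsort(-np.maximum(f_plus, f_minus))[:b]  # most promising directions
    diffs = (f_plus - f_minus)[elite]
    return (diffs[:, None] * deltas[elite]).sum(axis=0) / (2.0 * sigma * b)
```

Only function evaluations are needed, which is what lets ES treat the (sim or real) rollout return as a black box.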
System / policy defaults (Section 5): 8-DOF robot (6-DOF arm + 2D linear actuator). Obs = 3D ball pos + 8 joint angles = 11D; stack past 7 obs → input size \(8\times 11\). Action = 8 joint velocities at 75 Hz. Policy net: 3-layer 1D dilated gated CNN. Sim: PyBullet; add uniform ball obs noise = 2× ball diameter per timestep; must simulate sensor/action latency (Gaussian from real measurements) or transfer “completely failed.”
Key empirical results (Abstract, Section 6–7):
- Final performance: 22 hits average, 150 best rally.
- For 80% of players, i-S2R rallies 70%–175% longer than S2R+FT baseline (beginner ≈70%, intermediate ≈175%).
- Aggregated: i-S2R ≈ 9% higher rally length than S2R+FT.
- Cross-eval generalization: i-S2R retains ~70% of self-performance vs S2R+FT ~30% (Fig. 7).
- S2R-Oracle+FT (trained on penultimate learned human model) matches i-S2R with only 35% of real fine-tuning budget → gains largely from improved human model.

📊 SKADA-Bench (UDA) — realistic validation + standardized results

Benchmark · source

Standardized UDA comparison tables + explicit realistic validation protocol (nested CV, unsupervised scorers), enabling fair number quoting.

Key content

UDA definition (Sec. 1): adapt a model trained on labeled source to unlabeled target under distribution shift.
Shift types (Sec. 2.1):
- Covariate shift: \(P_s(y\mid x)=P_t(y\mid x)\), but \(P_s(x)\neq P_t(x)\).
- Target shift: \(P_s(y)\neq P_t(y)\) with the class-conditionals \(P(x\mid y)\) preserved.
- Conditional shift: \(P_s(y\mid x)\neq P_t(y\mid x)\).
- Subspace shift: there exists a subspace \(Z=g(X)\) where the distributions align and a classifier on \(Z\) transfers.
Realistic model selection protocol (Sec. 3.1):
- Nested cross-validation: outer loop creates train/test splits; inner loop selects DA hyperparameters using unlabeled target validation via unsupervised scorers.
- Splits: 5 random stratified repeats, 80/20 train/test (outer + inner); deep DA uses only 1 outer split (compute cost).
- Timeout: 4 hours per method for nested loop.
- Base estimator selection (before DA tuning): grid-search on source among Logistic Regression, RBF-SVM, XGBoost; some methods require SVM (JDOT, DASVM).
- Deep backbones: 2-layer CNN (MNIST/USPS), ImageNet-pretrained ResNet50 (Office31/OfficeHome), ShallowFBCSPNet (BCI).
Unsupervised scorers (Sec. 2.2): Importance Weighted (IW), Deep Embedded Validation (DEV), Prediction Entropy (PE), Soft Neighborhood Density (SND), Circular Validation (CircV), MixVal. CircV is not used for deep DA (too expensive).
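As a concrete example of an unsupervised scorer, a prediction-entropy (PE) score can be computed from target-domain class probabilities alone. This sketch assumes the convention "higher is better" (negative mean entropy); SKADA's implementation may differ in sign or details. The benchmark found PE to be a poor proxy for target accuracy. One reason is visible in the definition: a confidently wrong model also scores well.

```python
import numpy as np


def prediction_entropy_score(proba):
    """Negative mean entropy of predictions on unlabeled target data.

    proba: (n_target, n_classes) predicted class probabilities.
    """
    p = np.clip(proba, 1e-12, 1.0)
    return float(np.mean(np.sum(p * np.log(p), axis=1)))
```

In the nested protocol, the inner loop would compute this score for each DA hyperparameter setting on the unlabeled target validation split and keep the highest-scoring setting.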
Key empirical table (Table 2, realistic best scorer):
- Train Src baseline accuracies: Cov 0.88, Tar 0.85, Cond 0.66, Sub 0.19; Office31 0.65; OfficeHome 0.56; MNIST/USPS 0.54; 20News 0.59; Amazon 0.70; Mushrooms 0.72; Phishing 0.91; BCI 0.55.
- Top average ranks (real datasets): LinOT rank 4.06 (Office31 0.66; OfficeHome 0.57; MNIST/USPS 0.64; 20News 0.82; Amazon 0.70; Mushrooms 0.76; Phishing 0.91; BCI 0.61; scorer CircV). CORAL rank 5.08 (scorer CircV).
- Conditional shift (simulated): mapping methods strong: EntOT 0.82, ClassRegOT 0.81 vs Train Src 0.66 (near Train Tgt 0.82).
Scorer selection frequency (Sec. 4.1): CircV best for 10/20 methods, IW best for 4/20.
Scorer–accuracy correlation (Sec. 4.2, Pearson r): MixVal 0.51, IW 0.44, CircV 0.40; SND/DEV/PE “do not provide a good proxy” (low correlation). High variance: IW/CircV score ≈1 can correspond to accuracy 0.5–1.0.
Compute cost (Sec. 4): shallow experiments 1,215 CPU-hours; deep experiments 244 GPU-hours.
Design rationale: supervised target validation is unrealistic; nested CV + unsupervised scorers estimate real-world performance and reveal performance drops vs supervised tuning; simple linear transforms (LinOT, CORAL, JPCA, SA) are “safe defaults” when shift type unknown.
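CORAL, one of the "safe default" linear transforms, aligns second-order statistics. It whitens the source features with \(C_S^{-1/2}\) and then re-colors them with \(C_T^{1/2}\), where each covariance has \(\lambda I\) added (λ = 1 in the original CORAL paper). Below is a minimal NumPy sketch; a maintained library implementation should be preferred in practice.

```python
import numpy as np


def _mat_power(c, power):
    """Symmetric matrix power via eigendecomposition: V diag(lambda^power) V^T."""
    vals, vecs = np.linalg.eigh(c)
    return (vecs * vals ** power) @ vecs.T


def coral(xs, xt, reg=1.0):
    """Re-color source features so their covariance matches the target's.

    xs: (n_s, k) source features; xt: (n_t, k) target features. `reg` adds
    reg * I to both covariances. Means are not aligned (second-order only).
    """
    k = xs.shape[1]
    cs = np.cov(xs, rowvar=False) + reg * np.eye(k)
    ct = np.cov(xt, rowvar=False) + reg * np.eye(k)
    return xs @ _mat_power(cs, -0.5) @ _mat_power(ct, 0.5)
```

A classifier is then trained on `coral(xs, xt)` with the source labels and applied to the untransformed target features.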

📊 VisDA-2017 (Sim-to-Real) UDA Benchmark

Benchmark · source

Official VisDA-2017 challenge description + dataset/task definitions + baseline & top challenge scores (classification/segmentation) + evaluation metrics

Key content

Problem setup (UDA, sim→real): Train on labeled source (synthetic) and adapt using unlabeled target; no target annotations used for training. Two different target domains: one for validation (hyperparams) and a different one for test to prevent tuning on test labels (Section 3).
VisDA-C (classification) domains & scale (12 classes):
- Source/train: CAD-synthetic renderings, 152,397 images, 1,907 3D models.
- Target/val: MS COCO crops, 55,388 images (person capped at 4,000).
- Target/test: YouTube-BB frame crops, 72,372 images.
- Total across splits: >280K images.
Metric (classification): mean accuracy averaged over categories (reported at 40k iterations in baselines).
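The VisDA-C metric averages per-category accuracies, so a dominant class cannot mask failures on rare ones. In this minimal sketch, skipping classes absent from `y_true` is an assumption.

```python
import numpy as np


def mean_class_accuracy(y_true, y_pred, n_classes=12):
    """Accuracy computed per category, then averaged over categories."""
    accs = [np.mean(y_pred[y_true == c] == c)
            for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(accs))
```

For example, predicting class 0 everywhere on four class-0 samples and one class-1 sample gives 0.8 overall accuracy but 0.5 mean-class accuracy.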
Baseline training defaults (AlexNet): ImageNet init (except last FC=12); SGD, momentum 0.9 (weight decay/base LR given but not legible in excerpt).
ResNeXt-152: last FC=12 with Xavier init; output-layer LR = 10× the other layers; LR schedule \(lr(p)=lr_0(1+\alpha p)^{-\beta}\), where \(p\in[0,1]\) is training progress, \(\alpha=10\), \(\beta=0.75\).
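The annealed schedule and the 10× output-layer rate can be written as plain functions. In a deep learning framework, these values would be assigned to separate optimizer parameter groups. The names are hypothetical.

```python
def visda_lr(p, lr0, alpha=10.0, beta=0.75):
    """lr(p) = lr0 * (1 + alpha * p) ** (-beta), with p = training progress in [0, 1]."""
    return lr0 * (1.0 + alpha * p) ** (-beta)


def param_group_lrs(step, total_steps, lr0):
    """Backbone at lr(p); the newly initialized 12-way output layer at 10x that rate."""
    p = step / total_steps
    base = visda_lr(p, lr0)
    return {"backbone": base, "output_fc": 10.0 * base}
```

At the end of training (p = 1), the rate has decayed to \(11^{-0.75}\approx 0.17\) of \(lr_0\).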
Baseline results (VisDA-C):
- Oracle (in-domain) AlexNet: synthetic 99.92%, real-val 87.62%.
- Source-only AlexNet → real-val: 28.12%; Deep CORAL 45.53%; DAN 51.62%.
- Test domain: Oracle AlexNet 92.08%, Oracle ResNeXt-152 93.40%; Source-only AlexNet 30.81%; DAN 49.78%, Deep CORAL 45.29%.
- Challenge top score (test): GFColourLabUEA improved ResNet-152 source-only 45.3% → 92.8% using Mean Teacher + label propagation (student CE on source + student/teacher consistency MSE on both domains; teacher = EMA of student weights).
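The Mean Teacher ingredients of the winning entry can be sketched in three parts: an EMA update of the teacher weights, a squared-error consistency term between student and teacher predictions, and a source cross-entropy term. The decay 0.99, the consistency weight, and the dict-of-arrays weight format are assumptions. Label propagation is omitted.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Teacher <- decay * teacher + (1 - decay) * student, per weight array."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels (labeled source only)."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def consistency_mse(student_probs, teacher_probs):
    """Squared-error consistency between student and teacher predictions."""
    return float(np.mean((student_probs - teacher_probs) ** 2))


def mean_teacher_loss(s_src, y_src, s_all, t_all, w_cons=1.0):
    """Student CE on source + consistency on batches drawn from both domains."""
    return cross_entropy(s_src, y_src) + w_cons * consistency_mse(s_all, t_all)
```

Only the student receives gradients. The teacher changes solely through `ema_update`, which makes its target-domain predictions a smoothed, more stable training signal.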
VisDA-S (segmentation) domains (19 classes) & metric: GTA5 (source, 24,966 labeled frames) → CityScapes (val labeled) → Nexar/BDD (test, 1,500 images); metric mIoU.
- Dilation F.E. source: 21.4 mIoU on CityScapes val; oracle 64.0. On Nexar test: source 25.9 mIoU. Hoffman et al. adaptation reported ~25.5 mIoU on val (also cited as 27.1 for their method in table caption).

🔍 ASID—Active Exploration for System Identification (Sim→Real→Sim→Real)

Explainer · source

Step-by-step pipeline: Fisher-information exploration → system ID → downstream policy optimization in identified simulator (test-time simulation construction)

Key content

Problem setup (Section 3): Real dynamics are unknown but lie in a parametric family \(P_\theta\); there exists a true \(\theta^*\) such that the real MDP is \(M = M_{\theta^*}\). The simulator can sample trajectories \(\tau\sim P_\theta^\pi\) “for free” for any \(\theta,\pi\).
Learning protocol (Section 3):
1. Choose an exploration policy \(\pi_{\text{exp}}\) and run one real episode → trajectory \(\tau\).
2. Use \(\tau\) + the simulator to produce a task policy \(\pi\).
3. Deploy \(\pi\) in the real world.
Fisher information + estimation bound (Eq. 1): For an unbiased estimator \(\hat\theta\), the MSE is lower-bounded via the Fisher information \(I(\theta)\):
\[ \mathbb{E}\|\hat\theta-\theta\|^2\ \ge\ \mathrm{tr}\!\left(I(\theta)^{-1}\right) \] (Cramér–Rao). The Fisher information depends on the trajectory distribution induced by \(\pi_{\text{exp}}\): \(I(\theta;\pi_{\text{exp}})\).
Exploration objective = A-optimal design (Eq. 2):
\[ \pi_{\text{exp}}^*\in\arg\min_{\pi_{\text{exp}}}\ \mathrm{tr}\!\left(I(\theta;\pi_{\text{exp}})^{-1}\right) \] Rationale: large \(I\) ⇒ trajectories highly sensitive to \(\theta\) (large log-likelihood gradient) ⇒ more informative data.
Practical implementation (Section 4.1):
- Assume a next-state model \(s_{t+1}=f_\theta(s_t,a_t)+\varepsilon\) with Gaussian noise (Eq. 3) ⇒ \(I\) reduces to the sensitivity of the dynamics with respect to \(\theta\) (uses \(\nabla_\theta f_\theta\)).
- Unknown \(\theta^*\): optimize the expected objective under a parameter distribution \(p(\theta)\) (domain randomization) (Eq. 4).
- If the simulator is non-differentiable: approximate \(\nabla_\theta f_\theta\) via finite differences.
- Train \(\pi_{\text{exp}}\) with standard RL (e.g., PPO).
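To make A-optimal design concrete, the sketch below works with a hypothetical 1-D point mass whose mass and friction are unknown. It computes \(I(\theta)=\sum_t J_t^\top J_t/\sigma^2\), with \(J_t=\nabla_\theta f_\theta(s_t,a_t)\) from central finite differences, and scores an exploration action sequence by \(\mathrm{tr}(I^{-1})\). The formula assumes noise covariance \(\sigma^2 I\). Three choices are illustrative assumptions, not ASID's setup: the system itself, the noise level, and evaluating along a noise-free nominal rollout.

```python
import numpy as np

DT = 0.05  # hypothetical time step


def step(s, a, theta):
    """Hypothetical 1-D point mass: state (x, v), theta = (mass m, viscous friction b)."""
    x, v = s
    m, b = theta
    return np.array([x + DT * v, v + DT * (a / m - b * v)])


def fisher_information(actions, theta, sigma=0.01, h=1e-5):
    """I(theta) = sum_t J_t^T J_t / sigma^2 along a nominal rollout, J_t = df/dtheta."""
    theta = np.asarray(theta, dtype=float)
    s, info = np.zeros(2), np.zeros((theta.size, theta.size))
    for a in actions:
        J = np.stack(
            [(step(s, a, theta + h * e) - step(s, a, theta - h * e)) / (2 * h)
             for e in np.eye(theta.size)],
            axis=1,
        )  # central finite differences, shape (state_dim, n_params)
        info += J.T @ J / sigma ** 2
        s = step(s, a, theta)  # noise-free nominal rollout (approximation)
    return info


def a_optimal_cost(actions, theta=(1.0, 0.5)):
    """tr(I^{-1}); infinite when some parameter is not identifiable."""
    eig = np.linalg.eigvalsh(fisher_information(actions, theta))
    if eig.min() <= 1e-12 * max(eig.max(), 1.0):
        return np.inf
    return float(np.sum(1.0 / eig))
```

Staying still reveals nothing about mass or friction (infinite cost). Doubling a constant push quarters the cost, which matches the intuition that trajectories more sensitive to \(\theta\) are more informative.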
System identification (Section 4.2): Fit a distribution \(p(\theta)\) over parameters to the real trajectory \(\tau\) by minimizing the mismatch between \(\tau\) and simulated rollouts that replay the same action sequence. Implemented with REPS (simulation) and CEM (real experiments).
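System identification by replaying the real action sequence can be sketched with CEM, the method listed for the real experiments, on the same kind of hypothetical point mass. The squared-error mismatch, population sizes, positivity clipping, and the prior mean/std passed in the usage are assumptions.

```python
import numpy as np

DT = 0.05  # hypothetical time step


def rollout(theta, actions):
    """Hypothetical 1-D point mass, theta = (mass m, friction b); returns (T, 2) states."""
    m, b = theta
    x, v, traj = 0.0, 0.0, []
    for a in actions:
        x, v = x + DT * v, v + DT * (a / m - b * v)
        traj.append((x, v))
    return np.array(traj)


def cem_sysid(real_traj, actions, mu, std, iters=40, pop=100, n_elite=10, seed=0):
    """Fit a Gaussian over theta whose rollouts (same actions) match the real trajectory."""
    rng = np.random.default_rng(seed)
    mu, std = np.asarray(mu, dtype=float), np.asarray(std, dtype=float)
    for _ in range(iters):
        thetas = np.clip(rng.normal(mu, std, size=(pop, mu.size)), 1e-3, None)  # m, b > 0
        costs = np.array([np.sum((rollout(t, actions) - real_traj) ** 2) for t in thetas])
        elites = thetas[np.argsort(costs)[:n_elite]]
        mu, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu, std
```

Usage, with a noise-free "real" trajectory generated from known parameters:

```python
acts = 1.0 + np.sin(np.linspace(0.0, 6.0, 100))
real = rollout((2.0, 0.3), acts)
est, _ = cem_sysid(real, acts, mu=(1.5, 0.5), std=(0.75, 0.3))
```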
Downstream policy (Section 4.3): Train task policy entirely in the identified simulator; transfer zero-shot to real.
Key empirical results (real-world, Section 5.5):
- Rod balancing: Domain Randomization (DR) 0/3, 0/3, 0/3 (left/middle/right inertia); ASID 2/3, 1/3, 3/3 (total 6/9).
- Shuffleboard: DR 2/5 (yellow/close), 1/5 (blue/far); ASID 4/5, 3/5 (total 7/10).
- Typical data need: single real episode often suffices.

📋 DomainBed DG Experiment Surface (CLI, registries, sweeps)

Code · source

Configuration/implementation surface for domain generalization experiments (datasets, algorithms, hparams registry, training + sweep scripts, model selection).

Key content

Purpose: PyTorch suite for benchmarking domain generalization (per In Search of Lost Domain Generalization, arXiv:2007.01434). Official results for commit 7df6f06 provided in domainbed/results/2020_10_06_7df6f06/results.tex.
Algorithm registry (domainbed/algorithms.py): includes ERM, IRM, GroupDRO, Mixup, MTL, MLDG, MMD, CORAL, DANN, CDANN, SagNet, ARM, VREx, RSC, SD, AND-Mask, IGA, Fish, SelfReg, SAND-mask, Fishr, TRM, IB-ERM, IB-IRM, CAD/CondCAD, Transfer, CausIRL (CORAL or MMD), EQRM, RDM, ADRMX, ERM++, URM.
Dataset registry (domainbed/datasets.py): RotatedMNIST, ColoredMNIST, VLCS, PACS, Office-Home, TerraIncognita (subset), DomainNet, SVIRO (subset), WILDS FMoW, WILDS Camelyon17, Spawrious. Custom image datasets supported via folder structure: dataset/domain/class/image.xyz.
Backbones + hparams: implementations use ResNet50 / ResNet18; hyperparameter grids defined in domainbed/hparams_registry.py.
Model selection methods (domainbed/model_selection.py):
- IIDAccuracySelectionMethod: validation subset from training domains.
- LeaveOneOutSelectionMethod: validation subset from a held-out domain (not train/test).
- OracleSelectionMethod: validation subset from the test domain.
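The three selection rules differ only in which validation accuracy picks the winning hyperparameters. The reported number is always the test-domain accuracy of that winner. The `Run` record and `select` function below are a hypothetical simplification, not DomainBed's API. In particular, DomainBed's leave-one-domain-out method averages over several held-out training domains, which is collapsed here into a single field.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical per-run record (not DomainBed's internal format)."""
    hparams: dict
    train_domain_val_acc: float      # held-out split of the training domains (IID)
    heldout_train_domain_acc: float  # a training domain left out of training
    test_domain_val_acc: float       # split of the test domain (oracle)
    test_acc: float                  # accuracy that gets reported


def select(runs, method):
    """Return (hparams, reported test accuracy) of the run chosen by `method`."""
    key = {
        "iid": lambda r: r.train_domain_val_acc,
        "leave_one_out": lambda r: r.heldout_train_domain_acc,
        "oracle": lambda r: r.test_domain_val_acc,
    }[method]
    best = max(runs, key=key)
    return best.hparams, best.test_acc


runs = [Run({"lr": 1e-3}, 0.95, 0.70, 0.60, 0.58),
        Run({"lr": 1e-4}, 0.90, 0.75, 0.72, 0.70)]
for method in ("iid", "leave_one_out", "oracle"):
    print(method, select(runs, method))
```

On these toy runs, IID selection reports 0.58 while the other two rules report 0.70. The same sweep can therefore yield different headline numbers depending on the selection method.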
Core CLI workflows:
- Download: python3 -m domainbed.scripts.download --data_dir=./domainbed/data
- Train: python3 -m domainbed.scripts.train --data_dir=... --algorithm IGA --dataset ColoredMNIST --test_env 2
- Sweep launch: python -m domainbed.scripts.sweep launch --data_dir=... --output_dir=... --command_launcher MyLauncher
- Sweep scale defaults: “tens of thousands” models = (all algos × all datasets × 3 trials × 20 hparam samples). Can restrict via --algorithms, --datasets, --n_hparams, --n_trials.
- Results: python -m domainbed.scripts.collect_results --input_dir=...
- Cleanup/retry: python -m domainbed.scripts.sweep delete_incomplete then relaunch with identical args.
Tests: python -m unittest discover; with datasets: DATA_DIR=/path python -m unittest discover.

SocraticTutor LLM Wiki
