DCN-V2 — Deep & Cross Networks · Emergent Intelligence Lab

Following Wang et al. (2021), we study explicit feature-crossing architectures and the trade-offs between full-rank and low-rank cross layers, with and without gating. The core building block is the cross layer:

$$x_{l+1} = x_0 \odot (W_l x_l + b_l) + x_l$$

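with low-rank decomposition $W_l \approx U_l V_l^T$, where $U_l, V_l \in \mathbb{R}^{d \times r}$ for input dimension $d$ and rank $r \ll d$, and a mixture-of-experts extension $W_l = \sum_i G_i(x_l) \cdot U_i V_i^T$, in which the gates $G_i(x_l)$ weight the low-rank experts per input so the model can allocate capacity adaptively across regions of the feature space.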
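
The three variants above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: it assumes batched row-vector inputs of shape (batch, d), a bias shared across experts, and a softmax gate, which is one common choice. The function names and the gating matrix `Wg` are hypothetical.

```python
import numpy as np


def cross_layer(x0, xl, W, b):
    """Full-rank cross layer: x_{l+1} = x0 * (W xl + b) + xl, row-wise over a batch."""
    return x0 * (xl @ W.T + b) + xl


def low_rank_cross_layer(x0, xl, U, V, b):
    """Low-rank variant with W ~ U V^T: project to rank r, then back up to d."""
    return x0 * ((xl @ V) @ U.T + b) + xl


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def moe_cross_layer(x0, xl, Us, Vs, b, Wg):
    """Mixture of low-rank experts: W(xl) = sum_i G_i(xl) U_i V_i^T.

    Wg (shape d x n_experts) is a hypothetical linear gate; softmax gating
    is an assumption made for this sketch.
    """
    gates = softmax(xl @ Wg)                                   # (batch, n_experts)
    proj = np.stack([(xl @ V) @ U.T for U, V in zip(Us, Vs)])  # (n_experts, batch, d)
    mixed = np.einsum("bk,kbd->bd", gates, proj)               # gate-weighted sum over experts
    return x0 * (mixed + b) + xl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, d, r, n_experts = 4, 8, 2, 3
    x0 = rng.normal(size=(batch, d))
    Us = [rng.normal(size=(d, r)) for _ in range(n_experts)]
    Vs = [rng.normal(size=(d, r)) for _ in range(n_experts)]
    b = np.zeros(d)
    Wg = rng.normal(size=(d, n_experts))
    x1 = moe_cross_layer(x0, x0, Us, Vs, b, Wg)  # first layer: x_l = x_0
    print(x1.shape)
```

Under these assumptions a full-rank layer stores $d^2$ weights while a low-rank layer stores $2dr$, so the low-rank form only saves parameters when $r < d/2$. With a single expert the softmax gate is identically 1 and the mixture reduces to the plain low-rank layer.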

What we’re building

A reproducible reference implementation of DCN-V2 with batched training on AWS Batch via Metaflow.
An agentic research workflow (the dcn_v2_agents repo) that scaffolds paper structure, runs consistency checks, and assembles drafts across multiple owners (positioning, method, experiment, lead).

Why it matters

Cross networks are among the most compute-efficient ways to add explicit feature interactions to deep models. We’re interested in pushing the low-rank / MoE frontier: how aggressively can we compress the cross layers before the interaction signal degrades, and where does the gate route inputs when the experts are free to specialize?