New paper · SkillAudit: From Fixed-Suite Benchmarking to Skill-Centered Assessment

Projects

Reproducible benchmarks, research systems, and open-source implementations connected to our publications.

14 projects

Research · 2026

Latent Thinking Optimization

Code for supervising and improving latent reasoning by treating hidden-state correctness signals as a latent reward model.

Python · Latent Reasoning · Reward Modeling

Research · 2026

LLMPopcorn

An LLM-assisted pipeline for generating and evaluating titles, cover prompts, and short-video prompts designed for audience appeal.

Python · RAG · Video Generation

Open source · 2026

MetaSR

The official implementation of a meta-learning framework that bridges next-item prediction and masked-language modeling for recommendation.

Python · Meta-Learning · Sequential Recommendation

Benchmarks · 2026

MMPCBench

A benchmark for measuring how multimodal language models reconstruct missing product text or imagery and support recommendation.

Python · Multimodal LLMs · Recommendation

Benchmarks · 2026

PhysicsMind

A simulation-and-real-world benchmark for testing physical reasoning and prediction in vision-language and world models.

Python · VLM Evaluation · Video Generation

Benchmarks · 2026

SkillAudit

Skill-centered assessment for agent skills across utility, efficiency and cost, and safety, backed by sandboxed execution evidence.

Python · Docker · Browser Extension

Open source · 2025

AMBER

An adaptive meta-balancing framework for integrating heterogeneous graph signals in knowledge tracing.

Python · Graph Learning · Knowledge Tracing

Open source · 2025

FDRec

Frequency-decoupled knowledge distillation for reducing multimodal recommendation cost while preserving useful cross-modal signals.

Python · Knowledge Distillation · Recommendation

Open source · 2025

Guider

The official implementation of a guided-calibration framework for denoising multimodal recommender systems.

Python · PyTorch · Multimodal Recommendation

Open source · 2025

IISAN-Versa

A decoupled parameter-efficient adaptation framework for symmetric and asymmetric multimodal foundation models in recommendation.

Python · PEFT · Sequential Recommendation

Research · 2025

SAPIENT

A conversational recommendation system that combines a learned agent with Monte Carlo tree search for multi-turn planning.

Python · MCTS · Conversational AI

Research · 2025

SOLAR

An open implementation for aligning language-model recommenders with both relevance and serendipity objectives.

Python · LLM Alignment · Recommendation

Open source · 2025

TARec

A teacher-assisted Wasserstein distillation pipeline for compressing multimodal recommenders into efficient ID-based student models.

Python · PyTorch · Knowledge Distillation

Open source · 2025

UKT

Uncertainty-aware knowledge tracing that represents student states as distributions instead of fixed deterministic embeddings.

Python · Knowledge Tracing · Uncertainty Modeling