Selected Projects

1. Dgrammar: Efficient Constrained Decoding for Diffusion Language Models
Project 1
Venue: Anonymous ACL Submission
Role: Author
Description:
  • Introduced Dgrammar, a grammar-constrained decoder for Diffusion Language Models (DLMs) that preserves block-parallel multi-token unmasking — unlike prior methods that fall back to single-token decoding on every grammar violation.
  • Combines incremental prefix checking, selective remasking via logit truncation, asynchronous mask construction, and grammar-guided autoregressive tail completion to enforce formal grammar constraints in-place under the same forward-pass logits.
  • On JSONSchemaBench (medium_test, LLaDA-8B-Instruct): improves the schema-valid rate from 76.3% to 84.5%, reduces mean latency by 5.8× and p95 latency by 9.4×, and eliminates all 120 s timeouts.
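
Illustrative sketch (not the Dgrammar implementation): the prefix check and selective remasking steps named above can be pictured with a toy bracket grammar. The names `first_violation`, `remask_with_truncation`, and `MASK` are hypothetical, and the check below rescans the prefix from the start, whereas an incremental checker would cache parser state between steps.

```python
MASK = "<mask>"
# Toy grammar (assumption): brackets must be balanced and properly nested.
CLOSERS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(CLOSERS.values())


def first_violation(tokens):
    """Return the index of the first token that invalidates the prefix, or None.

    Checking stops at the first still-masked position, because only the
    contiguous unmasked prefix can be validated.
    """
    stack = []
    for i, tok in enumerate(tokens):
        if tok == MASK:
            return None
        if tok in OPENERS:
            stack.append(tok)
        elif tok in CLOSERS:
            if not stack or stack[-1] != CLOSERS[tok]:
                return i
            stack.pop()
    return None


def remask_with_truncation(tokens, banned):
    """Remask only the violating position and ban the offending token there.

    `banned` maps position -> set of tokens whose logits would be truncated
    (set to -inf) at that position on the next denoising step. All other
    tokens unmasked in the same step are kept.
    """
    i = first_violation(tokens)
    if i is None:
        return list(tokens), banned
    new_banned = {pos: set(toks) for pos, toks in banned.items()}
    new_banned.setdefault(i, set()).add(tokens[i])
    new_tokens = list(tokens)
    new_tokens[i] = MASK
    return new_tokens, new_banned


# One parallel step proposed five tokens; position 2 breaks the grammar.
proposed = ["{", "[", ")", "]", "}"]
tokens, banned = remask_with_truncation(proposed, {})
print(tokens)   # position 2 is masked again, the rest are kept
print(banned)   # ")" is excluded at position 2
```

Only the offending position is resampled, so the other tokens accepted in the same parallel step survive.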
Project 2
Venue: MLSys 2026 (FlashInfer AI Kernel Generation Contest)
Authors: Jeng-Yue Liu, Wilson Zheng, Haoling Pu — Carnegie Mellon University
Description:
  • Designed and optimized GPU kernels for both stages of the DeepSeek Sparse Attention (DSA) pipeline targeting 128K-token long-context LLM inference.
  • Stage 1 (Top-K Indexer): Triton-based indexer with FP8 dequantization, cuBLAS scoring, and a two-tier CUDA graph caching scheme for near-zero repeated-call overhead.
  • Stage 2 (Sparse Attention Kernel): CUDA kernel using WMMA m16n16k16 tensor cores, cp.async double-buffered KV gathering, and a split-K parallelization strategy that lifts SM utilization from ~5% to ~173% at small batch sizes.
  • Achieves 22–50× speedup over the PyTorch reference on NVIDIA B200, with kernel latency flat at 53–61 µs across all 23 benchmark workloads and abs_err = 1.56 × 10⁻², well below the contest tolerance.
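
Illustrative sketch (not the contest kernel): assuming split-K here means partitioning the gathered KV entries into chunks that are processed independently and then merged, the merge step can be written in plain Python. Each chunk returns its running maximum, softmax denominator, and unnormalized output, and the partials are rescaled to a common maximum before they are combined. The function names are hypothetical.

```python
import math


def partial_attention(scores, values):
    """Process one KV chunk; return (max score, denominator, unnormalized output)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, denom, acc


def merge_partials(partials):
    """Combine per-chunk results by rescaling each one to the global maximum."""
    m_all = max(m for m, _, _ in partials)
    denom = 0.0
    acc = [0.0] * len(partials[0][2])
    for m, d, a in partials:
        scale = math.exp(m - m_all)
        denom += d * scale
        acc = [x + y * scale for x, y in zip(acc, a)]
    return [x / denom for x in acc]


def split_k_attention(scores, values, num_splits):
    """Split the KV axis into chunks (one per thread block on a GPU), then merge."""
    chunk = math.ceil(len(scores) / num_splits)
    partials = [
        partial_attention(scores[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(scores), chunk)
    ]
    return merge_partials(partials)


# Query-key scores for five selected KV entries and their value vectors.
scores = [0.5, -1.0, 2.0, 0.0, 1.5]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
single = split_k_attention(scores, values, num_splits=1)
split = split_k_attention(scores, values, num_splits=3)
print(all(abs(a - b) < 1e-9 for a, b in zip(single, split)))
```

Because the chunks are independent until the merge, a small batch can still spread its work across many SMs.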
Project 3
Dates: Jan. 2024 - Dec. 2024
Description:
  • [Final report] Collaborated with a team of 6 and California State University, Bakersfield to develop a GraphRAG-based news analysis tool, enabling efficient insight extraction from large datasets and reducing manual effort in social science research.
  • Preprocessed news article data and generated QA pairs from it to fine-tune GPT-4o-mini within the GraphRAG framework.
Project 4
Dates: Jul. 2024 - Oct. 2024
Description:
  • Introduced an audio-query-based source separation approach that combines the Band-Split Mamba model with latent diffusion to address the limitations of traditional source separation methods.
Project 5
Dates: Jul. 2024 - Nov. 2024
Description:
  • Advanced to the contest semifinals by building a digital transaction platform for agricultural goods in 2 months using TypeScript (React), Node.js, and MongoDB, deployed via Render and Vercel.
  • Developed a Selenium web crawler that collects real-time vegetable prices, used to optimize fertilizer ratios for carbon reduction.
Project 6
Dates: Sep. 2022 - Dec. 2022
Description:
  • Built a simplified traffic flow simulation of the Taipei Dome area in NetLogo to analyze traffic dynamics under various scenarios.