1. Dgrammar: Efficient Constrained Decoding for Diffusion Language Models
Venue: Anonymous ACL Submission
Role: Author
Description:
- Introduced Dgrammar, a grammar-constrained decoder for Diffusion Language Models (DLMs) that preserves block-parallel multi-token unmasking, unlike prior methods that fall back to single-token decoding on every grammar violation.
- Combines incremental prefix checking, selective remasking via logit truncation, asynchronous mask construction, and grammar-guided autoregressive tail completion to enforce formal grammar constraints in-place under the same forward-pass logits.
- On JSONSchemaBench (medium_test, LLaDA-8B-Instruct): improves the schema-valid rate from 76.3% to 84.5%, reduces mean latency by 5.8× and p95 latency by 9.4×, and eliminates all 120 s timeouts.
|
Venue: MLSys 2026 (FlashInfer AI Kernel Generation Contest)
Authors: Jeng-Yue Liu, Wilson Zheng, Haoling Pu (Carnegie Mellon University)
Description:
- Designed and optimized GPU kernels for both stages of the DeepSeek Sparse Attention (DSA) pipeline targeting 128K-token long-context LLM inference.
- Stage 1 (Top-K Indexer): Triton-based indexer with FP8 dequantization, cuBLAS scoring, and a two-tier CUDA graph caching scheme for near-zero repeated-call overhead.
- Stage 2 (Sparse Attention Kernel): CUDA kernel using WMMA m16n16k16 tensor cores, cp.async double-buffered KV gathering, and a split-K parallelization strategy that lifts SM utilization from ~5% to ~173% at small batch sizes.
- Achieves 22–50× speedup over the PyTorch reference on NVIDIA B200, with kernel latency flat at 53–61 µs across all 23 benchmark workloads and abs_err = 1.56 × 10⁻², well below the contest tolerance.
|
Dates: Jan. 2024 - Dec. 2024
Description:
- [Final report] Collaborated with a team of 6 and California State University, Bakersfield to develop a GraphRAG-based news analysis tool, enabling efficient insight extraction from large datasets and reducing manual effort in social science research.
- Preprocessed news articles and generated QA pairs from them to fine-tune GPT-4o-mini within the GraphRAG framework.
|
Dates: Jul. 2024 - Oct. 2024
Description:
- This project introduces a novel audio-query-based source separation approach, leveraging the Band-Split Mamba model and advanced latent diffusion techniques to overcome the limitations of traditional source separation methods.
|
Dates: Jul. 2024 - Nov. 2024
Description:
- Advanced to the contest semifinals by building a digital transaction platform for agricultural goods in 2 months using TypeScript (React), Node.js, and MongoDB, deployed via Render and Vercel.
- Developed a Selenium web crawler for real-time vegetable prices to optimize fertilizer ratios for carbon reduction.
|
Dates: Sep. 2022 - Dec. 2022
Description:
- This project presents a simplified traffic flow simulation of the Taipei Dome area. Built in the NetLogo environment, the model simulates and analyzes traffic dynamics under various scenarios.
|