Curated data for silicon and embedded AI research. Built from real engineering workflows, annotated by domain engineers.
Available Datasets
Datasets are available for research and evaluation. Commercial licensing available on request.
Curated collection of open-source Verilog and SystemVerilog designs. Cleaned, categorized, and annotated for LLM training and code generation research.
Request access →
Testbench-design pairs with coverage data, assertion libraries, and pass/fail labels. For training verification-aware models.
Request access →
Structured logs from synthesis, simulation, and timing analysis runs. Annotated with error categories, root causes, and resolutions.
Register interest →
Natural language specification excerpts mapped to corresponding RTL implementations. For spec comprehension and code generation.
Register interest →
Data Quality
All labels and annotations are created or reviewed by semiconductor engineers, not crowd-sourced. Domain accuracy matters.
Raw data goes through automated and manual cleaning pipelines. Duplicates, license-incompatible code, and low-quality samples are removed.
Each dataset release includes version tags, changelogs, data cards, and usage guidelines. Reproducibility is non-negotiable.
We build domain-specific datasets for research teams and companies. Tell us what you need.