github.com/0hq/WebGPT @main

315 symbols · 660 edges · 12 files · 0 documented (0%) · updated 2y ago · ★ 3,790 · 6 open issues
README

WebGPT

After six years of development, WebGPU is about to launch across most major web browsers. This is massive: web applications now have near-native access to the GPU, with the added capacity of compute shaders.

WebGPT is a vanilla JS and HTML implementation of a transformer model, intended as a proof-of-concept as well as an educational resource. WebGPT has been tested with models up to 500M parameters, though it could likely support far more with further testing and optimization.

Current Stats

2020 M1 Mac: 3ms/token at 5M parameters with f32 precision.
2020 M1 Mac: 30ms/token at 117M parameters with f32 precision.
2020 M1 Mac: 70ms/token at 377M parameters with f32 precision.
2020 M1 Mac: 120ms/token at 775M parameters with f32 precision.
1.5B is working but unstable, sitting around 1000ms/token due to inefficiencies.
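
To make the latencies above easier to compare, here is a minimal sketch that converts them into throughput. The `tokensPerSecond` helper and the `benchmarks` array are illustrative names, not part of the repo; the numbers are copied from the list above.

```javascript
// Convert a per-token latency in milliseconds into tokens per second.
function tokensPerSecond(msPerToken) {
  return 1000 / msPerToken;
}

// Latencies reported above for a 2020 M1 Mac at f32 precision.
const benchmarks = [
  { params: "5M", msPerToken: 3 },
  { params: "117M", msPerToken: 30 },
  { params: "377M", msPerToken: 70 },
  { params: "775M", msPerToken: 120 },
  { params: "1.5B", msPerToken: 1000 },
];

for (const { params, msPerToken } of benchmarks) {
  console.log(`${params}: ${tokensPerSecond(msPerToken).toFixed(1)} tokens/s`);
}
```

That works out to roughly 333 tokens/s at 5M parameters, dropping to about 8 tokens/s at 775M and 1 token/s at 1.5B.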

Running WebGPT

Running WebGPT is remarkably simple, as it's just a set of HTML + JS files. Since WebGPU is still in the process of being released, you'll need to open it in a compatible browser. WebGPU is currently available in Chrome v113, but the most straightforward way to ensure proper functionality is to install Chrome Canary or Edge Canary.
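
If you're unsure whether your browser qualifies, a quick check like the one below can tell you. This is a minimal sketch, not code from this repo: `hasWebGPU` and `requestWebGPUDevice` are hypothetical helper names, while `navigator.gpu`, `requestAdapter()` and `requestDevice()` are the standard WebGPU entry points.

```javascript
// Returns true when the given navigator-like object exposes the WebGPU entry point.
// The parameter defaults to the real navigator but can be swapped out for testing.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && "gpu" in nav && nav.gpu);
}

// Requests a GPUDevice, throwing a readable error if WebGPU is unavailable.
async function requestWebGPUDevice(nav = globalThis.navigator) {
  if (!hasWebGPU(nav)) {
    throw new Error("WebGPU is not available in this browser.");
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    throw new Error("No suitable GPU adapter was found.");
  }
  return adapter.requestDevice();
}
```

You could paste this into the browser console and run `requestWebGPUDevice().then(console.log, console.error)` before loading a model.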

I've included two different models: a toy GPT-Shakespeare model (which is severely undertrained, haha) and GPT-2 117M. See main.js for more information on how to run these models. If you want to import custom models, take a look at misc/conversion_scripts.

If you want to try out WebGPT, visit the demo website at KMeans.org. I'd generally recommend cloning the repo and running it locally, because loading the weights remotely is significantly slower.
Note: You'll need to use Git LFS to download the model files after cloning the repository.

[Image: file sizes]

Roadmap / Fixing Stupid Decisions

  • [x] Embeddings / de-embeddings on GPU.
  • [x] Initializing pipelines on every step is incredibly inefficient.
  • [x] Key-value caching.
  • [x] Reuse buffers.
  • [x] Kernel shared memory for matmul!
  • [x] Destroy buffers after use!
  • [x] Create kernel instruction classes + optimize pipeline creation.
  • [x] Fuse all kernels.
  • [x] Optimize all other kernels.
  • [x] Compute pass splitting for larger models (maxStorageBufferBindingSize)
  • [ ] Run selection ops on GPU (topk, selection softmax)
  • [ ] Attention kernel is optimized for small models, not for large models where each head having its own matmul is more efficient.
  • [ ] Investigate why attention cache isn't giving proper speed-ups.
  • [ ] Make simple instructional version without special stuff.
  • [ ] Optimize workgroup sizes, specifically for single row/col operations.
  • [ ] Convert into a package.
  • [ ] Write better comments + make YouTube explainer.
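
For context on the "selection ops" item above, here is a rough CPU-side sketch of top-k filtering followed by softmax sampling, the kind of step that would move to the GPU. It is an illustration under my own assumptions, not the repo's code: `sampleTopK`, its parameters, and the injectable `rng` are all hypothetical, and it assumes `k` is at least 1.

```javascript
// Sample a token id from raw logits using top-k filtering followed by softmax.
// `rng` is injectable so the sketch can be exercised deterministically.
// Assumes k >= 1 and that `logits` is a non-empty array or typed array.
function sampleTopK(logits, k, temperature = 1.0, rng = Math.random) {
  // Pair each logit with its token id and keep the k largest.
  const top = Array.from(logits, (logit, id) => ({ id, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);

  // Numerically stable softmax weights over the surviving logits.
  const maxLogit = top[0].logit;
  const weights = top.map((t) => Math.exp((t.logit - maxLogit) / temperature));
  const total = weights.reduce((sum, w) => sum + w, 0);

  // Inverse-CDF sampling: subtract weights until the random draw is used up.
  let r = rng() * total;
  for (let i = 0; i < top.length; i++) {
    r -= weights[i];
    if (r <= 0) return top[i].id;
  }
  return top[top.length - 1].id; // Guard against floating-point round-off.
}
```

With `k = 1` this reduces to greedy decoding; larger `k` and higher `temperature` give more varied output. The full sort keeps the sketch short; a partial selection would be cheaper for a large vocabulary.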

Acknowledgements

When I started this project, I had no idea how transformers worked or how to implement them (or GPUs or matmul kernels or WebGPU or tokenization, for that matter), so Andrej Karpathy's series on neural networks and building GPT from scratch were invaluable: Andrej's YouTube. I've also used some code from the nanoGPT repository: nanoGPT.

I copied from LatitudeGames' implementation of OpenAI's GPT-3 tokenizer in JavaScript: GPT-3-Encoder.

Core symbols most depended-on inside this repo

push · called by 60 · other/test.js
initBuffer · called by 24 · condensed/condensed.js
wgSize · called by 21 · globals.js
initBindGroup · called by 20 · instructions.js
initBindGroup · called by 20 · condensed/condensed.js
initTensor · called by 18 · condensed/condensed.js
wgSize · called by 17 · condensed/condensed.js
newInstance · called by 16 · instructions.js

Shape

Method 191
Class 70
Function 54

Languages

TypeScript 98%
Python 2%

Modules by API surface

condensed/condensed.js · 102 symbols
instructions.js · 61 symbols
other/scratchpad.js · 33 symbols
other/test.js · 30 symbols
tokenizer.js · 25 symbols
model.js · 20 symbols
globals.js · 13 symbols
visuals.js · 12 symbols
other/validation/validation.js · 8 symbols
other/int8-gemm.js · 4 symbols
other/conversion_scripts/convert_pretrained_pytorch.py · 4 symbols
other/conversion_scripts/convert_checkpoint_pytorch.py · 3 symbols

For agents

$ claude mcp add WebGPT \
  -- python -m otcore.mcp_server <graph>