Disclaimer: AI is an area of active research with known problems such as biased generation and misinformation. Do not use this application for high-stakes decisions or advice.
DeepAuto Lightweight LLM
Our efficient LLM serving framework drastically accelerates long-context
Transformer inference in a plug-and-play manner, using our novel attention
mechanism with sub-quadratic complexity: Hierarchical Pruned Attention (HiP).