Cloudflare, a leading connected cloud company, recently announced the general availability of its Workers AI platform and several new features aimed at simplifying how developers build and deploy AI applications. This announcement represents a significant step forward in Cloudflare's efforts to democratize AI and make it more accessible to developers around the world.
After months of open beta, Cloudflare's Workers AI platform has now achieved general availability status. This means the service has undergone rigorous testing and improvement to ensure higher reliability and performance.
Cloudflare's Workers AI is an inference platform that enables developers to run machine learning models on Cloudflare's global network with just a few lines of code. Provides a serverless, scalable solution for GPU-accelerated AI inference, allowing developers to deploy pre-trained models for tasks such as text generation, image recognition, and speech recognition without managing infrastructure or GPUs. Make it available for use.
Workers AI allows developers to run machine learning models on Cloudflare's global network and leverage the company's distributed infrastructure to provide low-latency inference capabilities.
Cloudflare currently operates GPUs in more than 150 data centers and plans to expand to nearly all of its 300-plus data centers worldwide by the end of 2024.
Cloudflare has expanded its partnership with Hugging Face to now offer a curated list of popular open source models ideal for serverless GPU inference across its extensive global network. Developers can deploy models from Hugging Face with one click. This partnership makes Cloudflare one of the few companies offering serverless GPU inference for Hugging Face models.
There are currently 14 carefully selected Hugging Face models optimized for Cloudflare's serverless inference platform, supporting tasks such as text generation, embedding, and sentence similarity. Developers can simply select a model from Hugging Face, click “Deploy to Cloudflare Workers AI,” and instantly distribute it across Cloudflare's global network of over 150 cities where GPUs are deployed.
Developers can interact with LLMs such as Mistral and Llama 2 through a simple REST API. You can also create domain-specific chatbots with access to contextual data using advanced techniques such as search extension generation.
One of the main benefits of Workers AI is that it's serverless, meaning developers don't have to manage or scale GPUs or infrastructure and only pay for the resources they consume. This pay-as-you-go model makes AI inference more affordable and accessible, especially for small organizations and startups.
As part of the GA release, Cloudflare introduced several performance and reliability enhancements to Workers AI. Load balancing has been upgraded to route requests to more GPUs across Cloudflare's global network. This allows requests to be seamlessly routed to another city even if they have to wait in a queue at a particular location, reducing latency and improving overall performance.
Additionally, Cloudflare has increased the rate limit for most large language models from 50 requests per minute in the beta phase to 300 requests per minute. Smaller models have rate limits ranging from 1,500 to 3,000 requests per minute, further enhancing the scalability and responsiveness of the platform.
One of the most requested features for Workers AI is the ability to perform fine-tuned inference. Cloudflare has taken a step in this direction by enabling Bring Your Own Low-Rank Adaptation. This BYO LoRA technique allows developers to adapt a subset of a model's parameters to a specific task, rather than rewriting all parameters as in a fully fine-tuned model.
Cloudflare's support for custom LoRA weights and adapters enables efficient multi-tenancy in model hosting, allowing customers to deploy and access fine-tuned models based on custom datasets.
While there are currently some limitations, such as no support for quantized LoRA models and limits on adapter size and rank, Cloudflare is further expanding its fine-tuning capabilities and will eventually build on Workers AI. We plan to directly support fine-tuning jobs and fully fine-tuned models. platform.
Cloudflare also offers an AI gateway. It is a powerful platform that serves as a control plane for managing and controlling the use of AI models and services across your organization.
It sits between applications and AI providers such as OpenAI, Hugging Face, and Replicate, allowing developers to connect their applications to these providers with a single line of code change.
Cloudflare AI Gateway serves as the management and governance control plane for the use of AI models and services within your enterprise. It acts as a conduit between model providers and organizations, providing a streamlined way for developers to link their applications to these services with minimal code adjustments.
This gateway provides centralized control and enables a single interface for different AI services, simplifying integration and empowering your organization's consumption of AI capabilities. Boasts observability with extensive analytics and monitoring to ensure transparency into application performance and usage. Addresses important aspects of security and governance by enabling policy enforcement and access control.
Finally, Cloudflare has added Python support to Workers, a serverless platform for deploying web functions and applications. Since its inception, Workers has only supported JavaScript as the language for writing edge execution functions. The addition of Python allows Cloudflare to serve a large community of Python developers, allowing their applications to harness the power of Cloudflare's global network.
Cloudflare is challenging AWS by continually improving the capabilities of its edge network. AWS Lambda, Amazon's serverless platform, does not yet support GPU-based model inference, and its load balancer and API gateway have not been updated for AI inference endpoints. Interestingly, Cloudflare's AI Gateway includes built-in support for Amazon Bedrock API endpoints, providing a consistent interface for developers.
With Cloudflare extending the availability of GPU nodes across multiple points of presence, developers now have access to cutting-edge AI models with low latency and the best price/performance ratio. AI Gateway brings proven API management and governance to the management of AI endpoints from a variety of providers.
follow me twitter Or LinkedIn. check out my website.