Cloudflare takes on AWS by bringing serverless AI to the edge

Serverless AI inference

cloudflare

Cloudflare, a leading connected cloud company, recently announced the general availability of its Workers AI platform and several new features aimed at simplifying how developers build and deploy AI applications. This announcement represents a significant step forward in Cloudflare's efforts to democratize AI and make it more accessible to developers around the world.

After months of open beta, Cloudflare's Workers AI platform has now achieved general availability status. This means the service has undergone rigorous testing and improvement to ensure higher reliability and performance.

Cloudflare's Workers AI is an inference platform that enables developers to run machine learning models on Cloudflare's global network with just a few lines of code. Provides a serverless, scalable solution for GPU-accelerated AI inference, allowing developers to deploy pre-trained models for tasks such as text generation, image recognition, and speech recognition without managing infrastructure or GPUs. Make it available for use.

Workers AI allows developers to run machine learning models on Cloudflare's global network and leverage the company's distributed infrastructure to provide low-latency inference capabilities.

Cloudflare currently operates GPUs in more than 150 data centers and plans to expand to nearly all of its 300-plus data centers worldwide by the end of 2024.

GPU PoP location

cloudflare

Cloudflare has expanded its partnership with Hugging Face to now offer a curated list of popular open source models ideal for serverless GPU inference across its extensive global network. Developers can deploy models from Hugging Face with one click. This partnership makes Cloudflare one of the few companies offering serverless GPU inference for Hugging Face models.

There are currently 14 carefully selected Hugging Face models optimized for Cloudflare's serverless inference platform, supporting tasks such as text generation, embedding, and sentence similarity. Developers can simply select a model from Hugging Face, click “Deploy to Cloudflare Workers AI,” and instantly distribute it across Cloudflare's global network of over 150 cities where GPUs are deployed.

Single-click expansion of hugging face model

cloudflare

Developers can interact with LLMs such as Mistral and Llama 2 through a simple REST API. You can also create domain-specific chatbots with access to contextual data using advanced techniques such as search extension generation.

One of the main benefits of Workers AI is that it's serverless, meaning developers don't have to manage or scale GPUs or infrastructure and only pay for the resources they consume. This pay-as-you-go model makes AI inference more affordable and accessible, especially for small organizations and startups.

As part of the GA release, Cloudflare introduced several performance and reliability enhancements to Workers AI. Load balancing has been upgraded to route requests to more GPUs across Cloudflare's global network. This allows requests to be seamlessly routed to another city even if they have to wait in a queue at a particular location, reducing latency and improving overall performance.

Additionally, Cloudflare has increased the rate limit for most large language models from 50 requests per minute in the beta phase to 300 requests per minute. Smaller models have rate limits ranging from 1,500 to 3,000 requests per minute, further enhancing the scalability and responsiveness of the platform.

One of the most requested features for Workers AI is the ability to perform fine-tuned inference. Cloudflare has taken a step in this direction by enabling Bring Your Own Low-Rank Adaptation. This BYO LoRA technique allows developers to adapt a subset of a model's parameters to a specific task, rather than rewriting all parameters as in a fully fine-tuned model.

Cloudflare's support for custom LoRA weights and adapters enables efficient multi-tenancy in model hosting, allowing customers to deploy and access fine-tuned models based on custom datasets.

While there are currently some limitations, such as no support for quantized LoRA models and limits on adapter size and rank, Cloudflare is further expanding its fine-tuning capabilities and will eventually build on Workers AI. We plan to directly support fine-tuning jobs and fully fine-tuned models. platform.

Cloudflare also offers an AI gateway. It is a powerful platform that serves as a control plane for managing and controlling the use of AI models and services across your organization.

It sits between applications and AI providers such as OpenAI, Hugging Face, and Replicate, allowing developers to connect their applications to these providers with a single line of code change.

Cloudflare AI Gateway serves as the management and governance control plane for the use of AI models and services within your enterprise. It acts as a conduit between model providers and organizations, providing a streamlined way for developers to link their applications to these services with minimal code adjustments.

This gateway provides centralized control and enables a single interface for different AI services, simplifying integration and empowering your organization's consumption of AI capabilities. Boasts observability with extensive analytics and monitoring to ensure transparency into application performance and usage. Addresses important aspects of security and governance by enabling policy enforcement and access control.

Finally, Cloudflare has added Python support to Workers, a serverless platform for deploying web functions and applications. Since its inception, Workers has only supported JavaScript as the language for writing edge execution functions. The addition of Python allows Cloudflare to serve a large community of Python developers, allowing their applications to harness the power of Cloudflare's global network.

Cloudflare is challenging AWS by continually improving the capabilities of its edge network. AWS Lambda, Amazon's serverless platform, does not yet support GPU-based model inference, and its load balancer and API gateway have not been updated for AI inference endpoints. Interestingly, Cloudflare's AI Gateway includes built-in support for Amazon Bedrock API endpoints, providing a consistent interface for developers.

With Cloudflare extending the availability of GPU nodes across multiple points of presence, developers now have access to cutting-edge AI models with low latency and the best price/performance ratio. AI Gateway brings proven API management and governance to the management of AI endpoints from a variety of providers.

follow me twitter Or LinkedIn. check out my website.

Janakiram MSV is an analyst, advisor and architect at Janakiram & Associates. He was the founder and CTO of Get Cloud Ready Consulting, a niche cloud migration and cloud operations company acquired by Aditi Technologies. Through speaking, writing, and analysis, he helps companies take advantage of emerging technologies.

Janakiram is one of the first few Microsoft certified Azure professionals in India. He is one of the few professionals who holds the Amazon Certified Solutions Architect, Amazon Certified Developer, and Amazon Certified SysOps Administrator qualifications. Janakiram is a Google Certified Professional and his Cloud Architect. He has been recognized by Google as a Google Developer Expert (GDE) for his expertise in cloud and his IoT technologies. He was awarded the title of Most Valuable Professional and Regional Director by his Microsoft Corporation. Janakiram is an Intel Software Innovator, an award given by Intel for his contributions to the community in AI and IoT. Janakiram is a visiting faculty member at the International Institute of Information Technology (IIIT-H), where he teaches Big Data, Cloud Computing, Containers, and DevOps to students enrolled in the master's program. He is an ambassador for the Cloud Native Computing Foundation.

Janakiram was a senior analyst with the Gigaom Research analyst network, where he analyzed the cloud services landscape. During her 18-year corporate career, Janakiram worked for world-class product companies such as Microsoft Corporation, Amazon Web Services, and Alcatel-Lucent. His last role was as a technology evangelist where he joined AWS as its first employee in India. Prior to that, Janakiram worked at Microsoft Corporation for over 10 years, where he was involved in sales, marketing, and promotion of Microsoft Applications platforms and tools. At the time he retired from Microsoft, he was a cloud architect with a focus on Azure.

What's Hot

Maximize your search engine rankings with data-driven tools and local SEO

Revolutionize SEO with AI Onsite Optimizer

What is SEO for websites, YouTube and other digital properties?

Cloudflare takes on AWS by bringing serverless AI to the edge

Opinion | Loneliness is a problem that AI cannot solve

Why we need Gemini AI to improve the Assistant on Google's Nest speakers

Two simple rules to guide all your AI efforts

Maximize your search engine rankings with data-driven tools and local SEO

Revolutionize SEO with AI Onsite Optimizer

What is SEO for websites, YouTube and other digital properties?

AI-powered SEO software market [2024-2031] Size, Trends, Sales, Revenue Forecasts HubSpot. Marketo. Oracle – Economica

AMD Ryzen AI CPU beats Intel Core Ultra in AI LLM and GenAI benchmarks, delivers lower power consumption and lower cost with XDNA

Microsoft investigates harmful AI-powered chatbot 'Copilot'

AnkerWork S600 review: An AI-powered speakerphone that actually works

Our Picks

Maximize your search engine rankings with data-driven tools and local SEO

Revolutionize SEO with AI Onsite Optimizer

What is SEO for websites, YouTube and other digital properties?

Most Popular

OnlyFans creator dishes dirt on dating

Anya Taylor-Joy has big plans to rival Gwyneth Paltrow's £197m business Goop as she prepares to launch a lifestyle business

OnlyFans star suffers from online stalking by family member: 'It hurts my stomach'

Subscribe to Updates

What's Hot

Cloudflare takes on AWS by bringing serverless AI to the edge

Related Posts