• Clément Delangue heads Hugging Face, an open-source-AI company backed by Amazon and Nvidia.
• It aims to make AI accessible to everyone, instead of letting it be controlled by any one tech firm.
• In this Q&A, Delangue discusses open source vs. OpenAI, training-data ethics, and the future of AI.

Hugging Face's cofounder and CEO, Clément Delangue, wants to make artificial intelligence accessible to every company.

Hugging Face is an open-source platform where scientists, researchers, and engineers build, train, and deploy AI models. With his company, Delangue wants to follow in the footsteps of companies such as Red Hat by making open-source AI a profitable endeavor.

"If we don't support openness, open science, and open-source AI, just a few companies will be able to do it," Delangue told Business Insider.

Hugging Face's product is mostly free, though it has a paid premium version primarily used by large companies. Investors include Amazon, Google, Nvidia, IBM, and Salesforce. In August, it announced it had raised $235 million at a $4.5 billion valuation.

The recent chaos at ChatGPT's creator, OpenAI (during which its CEO, Sam Altman, was fired and then rehired after nearly all of its employees said they would quit), caused some partners to start looking for a "plan B" for their AI-model needs, BI recently reported. The episode has made open-source models look more attractive because they don't rely on a single company that could suddenly lose all its employees.

The volatility around Altman's ouster is an opportunity for the open-source community, Giada Pistilli, Hugging Face's principal ethicist, told BI.

"We shouldn't put in the hands of a small company the future of AI in general. Even if AGI would be technically possible to achieve one day, it would be better to be distributed," Pistilli said, referring to artificial general intelligence, or the ability of AI to achieve complex human capabilities such as common sense and consciousness.

The following Q&A is from September when Delangue spoke with BI for its "AI 100" list.

This interview has been edited for length and clarity.

Help us understand who your customers are and which companies might have a large number of projects hosted on Hugging Face.

We're basically the most used open platform for AI builders. We have over 2 million AI builders using us to build AI features, workflows, or products. For example, we have thousands of users from Google, from Microsoft, from Amazon, all the way down to small companies, small startups, building with AI. We have over 20,000 organizations using us, so we are participating a lot in this democratization of artificial intelligence.

What do you see your customers using Hugging Face's AI models for most?

It's very broad. Because of the usage, there are over 1 million models, datasets, and apps that have been shared on the platform. It's a range of domains and use cases, including chatbots, autocomplete, summarization, image analysis or generation, audio generation, video generation, biology, and chemistry. So AI is really becoming the new default to build all tech. We're really seeing that on the platform, in the sense that every single technology product today is starting to include some form of AI.

Are there any particular types of projects where you've seen a big rise in recent adoption?

That's a good question. We've been super excited about what is called "multimodal." So for example, a model that has been released recently called IDEFICS takes an image and analyzes this image, and you can ask questions and have conversations about this image. So it's been exciting to see this new capability. We're starting to see more and more applications in biology and chemistry that we're excited about.

Do you think that open-source chatbots will ever overtake ChatGPT?

I think they fit different needs, right? I think open source is great when companies want more control, when they want more privacy, when they want to optimize things for their own use cases, when they want to specialize and customize chatbots, for example, for customer support, right? You don't need something like ChatGPT that is going to tell you the meaning of life when you're doing, for example, a banking chatbot or customer-success chatbot. So they fit different needs, but it's good to have both for companies and for the field in general.

We know you've gotten funding from Google and Amazon and Nvidia. How are you trying to build a sustainable business model? And how do you manage computing costs?

We have an interesting company model, kind of like a freemium model, where most of the platform is open source and free and then some of the platform is paid and premium, especially for big companies or usage of the platform in private mode. And so the premium paid enterprise revenue funds the free open-source usage, especially around compute. So we have a very economically sustainable model. We historically haven't had to raise as much money as some of the other AI startups. Some of them raised, like, $1 billion or more than that. Fortunately for us, we've been focusing on a more sustainable model.

Would you say that Hugging Face is "GPU-poor"? How many GPUs does it have? Does that even matter, and why?

No. I mean, this differentiation is a bit simplistic, I would say, because what's the point in having a lot of GPUs if you don't do much with them? So it's not so much about how many GPUs you have but the quality of the models, of the science that you create, and how positive it is for the field. So that's kind of the most important thing. We've been lucky to have trained some of the most impactful models out there, for example, StarCoder, which is the best coding open-source model, and something like IDEFICS that I mentioned before, which is the best open-access multimodal model.

We've also trained large language models in the past, so that's kind of what matters most. We have as many GPUs as we need, as we want to use, but it's important for us to make sure we use these GPUs for good impact for the field and for the community.

Should AI companies pay for training data?

This is a very complex question that is being worked out now. We are kind of, like, in a new world, and the previous rules are a bit unclear in this new world. So it's good that we're asking the question now and trying to find the right way to do it. We've introduced this concept of opt out and opt in for datasets. So that's been kind of, like, an interesting experiment, an interesting initiative for us in this specific topic. There's much more to do there, and I'm excited for what is going to be done in the future on this topic.

Do you think Common Crawl and others scraping the internet and using all that data for AI-model training is ethical and fair use? What do you think will happen to the web if this is allowed to continue?

It's a very complex question that doesn't have a simple answer. What's ethical or not depends a lot on the use case, on the values that you take into account. We need more transparency because I think the starting point for all these conversations should be, "Do we know what models have been trained on? So are the datasets that have been used public or not? Has it been disclosed what these models have been trained on or not?"

We've been pushing a concept called model cards and data sheets, which is kind of, like, this idea of documenting the datasets and the data sources of models so that we can take that into account and then take the appropriate measures. That's kind of, like, the big thing that we're pushing on the topic: create more transparency so that we can find the right balance between building capabilities but, at the same time, rewarding the content creators, making sure that there's an incentive for them to create content and that they get the fair rewards for their work.

Read the original article on Business Insider