AI's major alien concept is transparency
The fast-paced development of AI has a major problem: users' privacy has become an afterthought. In this post, I explore the intricate world of AI partnerships and data usage, from OpenAI's Data Partnerships to Slack's AI features.
July 8, 2024 - added: Perplexity, Figma, and OpenAI.
AI is the talk of the town. The news is packed with reporting on which apps implemented new AI integrations, which companies are trying to change the world with their new AI-powered idea, and who made data scraping deals with OpenAI. Regarding the latter, OpenAI introduced Data Partnerships in November 2023. In the announcement post, the company mentioned its interest in large-scale datasets that reflect human society and that are not already easily accessible online to the public today. With those datasets, OpenAI wants to create an open-source archive to train language models, but also prepare private datasets for training proprietary AI models. In the months following that announcement, OpenAI partnered with the Icelandic Government, the Free Law Project, Axel Springer (one of the biggest publishers and owner of Business Insider, Politico, and Protocol, among others), Le Monde and Prisa Media, the Financial Times, WordPress and Tumblr, Stack Overflow, and Reddit. Most of those deals are about using data provided by those platforms to train AI models and improve ChatGPT. However, as tech companies tend to use anything but clear language, it is unclear what kind of data is used in what way, or for how long the AI models had already been trained on that data before a new partnership was announced.
The voices against those kinds of partnerships predominate. Once OpenAI and Stack Overflow announced their partnership to scrape the site's forum posts to train ChatGPT, loads of users started removing or editing their questions and answers to prevent them from being used to train AI. It was a form of protest which resulted in bans from the Stack Overflow moderators. Ben, a UI programmer at Epic Games, shared in a Mastodon post his experience of editing his most successful answers to try to avoid having his work stolen by OpenAI, highlighting that his account got suspended and his edits were reverted.
Ultimately, transparency seems to be the major alien concept for any company operating in the field of AI or partnering with an AI-driven company. But why? It seems so easy to be transparent about the actions you take. If you need my data and you inform me about the reasons you need it and how you are going to process, store, and secure it, I am open to discussing whether my data can make a real difference. Just scraping data, training AI models, and leaving out the people who are forced to provide that data is not the right move.
It is fascinating to see this whole sector grow at an immense speed while there are still so many uncertainties, so many privacy issues, and so much happening behind closed doors. Although most of the products in this sector rely on users, it feels like the companies do not care about them. All they care about is data.
One of the most recent examples of that behavior is Slack. Like loads of platforms and apps out there, Slack charged ahead with an AI vision. Slack AI's goal is to help you work smarter and save time with AI features, among them finding answers faster, summarising conversations, getting a daily recap of missed messages, and more. I am amazed at how busy people at Slack must be that they think AI is the savior for those "tedious" tasks. If you receive so many messages on Slack that you feel the need for a daily recap of the messages you have missed, you might need to rework your overall communication workflows. If you need human conversations in Slack summarised by AI, I feel sorry for you that you prefer some computer-generated tone of voice instead of reading through that conversation to fully understand the opinions and ideas provided by your colleagues.
Like many other companies, Slack is using its own users' data to train some of its AI services. And obviously, everyone is opted in by default. After a post on Hacker News went viral, people found out that the terms for collecting user data to train those AI services are tucked away in an out-of-date and confusing privacy policy. It is unclear where Slack is applying its AI privacy principles. There is a clear lack of transparency. However, there is also a lack of brain cells, since Slack requires you to send an email to a specific support address if you would like to opt out of the scraping that trains Slack's AI models. By the way, according to the mentioned privacy policy, Slack is using customer data specifically to train "global models", which Slack uses to power channel and emoji recommendations as well as search results. So, to summarise, you share your data with Slack to train AI models which recommend you ... emojis. Let us pour billions of dollars into an industry that provides us with, and perfects, emoji recommendations. Lovely. I know, I am getting cynical now.
Nevertheless, it is unclear to me how something like that can happen. How is this possible? How is this even legal? What drives those companies, those leaders? It does not feel like they seek change, innovation, or solutions to life-changing problems.
AI is developing fast, way too fast. It is developing so fast that all those companies switch off their brains for the sake of banking millions of dollars, leaving user privacy aside. Getting back to the Slack example, TechCrunch shared the following paragraph in a piece:
In a reply to one critical take on Threads from engineer and writer Gergely Orosz, Slack engineer Aaron Maurer conceded that the company needs to update the page to reflect “how these privacy principles play with Slack AI.”
Slack is said to be home to some of the smartest people in the world. Yet the decision to ship AI-powered features, opt all your users in to training "global models", scrape your users' data without informing them about those changes, and not provide any insight into how that data is handled to prevent confusion and uncertainty, seems like one of the dumbest decisions they could have made.
The fast-paced world of AI development is no excuse for not caring about user data. Once our data gets involved, the least we can expect from the company handling it is transparency.
Examples of shady practices, unethical behaviors, and lack of transparency by AI-driven companies
WIRED did an in-depth investigation that I highly recommend reading. But it does not end there: Business Insider reported that OpenAI and Anthropic are doing the same shady stuff, Fast Company wrote about how Perplexity CEO Aravind Srinivas responded to the plagiarism and infringement accusations, and WIRED followed up with more investigations.
After The Verge contacted OpenAI, they quickly shipped an update that now encrypts the chats. Obviously, it could be worse, but the fact that OpenAI only offers its macOS app via its own website instead of the macOS App Store, where it would need to follow Apple's sandboxing requirements, makes me worried.
It does not feel like Figma thought about all the consequences this automatic opt-in could have for individual designers, agencies, and companies. Nevertheless, they received a quick reality check recently, as they had to pause the "Make Design" AI feature after it was accused of having been trained on existing apps. The feature faced criticism after it seemingly mimicked the layout of Apple's Weather app.
Yet another example of shipping garbage instead of thinking it through.
Till next time! 👋
Support: Do you have a friend who is looking for inspiration, news about design, and useful tools and apps? Forward this newsletter to a friend or simply share this issue and show some support. You can also show some love by simply clicking the button down below and keeping this newsletter a sustainable side project by buying me a coffee. ☕️ 🥰