Can you trust the company?

Every option here hands something to a company's servers (your files or, with a login layer, your users' identities). This page shows how to size up that company: what to check, where each one lands, and the patterns that repeat, so you can read a new service's terms yourself.

The honest framing: for most of what this wiki helps you share — a public site, a template, a finished tool, a doc — the answer is fine. The thing you publish is the thing you meant to hand out. The questions below bite hardest on what sits next to the public thing: an unpublished repo, regulated personal data behind a form, the emails and passwords of people who log in, your own working chats.

Last verified: 2026-06-07 · Confidence: high — each company entry is checked against that company's own docs; data residency is the recurring soft spot.

What's worth checking

A handful of questions settle almost every case. Skim them before you store anything you'd mind a stranger reading.

Does it train AI on what you put in? The big one. On free/personal tiers this is often on by default with a toggle to flip (Anthropic, OpenAI, Vercel, GitHub, Google's consumer Gemini, Lovable). With Replit and Hugging Face, it is decided by whether you make the repo public. On commercial tiers it's usually off by contract. Find the switch and check which side it's on. Two services here (Clerk, Hugging Face) are simply silent on it, which is its own answer: unconfirmed, not a no. [confirmed] (varies by company, see below)
Whose data is it, yours or your users'? A file host (GitHub, Vercel, Google) holds your stuff. A login layer (Clerk) or an app host with a database (Fly, Replit, Lovable) holds other people's identities and records: emails, names, passwords. That makes you the one legally responsible for it, and makes a signed data agreement matter more. [confirmed]
Can you delete it, and how long is it kept? Look for a deletion path and a retention window. The numbers range from ~30 days (most) to 90 (Clerk, and Lovable's backups) to 180 (Google), which is wider than you'd guess, and two (Hugging Face, Raycast) publish no fixed window at all. Treat deletion as final and keep your own copy. [confirmed]
Free vs enterprise: what does paying actually buy? Beyond switching training off where a company splits by tier, rarely a better privacy stance. It buys paperwork and control: a signed data agreement, admin logs, region pinning. [confirmed]
Where does the data physically live? Almost always the US by default. EU residency is usually an enterprise feature, with two exceptions that let you pick the region free: Fly, and Lovable (though Lovable's choice locks once the app's backend is live). A UK region is rarer still: only Fly (any plan) and GitLab's paid Dedicated tier pin to London. Even an in-region choice doesn't change whose law reaches the data. [confirmed]

This is informational, not a warning. Most readers can note these and move on; they matter when you're handling someone else's regulated data, not when you're publishing a workshop schedule.

Per company

Each entry below is checked against that company's own terms, privacy policy, and docs — with the exact toggle named where one exists.

Anthropic (Claude) — host for a Claude Artifact. Personal plans train on your chats unless you switch it off (Help Improve Claude → off); commercial plans never do.
OpenAI (ChatGPT) — where a lot of this work gets drafted, not a host for anything you publish here. Personal plans (Free/Plus/Pro) train on your chats unless you switch it off (Improve the model for everyone → off); business plans (Team/Enterprise/Edu/API) never do.
GitHub — host for a GitHub repo. Doesn't mine your private repos, except code Copilot sees while you work: on by default for individual Copilot plans since 2026-04-24, with a one-click off switch in Copilot settings.
GitLab — the main alternative to a GitHub repo, under repo alternatives. Plainly never trains on your code — public or private — though its Duo assistant is on by default (named off switch). gitlab.com is US-only; you pin a region (EU/UK, incl. London) only on paid Dedicated, or by self-hosting, where GitLab holds nothing.
Google — host for a shared Google Doc. The most tier-split company here: paid Workspace has a written no-training promise, while a free Gmail account's Gemini assistant trains by default outside the EEA/UK (toggle: Gemini Apps Activity → Keep Activity off).
Vercel — host for deploying a website. The free (Hobby) tier has had AI training on by default since 2026-03-31 (off switch: Team Settings → Data Preferences); paid and Enterprise tiers are off.
Netlify — host for deploying a website. No training unless you opt in — nothing to switch off — and you keep ownership of your content.
Cloudflare — host for deploying a website. Plainest stance here: doesn't train LLMs at all, no toggle to remember, same on free and Enterprise.
Fly.io — host for deploying an app. Runs your app and often its database, so it holds your users' data too. The one host that lets you pin the region (EU/UK) on any plan, no enterprise upgrade.
Replit — host for a shared Replit project. Holds your code and whatever your running app stores. The training question is decided by public vs private: a public project is MIT-licensed for anyone to copy and may train Replit's models; private is neither.
Notion — host for a shared Notion page. One of the clean exceptions: no training on any plan, by default, and it contractually binds the AI companies it uses (Anthropic, OpenAI) to the same — no toggle to police. A subscription product, not an ad business.
Clerk — the login layer for a gated website. Holds your users' identities — emails, names, passwords. Silent on AI training; US-only hosting.
Hugging Face — host for a Hugging Face Hub share. A creation-time choice sets visibility: public (openly licensed for anyone to copy and train on), private, or gated (public to find, but each downloader hands you their name and email). Silent on whether it trains on your private content; US by default, with an EU region on Team/Enterprise.
Lovable — host for a shared Lovable app. Runs your app and its backend (database, logins, uploads, on Supabase), so it holds your users' data too. Trains on your (de-identified) content unless you opt out; you pick the region (EU/US/APAC) free, but it locks once the app's backend is live.
Raycast — host for a shared Raycast extension. Holds little — account details, opt-in Cloud Sync, momentary AI prompts it never trains on or logs. The real exposure is your code, set by the publish choice: a public Store listing puts the source in a public GitHub repo for anyone to copy; an Organization keeps it to your team. US-only, no published DPA.

Common considerations

These are the patterns that recur across the fifteen entries above. Learn them once and you can read any new service's terms the same way.

Training splits by tier — and "silent" is now one of the answers

Across most companies, "does it train on my inputs?" has the same shape: free/personal tiers may train (often on by default, with a toggle), while commercial/enterprise tiers don't, by contract. The split is now clearest at the two model makers whose products you actually type into. Anthropic and OpenAI both train on personal-plan chats unless you opt out (Help Improve Claude → off; Improve the model for everyone → off), and both stop entirely on a business plan. The same drift toward on-by-default shows on the hosts: Vercel flipped Hobby on (2026-03-31), GitHub flipped individual Copilot on (2026-04-24), Lovable trains on your de-identified content unless you opt out, and Google's consumer Gemini assistant trains by default outside the EEA/UK.

Two companies decide it not by a settings toggle but by a publish/creation choice. On Replit, a public project may train its models and is MIT-licensed for anyone to copy; a private one is neither. On Hugging Face, a public repo isn't just findable: it carries an irrevocable open license, so anyone, including other AI labs, can legally download and train on it, and private or gated is the only off switch. In both, the decision is made the moment you create the repo.

Cloudflare, Netlify, Notion, GitLab, and Raycast are the clean exceptions: no training on your content, the same for everyone. Two of them still ship an AI assistant that's on by default (GitLab Duo, with a named off switch; Raycast AI, whose prompts Raycast neither trains on nor logs), so "on by default" there means the feature is enabled, not that your data feeds a model. Notion goes one step further, contractually binding the AI companies it builds on (Anthropic, OpenAI) not to train on your content either, so the model makers' own personal-tier default can't leak back in through the side door.

Two companies give a third kind of answer: their docs say nothing either way. Clerk describes no training program but gives no written promise that it never happens, and Hugging Face is silent on whether your private uploads feed training, so any "they don't" is inferred from the absence of a training license, not from a quoted promise. So with any new service: find the AI-training setting first, check which way it points on your plan, and if there's no statement at all, treat that as unconfirmed, not as a no. [confirmed] / [unclear]

One toggle, and "already used" can't be un-used

Where training is on, it's almost always one named switch that takes effect immediately (Anthropic: Help Improve Claude; OpenAI: Improve the model for everyone; Vercel: Team Settings → Data Preferences; GitHub: Copilot settings; Google: Gemini Apps Activity → Keep Activity off; Lovable: Settings → Privacy & security → Data collection opt out, self-serve on Business/Enterprise, by email to privacy@lovable.dev on Free/Pro). Replit and Hugging Face are the ones where the "switch" isn't in settings — keeping the repo private (or, on Hugging Face, gated) is what turns training off. Two things to carry over: the switch only stops future training runs — anything already fed into a model that's started training can't be pulled back out — and on a published/public share the training question barely applies anyway, because the artifact was meant to be public. [confirmed]

Whose data — and who's legally on the hook

A pattern the file hosts hide but the auth and app hosts make obvious: the company is almost always your processor, not the controller. You are the controller, legally responsible for the data, and the host just stores and processes it on your instructions. Anthropic (enterprise), OpenAI (business), Google, Clerk, Fly, Replit, Notion, and Lovable all spell this out in the same terms. It barely matters when the data is your own public files. It matters a lot when the host holds other people's data: Clerk stores your users' emails, names, and passwords, and Fly's, Replit's, and Lovable's databases store whatever your running app collects about its users (Lovable's backend runs on Supabase). With those, you answer for a breach, which is exactly why the signed data agreement (see the enterprise section below) stops being optional. [confirmed]

Deletion is real but final — and the window is wider than "30 days"

Every company gives you a delete path and says deleted data is removed. But the retention window varies more than you'd expect. About 30 days is the common figure (Anthropic's back-end purge, OpenAI's deleted-chat window, Netlify's post-termination grace, Vercel's backup drop, Fly's account-deletion window, Replit's account-deletion purge, Notion's trash and account-deletion grace, Lovable's account-deletion window, the API window), and several carry a longer backup tail behind that headline number: Lovable's backups and logs persist up to 90 days, Clerk commits to 90 days after a contract ends, and Google's window is 180 days including backups.

Hugging Face and Raycast are the outliers that publish no fixed window at all: Hugging Face for live content, and Raycast names only the deletion trigger ("until your account is deleted"), not a timeframe. GitLab adds two wrinkles of its own: a paid account "may not be able to" be deleted while the subscription is live, and anything you contributed to a public project is embedded and "will not be able to delete," much like an irrevocable public license.

Two caveats hold regardless of the number. Deletion on their side is final and no company here promises recovery, so keep your own copy before you delete. And a copy a recipient already downloaded is theirs to keep: deleting the original never reaches a clone on someone else's machine (sharpest on Hugging Face, where a public repo's open license is irrevocable). [confirmed] / [estimate] / [unclear]

Enterprise buys paperwork and control, not a different stance

On most hosts, the baseline promises (no selling, you own your content) apply on the free tier too, and paying doesn't unlock a better one. The exception is training: where a company splits by tier (Anthropic, OpenAI, Vercel, Google), a paid or business plan is what switches it off. What enterprise adds is consistent: a signed data agreement (DPA) with Standard Contractual Clauses, admin controls (SSO, audit logs, org policies), region pinning, contractual no-training in writing, and, for health data, a BAA. That's the bundle a compliance review asks for. Worth knowing: a few companies don't lock the paperwork behind a sales call. Fly and Vercel publish a DPA that covers everyone, and Fly offers its HIPAA BAA on request rather than only on an enterprise contract. If you're an individual sharing a public tool, free is genuinely fine; the enterprise extras are for teams handling regulated data. [confirmed]

US by default; EU residency is usually a paid lever; UK only on Fly and GitLab Dedicated; US law still reaches it

Storage is US-primary across the board, with the EU often in the mix for caching or backups. Pinning data to the EU is normally an enterprise feature: GitHub's data-residency tier, Cloudflare's Data Localization Suite, Google Workspace data regions on Business Standard+, Vercel's EU function region, OpenAI's business/Enterprise/Edu/API residency, Notion's Frankfurt/Ireland region, Hugging Face's EU region on Team/Enterprise, and GitLab's region choice on its paid single-tenant Dedicated tier. Two standout exceptions let you pick a region on any plan, no upgrade: Fly (EU/UK) and Lovable (EU/US/Asia-Pacific). Lovable's choice locks the moment the app's backend goes live, so pick Europe before you add real data if EU residency matters.

At the other end, Replit has no EU region at all (US-primary, sometimes India), even on Enterprise, and Raycast is flatly US-only with no region choice on any tier, so "keep this data in the EU" is simply off the table on both. A UK region is rarer than an EU one but not absent: only Fly (London on any plan) and GitLab Dedicated (London on the paid single-tenant tier) offer one. Everywhere else the EU boundary is the closest GDPR-aligned option, and transfers lean on SCCs / the EU–US Data Privacy Framework.

One sharper point the app hosts surface: choosing an EU region controls where the bytes physically sit, not whose law reaches them. A US company can still be compelled under US law (the CLOUD Act) wherever the server is. Under GDPR the default US mix is usually fine for a public or personal site; residency only becomes a requirement if a grant or DPA forbids data leaving the EU/UK, and then the residency feature is your lever. [confirmed] / [estimate]

Where the docs go quiet

Two recurring blind spots. The first is data residency specifics — the place official pages most often stay silent, so several entries fall back to secondary sources marked unofficial. The second is whether a service trains on data at all — Clerk's docs simply don't address it, Fly publishes no explicit no-training promise (only an ownership clause that points the right way), and Hugging Face is silent on whether your private uploads feed training (the no-training read leans on the absence of any training license, not a quoted promise). A third, smaller pattern: a company can be clear on training yet quiet on the paperwork — Raycast states its no-training stance plainly but publishes no DPA or sub-processor list, names its EU/UK transfer safeguard only as "contractual protections" rather than Standard Contractual Clauses, and gives a deletion trigger without a timeframe; a compliance reviewer would have to ask directly. When a company's own docs don't state something, that's the honest answer: they don't say. Assume nothing the docs don't state, keep your own copy, and if a no-training guarantee is load-bearing for you, ask the company directly rather than inferring one. [unclear]

Sources

Provenance for each company sits on that company's own entry: Anthropic, OpenAI, GitHub, GitLab, Google, Vercel, Netlify, Cloudflare, Fly.io, Replit, Notion, Clerk, Hugging Face, Lovable, Raycast. Each cites the company's terms, privacy policy, and docs directly, and flags any secondary source as unofficial with the date seen.