Is there anything I can do to prevent my company's paying customers from uploading the reports or data feeds they purchase from us into public or private LLMs once they have acquired the data?

VP of IT in Real Estate, 3 months ago

Contractual restrictions are the first thing that comes to mind for me. Next would be to embed some sort of watermark-style content in your data so it is easier to detect if your data appears in a public LLM. I'm not sure there is any way to know whether your data is used in a private LLM, as you would not have any access to see it. In a Google search I made while considering this, I found a highly technical, PhD-thesis-style discussion of the problem here: https://www.chenwang.net.cn/publications/MeFA-TIFS22.pdf
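
To make the watermark idea a bit more concrete, here is a minimal sketch of one possible approach: a per-customer "canary" reference string derived with an HMAC, appended to each delivered report, and later searched for in suspect text (for example, output you obtain from a public LLM). Everything here is an assumption for illustration: the names `SECRET_KEY`, `canary_for`, `embed_canary` and `trace_leak` are hypothetical, and a plain appended string is easy for a customer to strip out, so this only helps against careless uploads.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data vendor; replace with a real key.
SECRET_KEY = b"replace-with-a-real-secret"


def canary_for(customer_id: str, report_id: str) -> str:
    """Derive a reference string that is unique per customer and report."""
    msg = f"{customer_id}:{report_id}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"REF-{digest[:16].upper()}"


def embed_canary(report_text: str, customer_id: str, report_id: str) -> str:
    """Append the customer's canary to the copy of the report they receive."""
    return f"{report_text}\nDocument reference: {canary_for(customer_id, report_id)}"


def trace_leak(suspect_text: str, customer_ids: list[str], report_id: str) -> list[str]:
    """Return the customers whose canary for this report appears in the text."""
    return [c for c in customer_ids if canary_for(c, report_id) in suspect_text]
```

For example, `trace_leak(llm_output, ["acme", "globex"], "r1")` would return `["acme"]` if the copy delivered to the (made-up) customer `acme` surfaced verbatim in `llm_output`. This does nothing for private LLMs, where you never see the output.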

Sr. Director, Enterprise Applications and IT Services, 3 months ago

Contractual restrictions are the only way. What is the actual concern? LLMs are session-guarded and stateless. This is no different from your data being uploaded to other public and private cloud services.

CISO in Finance (non-banking), 3 months ago

We are concerned about loss of IP and loss of business opportunities if information we put behind a paywall becomes publicly available. Contractual restrictions were the first thing that came to mind for us as well, but I'm interested in whether the peer community has any other suggestions. Thanks for the reply.

VP of IT in Finance (non-banking), 3 months ago

If the purchase agreement gives the buyer full ownership of the data, then they can use it as they see fit.

Head of Demand to Value Data, Digital & Technology in Healthcare and Biotech, 3 months ago

Impossible to stop entirely, and of course the legal implications depend on your agreements, but you can do quite a few things to limit it, or at least to find out when it is happening. Naming a few:

1. Data Watermarking - embedded invisible markers that let you trace a copy back to its source
2. Encryption - so you need keys to decrypt and access the data (it is worth digging into the types of encryption that let some data be 'used' without handing over the encryption keys)
3. Access Controls / Audit Logging - admittedly more complex once you've 'sold' the data
4. Smart Contracts - for example, using blockchain to support activation and enforcement of data usage policies

I'm sure there are more. Also, depending on where you host the data, you can apply ML Data Controls to change or anonymize data when it's pulled down, monitor usage, etc.