Is there anything I can do to prevent my company's paying customers from uploading reports or data feeds they purchase from us into public or private LLMs once they have acquired the data?
VP of IT in Real Estate, 3 months ago
Contractual restrictions are the first thing that comes to mind for me. Next would be to embed some sort of watermark-style content in your data to make it more detectable if your data appears in a public LLM. I'm not sure there is any way to know whether your data is used in a private LLM, as you would not have any access to see it. In a Google search I made while considering this, I found a highly technical, PhD-thesis-style discussion of the problem here: https://www.chenwang.net.cn/publications/MeFA-TIFS22.pdf
Sr. Director, Enterprise Applications and IT Services, 3 months ago
Contractual restrictions are the only way. What is the actual concern? LLMs are session-guarded and stateless. This is no different from your data being uploaded to other public and private cloud services.
CISO in Finance (non-banking), 3 months ago
We could be concerned about loss of IP and loss of business opportunities if information we put behind a paywall becomes publicly available. Contractual restrictions were the first thing that came to mind as well, but I'm interested in whether the peer community has any other suggestions. Thanks for the reply.
VP of IT in Finance (non-banking), 3 months ago
If the purchase construct gives full ownership to the buyer, then they can leverage the data as needed.
Head of Demand to Value Data, Digital & Technology in Healthcare and Biotech, 3 months ago
Impossible to stop, and of course this has legal implications based on your agreements, but you can do quite a few things to limit, prevent, or at least create awareness if this is happening. Naming a few:
1. Data Watermarking - embedded invisible markers for tracing to source
2. Encryption - so you need keys to decrypt and access (you can dig into types of encryption that allow some data to be 'used' without encryption keys)
3. Access Controls / Audit Logging - admittedly more complex once you've 'sold' the data
4. Smart Contracts - using, for example, blockchain to support data usage policy activation and controls
I'm sure there are more. Also, depending on where you host the data, you can apply ML Data Controls to change/anonymize data when it's pulled down, monitor usage, etc.
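To make the watermarking idea a bit more concrete, here is a minimal sketch of one possible approach: hiding a per-customer ID in a text report using zero-width Unicode characters, so a leaked copy can be traced back to the buyer. The function names and the `cust-0042` ID are hypothetical, and this is only an illustration, not a hardened scheme. Zero-width marks are easy to strip (for example by text normalization or retyping), and this only helps with tracing, not prevention.

```python
# Hypothetical sketch: invisible per-customer watermark using zero-width characters.
ZERO = "\u200b"  # zero-width space, encodes bit 0
ONE = "\u200c"   # zero-width non-joiner, encodes bit 1


def embed_watermark(text: str, customer_id: str) -> str:
    """Hide customer_id as zero-width characters after the first word of text."""
    bits = "".join(f"{byte:08b}" for byte in customer_id.encode("utf-8"))
    mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # Split at the first space; if there is none, the mark is simply appended.
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract_watermark(text: str) -> str:
    """Recover the hidden ID from any zero-width marker characters in text."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def strip_watermark(text: str) -> str:
    """Remove the marker characters, returning the visible text only."""
    return text.replace(ZERO, "").replace(ONE, "")


if __name__ == "__main__":
    original = "Quarterly market report for Q3"
    marked = embed_watermark(original, "cust-0042")
    print(marked == original)         # the strings differ, though they look the same
    print(extract_watermark(marked))  # recovers the embedded customer ID
```

In practice you would issue a differently marked copy to each customer and record which ID went to whom; for structured data feeds, per-customer canary records are a similar idea.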
CISO in Finance (non-banking), 3 months ago
Thank you for your input, everyone!