Why should we (not) authorize AI solutions' developers to train their AI models on our public data?

11k views · 1 Upvote · 4 Comments
CIO in Services (non-Government), 3 months ago
There are several reasons why we should or should not allow it.

Always take privacy concerns into account. Public data often includes personal information about individuals. Allowing developers unrestricted access to this data for training AI models can compromise people's privacy rights. Even if the data is anonymized, there's always a risk of re-identification through advanced data linkage techniques.

Then there is potential misuse of data. Developers might use public data for purposes that are not in the public interest or go against ethical standards. There's a potential for data to be used in ways that harm individuals or groups, such as discriminatory practices in AI decision-making.

How do we control it?

Then we have the point on sensible AI. Using public data without explicit consent can raise ethical questions about fairness and justice. It may disproportionately benefit developers and tech companies without providing adequate benefits or protections to the individuals whose data is being used.

So yes, you can use public data, but build guardrails into your framework so you don't struggle to justify the decision later on.
Principal Software Engineer, Data Engineering in Energy and Utilities, 3 months ago
It depends on what is expected of the AI models.
1) If the output has to include both generic information and private information, public data training would help. For private information, RAG can be used.
2) If the output should not have hallucinated results and is more Org/User domain-specific, then RAG would be the best approach for contextual grounding.
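As a rough illustration of the RAG idea in the two points above, here is a minimal sketch of the retrieval-and-grounding step. Everything in it is an assumption for illustration only: the sample documents, the function names (`retrieve`, `build_prompt`), and the simple word-overlap scoring, which stands in for the embedding search a real system would more likely use. No model is called; the sketch only shows how private context could be attached to a question at query time instead of being trained into the model.

```python
import re
from collections import Counter

# Hypothetical private documents; in practice these would come from an internal store.
PRIVATE_DOCS = [
    "Outage reports are filed in the grid operations portal within 24 hours.",
    "Meter replacement requests are approved by the regional field supervisor.",
    "Vendor invoices above the approval limit need sign-off from finance.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, doc):
    """Count the tokens shared by the query and the document (toy relevance score)."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)

def retrieve(query, docs, k=2):
    """Return up to k documents with a non-zero overlap score, best first."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:k] if score(query, doc) > 0]

def build_prompt(query, docs, k=2):
    """Build a prompt that asks the model to answer only from retrieved context."""
    context = retrieve(query, docs, k)
    context_block = "\n".join(f"- {c}" for c in context) or "- (no matching context)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

if __name__ == "__main__":
    print(build_prompt("Who approves meter replacement requests?", PRIVATE_DOCS))
```

The private documents stay outside the model, so they can be updated or access-controlled without retraining.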
Information Security Analyst in Government, a month ago
Good morning. Public data has data quality issues, and we need to take that into consideration when building any AI model. There are also unintended biases. We've taken the approach of sharing city data publicly via chatbots while ensuring that we have controls in place to review and limit specific responses to public queries. For example, we want users to focus on the scope of city services and data provided by city agencies, not necessarily other news outside that scope. Start small and build incrementally.
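One simple form such a scope control could take is sketched below. This is a hypothetical example, not the city's actual implementation: the keyword allow-list, the refusal message, and the `answer` wrapper are all assumptions, and a production system would likely use a classifier plus human review rather than keyword matching.

```python
import re

# Hypothetical allow-list of in-scope topics for a city-services chatbot.
IN_SCOPE_KEYWORDS = {
    "permit", "permits", "trash", "recycling",
    "parking", "water", "library", "pothole",
}

REFUSAL = "I can only help with questions about city services and city data."

def in_scope(query):
    """Return True if the query mentions at least one allow-listed topic."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return bool(tokens & IN_SCOPE_KEYWORDS)

def answer(query, generate):
    """Pass in-scope queries to `generate` (any callable); refuse everything else."""
    if not in_scope(query):
        return REFUSAL
    return generate(query)

if __name__ == "__main__":
    # Stand-in for a real model call; assumed for illustration.
    def echo(q):
        return f"[model answer for: {q}]"

    print(answer("How do I renew a parking permit?", echo))
    print(answer("What is the latest election news?", echo))
```

Starting with a narrow allow-list and widening it as responses are reviewed fits the "start small and build incrementally" advice.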
Senior Director - Partner Solutions in Consumer Goods, a month ago
This is a complex question with an even more complex answer. The short version: it depends!

Innovation and scientific and economic growth would all benefit directly from allowing models to be trained on our public data. But it is more complex than that, and the "it depends" comes in because:

- Let's say your data is public but contains some personal information that may be subject to data privacy laws. Who will be responsible?
- Let's say there are copyright considerations in your public data. What are your expectations on fair use?
- If you are in the EU, it gets even more complicated: under GDPR, you have to consider the lifetime of the data once it has been indexed.
