How taxonomies, ontologies, and knowledge graphs both unlock and ground generative AI
I had the honor of presenting at Semantic Data New York 2025: Taxonomy, Ontology, and Knowledge Graphs, on October 14. This event, co-located with DAM New York 2025 and now in its second year, showcased semantic data as a powerful strategic complement (or “gateway drug,” as Madi Weland Solomon put it) to generative AI. Speakers explored how it can unlock the potential hidden in Pandora’s generative AI black box while managing the risks it carries and upholding truth, trust, and transparency in information. We also demystified the pathway to semantic maturity and reassured those just getting started with taxonomies that they can start small and leave plenty of room to grow by leaning on core semantic standards. Here are my top four takeaways from the conference.
1. Semantic data opens up the potential of generative AI
- Ashleigh Faith’s How Asset Tags help make Better AI offered practical guidance on using semantic data to enhance generative AI outputs.
- Tracy Forzaglia shared a case study of a publishing company in which standards (which vary from one U.S. state to the next), AI, and human editors converge to optimize a tagging workflow.
2. Semantic data mitigates risk by giving generative AI guardrails
- Faith also reminded us that LLMs are designed to hallucinate and fill gaps with plausible content, prioritizing fluency over facts, and called attention to accurate but vague outputs as an underestimated problem.
- Ahren Lehnert’s Language, Semantics, and the Shaping of “Truth” cautioned that, where use cases carry high risk (for example, legal liability or reputation damage), we should go light on experimentation and heavy on stakeholder involvement.
- In Creating Taxonomy Governance in a Platform Environment, Laura Rodriguez also stressed that clear, consistent, and persistent communication with stakeholders is a critical tool for maintaining semantic integrity.
- In Information Quality is Information Ethics, Gary Carlson introduced a practical tool to mitigate risk: an information quality and ethics framework for identifying threats to information quality (i.e., risks of entropy or noise) at every stage in the information life cycle.
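To make the guardrail idea in this takeaway concrete, here is a minimal sketch of one possible approach: checking tags suggested by a generative model against a controlled vocabulary before they reach an asset record. None of this comes from the talks themselves. The vocabulary, the function names, and the sample tags are all hypothetical, and a real workflow would load its terms from a managed taxonomy rather than a hard-coded dictionary.

```python
# Hypothetical controlled vocabulary: preferred label -> accepted variants.
# In practice this would come from a managed taxonomy, not a literal dict.
VOCABULARY = {
    "Knowledge graphs": {"knowledge graph", "kg"},
    "Taxonomies": {"taxonomy"},
    "Ontologies": {"ontology"},
}


def build_lookup(vocabulary):
    """Map every lowercased label (preferred or variant) to its preferred label."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def apply_guardrail(suggested_tags, vocabulary):
    """Split suggested tags into accepted preferred labels and rejected strings."""
    lookup = build_lookup(vocabulary)
    accepted, rejected = [], []
    for tag in suggested_tags:
        preferred = lookup.get(tag.strip().lower())
        if preferred is None:
            rejected.append(tag)  # not in the vocabulary: route to a human
        elif preferred not in accepted:
            accepted.append(preferred)  # normalize and de-duplicate
    return accepted, rejected


# Stand-in for tags returned by a generative model (made-up values).
suggested = ["KG", "taxonomy", "Semantic vibes", "Knowledge graph"]
accepted, rejected = apply_guardrail(suggested, VOCABULARY)
print(accepted)  # ['Knowledge graphs', 'Taxonomies']
print(rejected)  # ['Semantic vibes']
```

The rejected list is the useful part of the design: rather than silently dropping unknown terms, it gives human reviewers a queue of candidates to either discard or propose as new vocabulary entries.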
3. Semantic data is a spectrum, not a big bang
- Jessica Talisman’s The Ontology Pipeline Explained broke down her Ontology Pipeline model, which frames semantic maturity as a step-by-step process. She also shared two back-to-basics messages: that “metadata is the love language of the enterprise” and that it’s perfectly fine to start with a spreadsheet.
- In Just Enough Semantics, I gave practical tips for organizations to start (or continue) moving along the semantic data spectrum, root their semantic efforts in standards (namely, ANSI/NISO Z39.19 and SKOS), and conscientiously work around common system constraints without compromising semantic integrity.
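As a rough illustration of how a spreadsheet can grow into standards-based semantic data, the sketch below converts a few spreadsheet rows into SKOS concepts written in Turtle. It is only one possible approach, not a method presented at the conference. The column names, the sample rows, and the `https://example.org/taxonomy/` namespace are assumptions made for the example; the SKOS namespace and the `skos:prefLabel`, `skos:altLabel`, and `skos:broader` properties are from the W3C standard.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per concept (made-up data).
SHEET = """id,pref_label,alt_label,broader
animals,Animals,,
dogs,Dogs,Canines,animals
cats,Cats,,animals
"""

BASE = "https://example.org/taxonomy/"  # placeholder namespace


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_skos(sheet_text, base=BASE):
    """Turn CSV rows into SKOS concepts serialized as Turtle.

    Assumes each id is a simple token (letters, digits, hyphens) that is
    safe to use as the local part of a prefixed name.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]
    for row in csv.DictReader(io.StringIO(sheet_text)):
        statements = [
            "a skos:Concept",
            f'skos:prefLabel "{escape(row["pref_label"])}"@en',
        ]
        if row["alt_label"]:
            statements.append(f'skos:altLabel "{escape(row["alt_label"])}"@en')
        if row["broader"]:
            statements.append(f'skos:broader ex:{row["broader"]}')
        lines.append(f'ex:{row["id"]} ' + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


print(rows_to_skos(SHEET))
```

Keeping the spreadsheet as the editing surface and generating SKOS from it lets a team start small while leaving room to move to a dedicated taxonomy tool later, since the exported concepts already follow a shared standard.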
4. Semantic data’s future: predictions and guesses
- Lehnert’s talk also anticipated a movement towards gaining “semantic literacy” and proving generative AI’s ROI with metrics that weigh the positives, such as time saved, against the negatives, such as time wasted on “data slop.”
- In Yale’s Cultural Heritage Knowledge Graph – Lessons Learned, Robert Sanderson predicted that “AI will never create good ontologies because it doesn’t have a worldview.”
Semantic Data New York 2025 was an excellent precursor to Taxonomy Boot Camp, which will be celebrating its 20th birthday this November. I look forward to continuing the conversation around taxonomies, ontologies, and knowledge graphs, including generative AI’s unfolding impacts on how information is organized and the perennial importance of building trust, fostering transparency, upholding ethics, mitigating risk, and proving value.