Taxonomy, Tech, and Transformation: Highlights from Taxonomy Boot Camp 2024

Dovecot was thrilled to have five representatives attend this year’s Taxonomy Boot Camp conference to share and learn about the latest developments in taxonomy, ontology, and semantics. 

  • Stephanie Lemieux, President, conference chair and MC, and facilitator for a number of remarks and panels.
  • Michele Ann Jenkins, Senior Consultant, facilitated the Taxonomy 101 Workshop, presented “Aligning AI Approaches for Taxonomy & Tagging” and, in the Enterprise Search conference (co-located with Taxonomy Boot Camp), presented “The Role Taxonomies Can Play in Enterprise Search”
  • Katherine Black, Senior Consultant, facilitated the Taxonomy 101 Workshop
  • Melissa Knudtson Monsalve, Taxonomy Consultant
  • Stephanie Duncan, Taxonomy Consultant, presented “Future-Proofing Your Organization’s Taxonomy With a Governance Plan”

The event provided an incredible opportunity to dive deep into the latest developments in organizing and structuring information for a wide range of use cases. Now that the dust has settled, we've gotten together to share some key takeaways and insights.

Artificial intelligence (of course)

AI was, as expected, a big topic of discussion!

Katherine: Shannon Moore and Max Gaibort’s presentation “LLMs & ‘Human-in-the-Loop’ Taxonomy Development at EA Games” was a masterclass in how to integrate AI into taxonomy work. They walked through two years of painstaking work manually taxonomizing user language and retagging content, which they then followed by applying AI methods. It was an excellent real-world example of how much manual cleanup and validation AI requires to work effectively.

Michele: I agree. That and Rebekah Randle and Andy Fitzgerald’s “Using AI and ML to Do Taxonomy Heavy Lifting” were great because they showed actual in-production use of AI tools for classification. Like you, Katherine, I especially appreciated that they did not hedge about the huge amount of manual effort needed to get these projects off the ground and keep a close eye on everything. 

Katherine: Michele, you and Erik Lee gave a great presentation on this topic, “Aligning AI Approaches for Taxonomy and Tagging.” It was a pragmatic reality check on the effective use of AI. You and Erik really showed where and how it fits into our broader toolkit alongside less glamorous but reliable methods and tools.

Semantic debt

The idea of “semantic debt” came up more than once. It extends the concept of “technical debt” (first introduced by computer scientist Ward Cunningham in the early 1990s to describe the long-term costs and trade-offs of quick, suboptimal software development decisions) to semantic structures.

Katherine: Ahren Lehnert’s “Semantic Layers and the Ghost in the Machine” was great. Ahren highlighted the role of taxonomists as interpreters, and addressed issues like bias in taxonomies and semantic debt.

Michele: We’re always making hard decisions and sometimes you need to move forward with a less-than-ideal solution. That’s when you start to accrue “semantic debt.” Long-term planning needs to include time and resources for resolving semantic debt—going back and fixing things.

Melissa: Agreed, Michele. Ahren’s other presentation, “Stand Still Like the Hummingbird: Enterprise Taxonomy Strategy When Nothing Stands Still,” touched on exactly why that’s important: taxonomies are records of business continuity. As he put it, “Businesses change, but taxonomies are forever.”

Creative inspiration

A highlight of every Taxonomy Boot Camp is the range of real-world solutions showcased in its diverse case studies.

Melissa: This was a great year to gather inspiration for creative solutions and innovative ideas. For example, in “Extensible Taxonomies for Sustainability,” Marjorie Hlava shared a case study featuring a voting system for users to weigh in on tagging quality in open-access scientific journals. What an interesting way to get user input at the point of use.

Katherine: Richard Huffine’s presentation “Mastering Metadata with a Data Catalog” offered a fascinating look into the FDIC’s vast data resources and human resources and the innovative metadata models and data catalog developed by the library team in collaboration with Enterprise Knowledge to wrangle them.

Melissa: I also thoroughly enjoyed Laura Rodriguez and Melissa Casey’s “Untangling Credentialing: A Healthcare Use Case for Data and Metadata” with its behind-the-scenes look at how HealthStream engineered contextual display labels as a starting point for delivering more personalized experiences to healthcare professionals.

Michele: More broadly, I found the panels bringing together some of our brightest minds for informal discussions to be a real highlight. The discussions were both practical and inspiring. 

Taxonomists at work

At the heart of every taxonomy are the people who make it happen!

Melissa: I like to think of Dovecot as a small company powered by relationships, so I appreciated Thomas Stilling’s keynote “Be the Change: Your Taxonomy Expertise Can Help Drive Organizational Transformation” and its “four types of communicators”: promoters/persuaders, analysts, supporters, and controllers. To keep a taxonomy effort moving forward, people sometimes need to wear different hats and communicate in different roles.

Stephanie Duncan: Speaking of roles, Joyce van Aalten’s “Journey from a Minimal Viable Taxonomy to a Full Taxonomy” made a good case for limiting the role of SMEs in the early stages of development of a minimum viable taxonomy in favor of input from content creators/owners, then bringing SMEs on board to shift from an “MVT” to a full taxonomy.

Melissa: That’s smart, Stephanie. More isn’t always more.

Stephanie Duncan: I also appreciated Bonnie Griffin’s “Consulting from Within: Best Practices for the Solo Taxonomist.” As she pointed out, a substantial amount of work is introducing and re-introducing people to the taxonomy in various mediums.

Katherine: Yes. A standout insight from Duane Degler’s “Enabling Exploratory Discovery Through Taxonomy” was his advice for “selling taxonomy up”: finding emotionally resonant phrases and repeating them until they catch on across the organization.

Stephanie Duncan: Absolutely. According to Bonnie, it’s best to set aside talk of industry standards and best practices and focus on how taxonomy can ease specific pain points across the organization. We can amplify our efforts to “sell taxonomy up” by “cloning ourselves”: identifying allies who can accurately introduce taxonomy as well as we could.

Beyond taxonomies: knowledge graphs and enterprise search

More and more, we see taxonomists needing to branch out into more advanced techniques and technologies.

Katherine: In “Enabling Exploratory Discovery Through Taxonomy”, Duane also explored how taxonomies and knowledge graphs shape user experiences, drawing on inspirational and immersive projects from the Georgia O’Keeffe Museum and the Texas Coastal Bend Collection.

Stephanie Duncan: I enjoyed hearing Duane’s thoughts on how knowledge graphs can be used to broaden a user’s view of content by linking related content and creating relationships (“horizontal navigation”), as well as creating exploratory and immersive experiences and motivating people to learn.

Michele: Over in the Enterprise Search & Discovery conference, I heard that we don’t need metadata because we can just understand all the content, we don’t need content because we can just have metadata and data in a graph, and we definitely don’t need users because they make everything difficult. (laughing) I’m glad we have that all sorted out! But, more seriously, there’s a clear need for taxonomists to keep educating people about taxonomies and communicating their value, even in very advanced technical ecosystems. There will always be a need for our expertise to bridge the gap between humans and machines.

New Taxonomy Book, with a Chapter on Search by Dovecot’s Michele Ann Jenkins

Right in the middle of all the craziness that was 2020, Helen Lippell reached out to me about writing a chapter for the taxonomy book she would be editing. I was excited to hear that she was looking for practical, accessible guidance and real-world examples. I was also eager for a distracting project that I could sink my teeth into. I had previously worked with Helen on a long-term project that included a couple of onsite, in-person work sessions (those were the days!), and I knew she would bring her deep expertise and wonderful approachability to the book.

In my chapter, I delve into the considerations involved in leveraging taxonomies for search and provide a detailed case study touching on the most common use cases. Taxonomists frequently say that “taxonomy can help search”, but just how and why is often glossed over.

Taxonomies: Practical Approaches to Developing and Managing Vocabularies for Digital Information contains chapters from leading minds in the taxonomy domain covering everything in the taxonomy development lifecycle, from business buy-in and scoping to implementation and governance.

Taxonomies is available for purchase now in physical and digital formats through the UK publisher and will be available for sale in North America later this summer.
https://www.alastore.ala.org/taxonomies
https://www.facetpublishing.co.uk/page/detail/taxonomies/?K=9781783304813

International Non-Profit Launches a New Taxonomy-powered Website

Dovecot is pleased to announce that, after an enormous technical and editorial effort, a large-scale, international non-profit has launched their new, taxonomy-powered website. This Drupal site features over 20,000 pages of HTML content and tens of thousands of digital assets supporting the crucial and sensitive work of the organization across the globe.

On this site, taxonomy drives complex content aggregation and dynamic placement as well as search and filtering.

Dovecot will be presenting “Optimizing the haystack: Improving findability in content-heavy websites” with partners Bluestate and Axelerant at DrupalCon 2022 in Portland, Oregon, on April 26. Be sure to say hi if you are able to attend!

Read our case study for more on how we helped this large international non-profit with taxonomy harmonization and development.

DIY Text Analysis for Taxonomists

Taxonomy, at its heart, is about making connections between concepts and labels. On the conceptual side, taxonomy design requires analyzing and understanding users’ needs and mental models. On the label side, there is the body of content (or “corpus” in info science speak), which may be quite large, running to millions of words (or more!). Getting a handle on that much text can be challenging for a human mind, but luckily we live in a time with technology that doesn’t break a sweat running millions of processes.

Text analysis and processing can be useful for a number of common taxonomy development tasks including:

  • Text mining for candidate terms & synonyms
  • Search log analysis
  • Statistical analysis of current metadata use (e.g. from a CMS database export)
  • Term extraction (e.g. from product names or article titles)
  • Data cleanup or transformations
  • Aggregation or separation of values based on different criteria
  • Mapping free text to new controlled taxonomy terms
  • Summarizing labels used in a folder structure
  • Replacing a subset of terms
  • Frequency analysis (seeing how many times any term from a list appears in a corpus)
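To make the last item concrete, here is a minimal Python sketch of frequency analysis: it counts how many times each term from a list appears across a small corpus. The term list, the sample documents, and the function name `term_frequencies` are hypothetical. It assumes case-insensitive, whole-word matching, which keeps "tag" from matching inside "tagging" but may need adjusting for terms that begin or end with punctuation (such as "C++").

```python
import re
from collections import Counter


def term_frequencies(terms, documents):
    """Count how often each term appears across documents (case-insensitive, whole words)."""
    counts = Counter({term: 0 for term in terms})  # keep zero counts so unused terms show up
    for term in terms:
        # \b word boundaries stop "tag" from matching inside "tagging"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for doc in documents:
            counts[term] += len(pattern.findall(doc))
    return counts


# Hypothetical term list and corpus, for illustration only
terms = ["taxonomy", "metadata", "search log"]
corpus = [
    "Taxonomy and metadata work together.",
    "Review the search log before revising the taxonomy.",
    "Metadata, metadata everywhere.",
]

# Print terms from most to least frequent
for term, count in term_frequencies(terms, corpus).most_common():
    print(f"{term}: {count}")
```

Keeping terms with a count of zero is deliberate: in taxonomy work, a candidate term that never appears in the content can be as informative as one that appears often.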

There are a number of high-end, enterprise-grade applications, available for purchase or as a service, that advertise advanced analysis, complex machine learning algorithms, and dazzling visualizations. But not everyone needs, or has the resources for, that level of support. Luckily, many approaches can do a lot of the heavy lifting and provide very useful results using readily available tools that you probably already have on hand.

Spreadsheet applications such as Excel, OpenOffice Calc, and Google Sheets share the same core functionality, including formulas and pivot tables, and can be extended with more complex programming or plug-ins (sometimes at additional cost).
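If your metadata comes from a CMS export, the tally a pivot table gives you (how often each tag is used, per content type) can also be sketched in a few lines of Python. The column names `content_type` and `tags`, the semicolon separator, and the sample rows are assumptions for illustration; a real export will have its own layout.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical CMS export: one row per page, tags separated by semicolons
EXPORT = """title,content_type,tags
Annual report,Report,Finance;Governance
Board minutes,Minutes,Governance
Budget update,News,Finance
Press release,News,Finance;Outreach
"""


def tag_usage_by_type(csv_text):
    """Tally tag usage per content type, like a pivot table showing a count of values."""
    pivot = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for tag in row["tags"].split(";"):
            tag = tag.strip()
            if tag:  # skip empty cells and stray separators
                pivot[row["content_type"]][tag] += 1
    return pivot


# One line per content type, listing each tag and how often it is used
for content_type, tags in sorted(tag_usage_by_type(EXPORT).items()):
    print(content_type, dict(tags))
```

For a real export saved to disk, you would read the file's text (or pass an open file object to `csv.DictReader`) instead of using the inline sample.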

Command-line tools are available natively on Linux and macOS computers and can be added to Windows for free. Many of these commands take only a few minutes to learn, and they have the added advantage that you can apply them to multiple files (or an entire directory) at once. Commands can be combined, or chained together with pipes (|), to form more complex processes. For example, “sort file.txt | uniq -dc | sort -rn” first sorts the file (uniq only compares adjacent lines, so duplicates must sit next to each other), then returns each line that occurs more than once along with a count, and finally passes that list back to sort, which orders it from most to least frequent.
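For readers who prefer scripting, here is a minimal Python sketch of the same duplicate count: lines that occur more than once, with their counts, most frequent first. Because it tallies every line with `collections.Counter`, it does not need the input to be sorted first the way `uniq` does. The function name `repeated_lines` and the sample search-log queries are hypothetical, and ties are broken alphabetically here, which may not match the command-line tie order exactly.

```python
from collections import Counter


def repeated_lines(lines):
    """Return (count, line) pairs for lines seen more than once, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    repeated = [(count, line) for line, count in counts.items() if count > 1]
    # Sort by count (descending), then alphabetically to break ties
    return sorted(repeated, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical search-log queries, one per line
queries = ["annual report\n", "budget\n", "annual report\n", "board\n",
           "budget\n", "annual report\n"]

for count, query in repeated_lines(queries):
    print(f"{count:7d} {query}")
```

To process a real file (here the placeholder name `queries.txt`), pass the open file object: `with open("queries.txt", encoding="utf-8") as f: results = repeated_lines(f)`.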

Scripting (simple programming) may seem daunting, but once you have a very basic introduction to the overall approach (i.e., how to create and run code), a quick Google search turns up so many examples that you rarely need to write code from scratch. Python is one of the most popular languages for simple text manipulation. Just search “normalize text python” and then copy and paste the results, for example:

    # Convert a string to lower case
    text = "Taxonomy Boot Camp"  # example input
    lower_text = text.lower()
    print(lower_text)
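In taxonomy work, "normalizing" usually means more than lowercasing. To spot that "Human Resources", "human-resources", and "HUMAN RESOURCES!" are the same candidate label, you might also unify Unicode forms, replace punctuation, and collapse extra whitespace. The sketch below combines those steps; the function name `normalize` is hypothetical, and which steps you actually need is an assumption that depends on your content (replacing apostrophes, for instance, turns "don't" into "don t").

```python
import re
import string
import unicodedata

# Translation table mapping every ASCII punctuation character to a space
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    """Normalize a label for comparison: Unicode form, case, punctuation, whitespace."""
    text = unicodedata.normalize("NFKC", text)  # unify equivalent Unicode forms
    text = text.casefold()                      # aggressive lowercasing ("ß" becomes "ss")
    text = text.translate(_PUNCT_TO_SPACE)      # replace ASCII punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


labels = ["Human Resources", "human-resources", "  HUMAN   RESOURCES! "]
print({normalize(label) for label in labels})
```

Collecting the normalized labels in a set, as in the last line, is a quick way to see how many distinct labels remain once surface variation is removed.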

To see examples of each of these approaches and learn more about DIY Text Analysis, check out my presentation from Taxonomy Boot Camp. You can use this as a cheat sheet for all the most useful operations to use in your taxonomy work.