Best Practices for Inclusive AI Testing

Written by Jeffrey Howard | Aug 8, 2025 3:06:52 PM

Why AI Accessibility Testing Matters

New technology doesn’t always enhance accessibility. That remains true with the proliferation of artificial intelligence (AI) and large language models (LLMs). While many of the mainstream AI tools, such as ChatGPT, Gemini, Claude, and Perplexity, are considered “relatively accessible,” their inclusiveness is only “incidental,” says Darrell Hilliker, an accessibility support engineer at Salesforce.

Trouble occurs when a platform “doesn’t have an accessibility statement or public comment,” he explains. “So we disabled people could lose access tomorrow, and it would just be taken away from us without any recourse.”

For professionals with disabilities, this uncertainty isn’t a mere inconvenience. They invest time building their lives and workflows around a new tool, with no promise that it will remain accessible.

“There's nothing worse than having an accessible app that you start to rely on and then the developer updates it and breaks things, or doesn't fix them,” says Hilliker, who has been blind since birth due to congenital glaucoma and uses assistive technology. “It’s critical that we’re afforded access not only to specialized AI apps designed specifically for disabled people, but also to all apps designed for everyone.”

Companies are becoming more accessibility-minded, designing tools that meet the diverse needs of users regardless of disability, and AI itself is improving accessibility for professionals with disabilities.

Too often, however, and paradoxically, fears of litigation and the quest for technical compliance drive AI and product development, with little thought for the impact those designs will have on people with disabilities.

An organization, for instance, will conduct an accessibility audit only after its AI tool has been developed, focusing on “dry technical topics.” Hilliker believes teams instead need to keep impact in mind from the beginning and concentrate on what’s at stake in real human terms:

“As designers, developers, thought leaders, whatever your role is, you get to make a stark decision,” he says. “If you don't consider accessibility, then you are denying disabled people the ability to live, learn, and work.”

Key Takeaways

  • AI accessibility remains precarious. Most mainstream AI tools are only "incidentally" accessible, meaning disabled users could lose access without warning or recourse, forcing them to rebuild workflows they've invested time developing.
  • Start accessibility testing early, and don’t leave it as an afterthought. Salesforce's "shift left" approach integrates accessibility research and usability testing from the beginning of AI development, rather than conducting technical audits only after the tool is built. This prevents expensive fixes and ensures products don't exclude people with disabilities from the start.
  • Engage disabled users throughout the development process, not just at the end. Inclusive AI testing requires involving people with disabilities as usability testers during research phases, not just collecting anecdotal insights from team members. Salesforce uses external focus groups with disabled users to evaluate design mockups and validate solutions, ensuring products meet real accessibility needs. While individual perspectives help, larger sample sizes during development provide the robust feedback necessary to meet the highest accessibility standards.
  • Integrate WCAG recommendations as your accessibility baseline. The Web Content Accessibility Guidelines (WCAG 2.2) provide essential criteria for accessible AI development, covering everything from font size and color contrast to media timing and non-text input/output—ensuring all users can perceive, operate, understand, and access AI tools.
  • Remember the human impact behind technical requirements. Inclusive AI testing fundamentally concerns whether disabled people deserve consideration in design decisions, making it about empathy and human dignity, not just legal compliance.

It’s All Accessibility Testing, If You’re Doing It Right

The good news: If you’re already conducting accessibility testing of other products, then inclusive AI testing shouldn’t be all that different, says Hilliker, who has been advocating for digital accessibility since the 1980s. The fundamentals remain the same: input, processing, and output.

If someone uses assistive technology, such as keyboard-only navigation, a screen reader, a switch device, or voice recognition, can they provide input to the AI tool? Does it deliver accessible output? Can they prompt it and receive a response successfully?

“At the end of the day, it's another technology, just like podcasting when it was new, or smartphones, which have been around for a while,” he says. “These kinds of technologies disrupted the old way of doing things. From that perspective, I'm pretty sure AI is not all that different.”
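One way to make that input/processing/output framing concrete is a small automated check on the HTML an AI tool renders its responses into. This is a minimal sketch using only the Python standard library, not a substitute for testing with real assistive technology; the two failures it flags (images without alt text, form inputs without labels) are just a sample of the WCAG issues worth catching automatically.

```python
from html.parser import HTMLParser


class A11yOutputChecker(HTMLParser):
    """Flags two common perceivability failures in rendered HTML:
    images missing alt text and form inputs with no accessible name."""

    def __init__(self):
        super().__init__()
        self.issues = []
        self.labeled_ids = set()   # ids referenced by <label for="...">
        self.input_ids = []        # inputs that still need a label

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            # alt="" is valid for decorative images; a missing attribute is not
            self.issues.append("img missing alt text")
        elif tag == "label" and "for" in attrs:
            self.labeled_ids.add(attrs["for"])
        elif tag == "input" and attrs.get("type") not in ("hidden", "submit"):
            if "aria-label" not in attrs:
                # an input without an id can never be matched to a <label for=...>
                self.input_ids.append(attrs.get("id"))


def check_output(html: str) -> list[str]:
    """Return a list of human-readable accessibility issues found in html."""
    checker = A11yOutputChecker()
    checker.feed(html)
    issues = checker.issues
    issues += [f"input '{i}' has no label"
               for i in checker.input_ids if i not in checker.labeled_ids]
    return issues
```

A check like this belongs in the same CI pipeline as the rest of your tests, so a regression that strips alt text or labels from AI-generated output fails the build instead of shipping.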

Inclusive AI testing, however, has one glaring issue that designers and developers need to be cognizant of: bias in how AI models are trained.

“GAI [Generative Artificial Intelligence] tools can be useful and help people meet currently unaddressed access needs, but we need to acknowledge that risks such as bias exist, and be proactive as a field in finding accessible ways to validate GAI outputs,” says University of Washington’s Kate Glazko, a doctoral student in the Paul G. Allen School of Computer Science and Engineering.

In a 2024 study, she observed recruiters using ChatGPT to review resumes. Candidates with disability-related honors and credentials received lower marks. For example, when pressed to justify its rankings, ChatGPT claimed a resume with an autism leadership award had “less emphasis on leadership roles,” reinforcing the harmful belief that neurodivergent professionals are poor leaders.

Experienced product developers know that biases can slip into non-AI tools as well. Either way, your development and design team should account for them and seek ways to address the biases that reinforce stereotypes and produce exclusionary user experiences.
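The resume experiment above suggests a repeatable test pattern: score two documents that differ only in one attribute, such as a disability-related credential, and measure the gap. A minimal sketch follows; `stub_scorer` is a hypothetical stand-in for a real model call, which would prompt the LLM and parse a numeric rating from its response.

```python
import statistics


def audit_pairs(score_resume, base_resume: str, variants: dict[str, str],
                trials: int = 5) -> dict[str, float]:
    """Counterfactual audit: score a base resume and variants that differ
    only by one attribute, then report the mean score gap per variant.
    Nondeterministic scorers should be sampled more than once (trials)."""
    base = statistics.mean(score_resume(base_resume) for _ in range(trials))
    return {
        name: round(statistics.mean(score_resume(text) for _ in range(trials)) - base, 3)
        for name, text in variants.items()
    }


# Hypothetical deterministic stub that mimics the bias the study observed.
def stub_scorer(resume: str) -> float:
    return 7.0 - (1.5 if "autism leadership award" in resume.lower() else 0.0)


gaps = audit_pairs(
    stub_scorer,
    base_resume="10 years of engineering management experience.",
    variants={"disability_award": "10 years of engineering management "
                                  "experience. Autism Leadership Award."},
)
# A consistently negative gap for a variant flags exactly the kind of
# disability bias the University of Washington study documented.
```

Running the same audit across many base resumes and attributes gives you a bias regression suite rather than a one-off anecdote.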

Inclusive AI Testing Best Practices

Your team can confront these issues head-on by embracing a few inclusive AI testing best practices.

Start With Accessibility

The key to inclusive AI testing resides at the beginning of the development process. Salesforce refers to this as its “shift left” approach, starting with accessibility research and usability testing, rather than leaving it an afterthought, a mere audit once you’ve built your new AI tool.

Inclusive AI testing, however, is not an overlay or brief technical checklist you conduct at the end of product development. The earlier you integrate usability testing with disabled users, the more you’ll avoid expensive fixes later on—and the more confident you’ll be that your product doesn’t exclude those with disabilities.

Engage People With Disabilities Earlier in the Process

In technical terms, this means employing usability testers with disabilities at the beginning. Conduct research. Ask them about the barriers they encounter, as well as the challenges your tool helps them overcome. Listen to their frustrations, or what delights them, and observe how they interact with your AI tool.

While anecdotal insights from team members with disabilities will certainly help, you typically need a much larger sample size to ensure you meet the highest accessibility standards.

Salesforce, for instance, enlists external test users with disabilities to evaluate their digital products’ usability during the research phase. This is especially helpful when developing new features and the global tech company is uncertain which solution will be the most accessible.

“We use these focus groups to put designs in front of people with lived experience,” explains Shlomit Shteyer, Director of Technical Program Management at Salesforce. “We show them mockups of a few options, and they tell us in interview fashion what works and what doesn’t. Then we can bake that into the design and define the appropriate solution.”

Accessibility-forward organizations will also conduct thorough usability testing later in product development to validate how well their solution meets the needs of people with disabilities. You can’t accessibility test too much.

Integrate Web Content Accessibility Guidelines (WCAG) Into AI Product Development

First published in 1999, these guidelines ensure your digital products and platforms abide by accessibility best practices. The most recent version, WCAG 2.2, became a W3C Recommendation in October 2023 and was last updated in December 2024. Its criteria address font size, color contrast, media timing, and non-text input and output, to name a few.

Accessible design abides by four WCAG principles, making sure everyone can access and interact with web content. All users must be able to:

  • Perceive the information being presented (Perceivable)
  • Operate the interface (Operable)
  • Understand the information as well as the operation of the user interface (Understandable)
  • Access the content as technologies advance (Robust)

Your team should have a deep, working knowledge of these standards, consulting them throughout AI development.

While imperfect, Hilliker asserts WCAG is “the baseline for accessibility” and making AI more inclusive: “I know people say, ‘Well, something can be hard to use or inaccessible and still meet WCAG.’ Yeah, okay. But we've got to start somewhere, right? WCAG is that starting point.”
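Part of what makes WCAG a workable baseline is that several of its success criteria are mechanically checkable. Color contrast (SC 1.4.3) is one: it is defined by an exact formula over relative luminance, sketched here in Python.

```python
def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance per WCAG 2.x, from 8-bit sRGB channel values."""
    def channel(c: int) -> float:
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text: bool = False) -> bool:
    """SC 1.4.3 (Level AA): 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, black on white yields the maximum ratio of 21:1, while the medium gray #777777 on white comes in just under 4.5:1, passing AA only at large text sizes. Checks like this can run on every design-token change, long before a formal audit.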

Don’t Overlook the Importance of Representation in AI Testing

The gold standard in user testing demands active inclusion of diverse perspectives. Bringing users with disabilities to the table isn’t enough. Inclusive AI testing involves integrating members of many marginalized communities and minority groups.

“If you're going to make a product that people want to use, you probably don't want to just have a few guys sitting in a basement or a garage designing it,” he remarks. “You want to have other types of people participating in that whole ideation and design process.”

This will also help correct for cultural biases and persistent stereotypes, which remain a major issue for the LLMs that power many AI solutions and much AI-generated content.

According to a late 2024 study of LLMs, in which 54 experts in cultural studies, linguistics, and sociology evaluated model responses, single prompts produced biased output 86.1% of the time. These biases included gender, geographical/national, and socio-economic bias. In Japan, for example, bias showed up in responses concerning socio-economic status, implying that good employment was only possible for graduates of prestigious universities.

There’s also growing concern, for similar reasons, that AI is already leaving non-English speakers behind.

Remember: Inclusive AI Testing Is About People

It’s easy to become too caught up in the technical or legal side of accessibility. Hilliker reminds AI and product developers that they’re creating experiences for real people, designing platforms and tools that can change a person’s life—or further alienate them from the rest of their family, community, or professional life.

“I always want to get back to why we are doing what we're doing,” he says. “It’s because disabled people deserve consideration. You're asking the question: ’Do the needs and rights of disabled people deserve consideration or not?’ If you're a good person, the answer is yes. You think that disabled people deserve consideration, and so you're going to do accessibility.”

Inclusive AI design and accessibility tie directly into what he believes makes collaboration possible, enabling us to build powerful tools, unlocking historic levels of innovation, alleviating suffering, and driving social progress.

That’s our ability to put ourselves in the position of other people: “Empathy is not a bug,” adds Hilliker, “but a critical feature of society.”

A founding partner of InclusionHub, Salesforce is helping bring greater accessibility and digital inclusion to the professional world. Visit its a11y website to learn more.