So how many of you geeks have tried to break the internet? Well, Googling “Google” won’t do it (despite what my kids think). I ran my own little experiment today: I pitted the old text-based CAPTCHA against Google’s Vision API. And guess what? The response from Google is shown above. Look at the answer in Block 2… Google wins, solving the CAPTCHA on the first try.

Fascinating stuff. So what’s next? Well, CAPTCHA has moved on: we now have to prove we are human by recognising images in a grid of squares. But Google has some pretty powerful stuff in beta that will solve that as well. See the image below, processed by Google’s beta API:

It picks out the elements of the picture and labels them. It really is cool stuff. Does anyone else feel we are in an AI arms race? Where will CAPTCHA go next, DNA samples?

Anyway, back to test automation… The reason I post this is that we now have some pretty powerful tools available to us that solve challenges which seemed unsolvable only a few years ago. OCR (Optical Character Recognition) has been a mainstay of automation (love it or hate it) for years, relying on sophisticated OCR libraries to read text from an image, screenshot etc. in order to obtain actual values from an SUT or to locate text for navigation. Annoyingly, HP’s UFT always seemed to outperform the best from the open source or paid community, but even then I’d say it was 70% accurate at most and sensitive to fonts, screen resolution and other factors: not enough to make it part of a core automation strategy.

But then in come Google’s Vision API and others: AI-based OCR, which is astonishingly accurate, even beats systems designed to defeat OCR, and is relatively easy to use.

Below is a simple bit of JS code that uses Google’s Puppeteer API and the Google Vision API to launch a webpage, grab a screenshot, use AI to read the text on the page (an image on a button in this example) and click on it. 30 lines of code, give or take.

const pup = require('puppeteer')
const rp = require('request-promise')
const jp = require('jsonpath')

let url = '';
let texttoclick = 'Calculate';

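(async () => {
    const browser = await pup.launch({headless: false});
    const page = await browser.newPage();
    await page.goto(url);
    // Grab the page as a base64 screenshot, which the Vision API accepts inline
    let ss = await page.screenshot({encoding: 'base64'});

    // Ask the Vision API to run text detection (OCR) on the screenshot
    let body = {"requests": [{"image": {"content": ss},"features": [{"type": "TEXT_DETECTION"}]}]};

    var options = {
        method: 'POST',
        uri: '', // Vision API endpoint goes here (left blank in this post)
        body: body,
        json: true // Automatically stringifies the body to JSON
    };

    let resp = await rp(options);
    // Pick out the annotation whose text matches the label we want to click
    let textobj = jp.query(resp,`$..textAnnotations[?(@.description=="`+ texttoclick + `")]`);
    console.log(JSON.stringify(textobj));
    // Click just inside the top-left corner of the matched text's bounding box
    await page.mouse.click(textobj[0].boundingPoly.vertices[0].x + 5,textobj[0].boundingPoly.vertices[0].y + 5);
    await new Promise(r => setTimeout(r, 2000));

    await browser.close();
})();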

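One thing you might want to tweak: clicking 5 pixels in from the top-left corner works for a chunky button, but it is a bit fragile for small text. Here is a minimal sketch of an alternative that clicks the middle of the bounding box instead. The helper names `centerOf` and `findTextCenter` and the sample response are my own inventions for illustration, not part of Puppeteer or the Vision API, and the sketch assumes the response has the `responses[].textAnnotations[].boundingPoly.vertices` shape that the JSONPath query above relies on.

```javascript
// Hypothetical helper: centre point of a bounding polygon given as [{x, y}, ...].
function centerOf(vertices) {
    // Defensive assumption: a missing x or y is treated as 0.
    const xs = vertices.map(v => v.x ?? 0);
    const ys = vertices.map(v => v.y ?? 0);
    return {
        x: (Math.min(...xs) + Math.max(...xs)) / 2,
        y: (Math.min(...ys) + Math.max(...ys)) / 2
    };
}

// Hypothetical helper: centre of the first annotation whose text matches exactly,
// or null when the text was not found in the screenshot.
function findTextCenter(resp, text) {
    const annotations = (resp.responses || []).flatMap(r => r.textAnnotations || []);
    const hit = annotations.find(a => a.description === text);
    return hit ? centerOf(hit.boundingPoly.vertices) : null;
}

// Made-up sample response, only to show the shape the helpers expect.
const sample = {responses: [{textAnnotations: [
    {description: 'Calculate', boundingPoly: {vertices: [
        {x: 100, y: 200}, {x: 180, y: 200}, {x: 180, y: 230}, {x: 100, y: 230}
    ]}}
]}]};

console.log(findTextCenter(sample, 'Calculate')); // { x: 140, y: 215 }
console.log(findTextCenter(sample, 'Submit'));    // null
```

In the script you would then do something like `const c = findTextCenter(resp, texttoclick); if (c) await page.mouse.click(c.x, c.y);`, which also saves you from a crash when the text is not on the page.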

It’s a brave new world out there, and this is one of the few concrete examples of how AI can really enrich the test automation space!