AI identifies molecules from their featureless visible spectrum

An artificial intelligence algorithm can identify molecules from their visible spectrum, a region in which many organic compounds are completely transparent and have no absorption peaks to speak of. ‘Forget about the peaks – that’s the main result,’ says Felipe Herrera from the University of Santiago, Chile. Once the algorithm is trained on more structures, as well as mixtures, it could enhance non-destructive optical sensing to identify explosives or environmental contaminants.

When it comes to detecting molecules by probing them with a laser, ‘infrared (IR) spectroscopy is said to be so precise that nothing can beat it’, Herrera says. The fingerprint region in IR or Raman spectra can be used by chemists and machines alike to pinpoint which molecule they are looking at. But the instruments needed to get these spectra are bulky, expensive and need an expert user, says Ross Gillanders, who works on optical sensors at the University of St Andrews, UK. ‘The advantage of using visible light is that you can really reduce the cost of a field device and potentially make it a lot more portable and user friendly as well,’ he points out.

This is why Herrera and his team trained a machine learning algorithm to identify organic molecules from a single-wavelength refractive index measurement. ‘We actually reach Raman accuracies. Depending on the molecule, we can beat it – 98% or more,’ Herrera says.

The team selected 61 compounds, including simple organics like methane and common polymers like polymethyl methacrylate, and fed the algorithm 40 years’ worth of pre-compiled open-source data, ‘which nobody was making use of on a massive scale’, Herrera says. Because the data was heavily biased towards Raman and IR spectra, the researchers filled in the gaps in each compound’s visible spectrum with physics-based modelling.

The team quickly found that the best-performing algorithm had to be trained not just on visible data but also on infrared and Raman data. ‘We could not have come up with this conclusion without machine learning,’ Herrera says. ‘The technical rationale for why this works is the optical response of materials is a global object. When you measure the infrared spectrum, you record a subset of the entire optical response information content. I think that’s the achievement of this work: it opens people’s intuition to the broader picture of what the electromagnetic response of a compound is.’

Herrera could imagine his method as part of the software for a sensor, for example at the output of a gas chromatography analyser. For now, the identification only works on single compounds, though the team is looking into making it work on mixtures. ‘What I would like to see this applied in is optical sensing, which could be done non-destructively – it could be portable with photonic devices that fit in a chip.’

‘It would be great if you could get a similar fingerprint using visible as you could do with IR,’ says Gillanders. It could allow for quick identification of hazardous materials during airport screening, help track down the source of improvised explosive devices or identify environmental contaminants in complex mixtures such as river water. However, Gillanders cautions, it’s still at a very early stage. ‘I think it’ll be a while before this can go into the field, with handheld devices.’