DeepMind stunned the biology world late last year when its AlphaFold2 AI model predicted the structures of proteins so accurately that many declared the decades-old problem “solved.” Now researchers claim to have leapfrogged DeepMind the way DeepMind leapfrogged the rest of the world, with RoseTTAFold, a system that does nearly the same thing at a fraction of the computational cost. (Oh, and it’s free to use.)
AlphaFold2 has been the talk of the industry since November, when it blew away the competition at CASP14, a virtual competition between algorithms built to predict the physical structure of a protein given the sequence of amino acids that makes it up. The model from DeepMind was so far ahead of the others, so highly and reliably accurate, that many in the field have talked (half-seriously and in good humor) about moving on to a new field.
But one aspect that seemed to satisfy no one was DeepMind’s plans for the system. It was not exhaustively and openly described, and some worried that the company (which is owned by Alphabet/Google) was planning on more or less keeping the secret sauce to themselves — which would be their prerogative but also somewhat against the ethos of mutual aid in the scientific world.
Update: DeepMind published more detailed methods in the journal Nature today. The code is available on GitHub. This does considerably lessen the aforementioned concern, but the advance described below is still highly relevant.
That concern seems to have been at least partly mooted by work from University of Washington researchers led by David Baker and Minkyung Baek, published in the latest issue of the journal Science. Baker, you may remember, recently won a Breakthrough Prize for his team’s work combating COVID-19 with engineered proteins.
The team’s new model, RoseTTAFold, makes predictions at similar accuracy levels using methods that Baker, responding to questions via email, candidly admitted were inspired by those used by AlphaFold2.
“The AlphaFold2 group presented several new high level concepts at the CASP14 meeting. Starting from these ideas, and with a lot of collective brainstorming with colleagues in the group, Minkyung has been able to make amazing progress in very little time,” he said. (“She is amazing!” he added.)
Baker’s group more or less placed second at CASP14, no mean feat, but hearing DeepMind’s methods described even in general terms was enough to set them on the same path. They developed a “three-track” neural network that simultaneously considered the amino acid sequence (one dimension), distances between residues (two dimensions), and coordinates in space (three dimensions). The implementation is beyond complex and far outside the scope of this article, but the result is a model that achieves almost the same accuracy levels — levels, it bears repeating, that were completely unprecedented less than a year ago.
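The three-track idea can be illustrated at a purely schematic level. The toy sketch below is not RoseTTAFold’s actual architecture (which uses attention blocks and far richer updates); it only shows, at the level of array shapes, how 1D sequence features, 2D pair features, and 3D coordinates might exchange information each step:

```python
import numpy as np

rng = np.random.default_rng(0)

L = 16   # number of residues in a toy protein
D = 8    # feature width

# Illustrative stand-ins for the three tracks:
seq = rng.normal(size=(L, D))       # 1-D track: per-residue features
pair = rng.normal(size=(L, L, D))   # 2-D track: residue-pair features
coord = rng.normal(size=(L, 3))     # 3-D track: backbone coordinates

def mix(seq, pair, coord):
    """One illustrative information-exchange step between the three tracks."""
    # 1D -> 2D: an outer sum of sequence features updates the pair map
    pair = pair + seq[:, None, :] + seq[None, :, :]
    # 2D -> 1D: pair features pooled over partners update the sequence track
    seq = seq + pair.mean(axis=1)
    # 2D -> 3D: nudge coordinates using a pseudo-distance signal from the pair map
    target = pair.mean(axis=-1)            # (L, L)
    centered = coord - coord.mean(axis=0)
    coord = centered + 0.01 * (target @ centered)
    return seq, pair, coord

for _ in range(3):
    seq, pair, coord = mix(seq, pair, coord)

print(seq.shape, pair.shape, coord.shape)  # (16, 8) (16, 16, 8) (16, 3)
```

The point is only that each representation feeds the others repeatedly, rather than being computed once in sequence.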
What’s more, RoseTTAFold accomplishes this level of accuracy far more quickly — that is, using less computation power. As the paper puts it:
DeepMind reported using several GPUs for days to make individual predictions, whereas our predictions are made in a single pass through the network in the same manner that would be used for a server…the end-to-end version of RoseTTAFold requires ~10 min on an RTX2080 GPU to generate backbone coordinates for proteins with less than 400 residues.
Hear that? It’s the sound of thousands of microbiologists sighing in relief and discarding drafts of emails asking for supercomputer time. It may not be easy to lay one’s hands on a 2080 these days, but the point is any high-end desktop GPU can perform this task in minutes, instead of requiring a high-end cluster running for days.
The modest requirements make RoseTTAFold suitable for public hosting and distribution as well, something that might never have been in the cards for AlphaFold2.
“We have a public server that anyone can submit protein sequences to and have the structures predicted,” Baker said. “There have been over 4500 submissions since we put the server up a few weeks ago. We have also made the source code freely available.”
This may seem very niche, and it is, but protein folding has historically been one of the toughest problems in biology and one towards which countless hours of high-performance computing have been dedicated. You may recall Folding@Home, the popular distributed computing app that let people donate their computing cycles to attempting to predict protein structures. The kind of problem that might have taken a thousand computers days or weeks to do — essentially by brute-forcing solutions and checking for fit — now can be done in minutes on a single desktop.
The physical structure of proteins is of utmost importance in biology, as it is proteins that do the vast majority of tasks in our bodies, and proteins that must be modified, suppressed, enhanced, and so on for therapeutic reasons; first, however, they need to be understood, and until November that understanding could not be reliably achieved computationally. At CASP14 it was proven to be possible, and now it has been made widely available.
It is not, by a long shot, a “solution” to the problem of protein folding, though the sentiment has been expressed. Most proteins at rest in neutral conditions can now have their structure predicted, and that has huge repercussions in multiple domains, but proteins are seldom found “at rest in neutral conditions.” They twist and contort to grab or release other molecules, to block or slip through gates and other proteins, and generally to do everything they do. These interactions are far more numerous, complex, and difficult to predict, and neither AlphaFold2 nor RoseTTAFold can do so.
“There are many exciting chapters ahead… the story is just beginning,” said Baker.
If you’re curious about the science and the potential repercussions, consider reading this much more detailed and technical account of the methods and possible next steps written in the wake of AlphaFold2’s CASP14 performance.
Police officers at the University of Washington in Seattle, regarded as one of the nation’s most progressive cities, said they were the target of racist insults and harassment.
While the concept of “deepfakes,” or AI-generated synthetic imagery, has been decried primarily in connection with involuntary depictions of people, the technology is dangerous (and interesting) in other ways as well. For instance, researchers have shown that it can be used to manipulate satellite imagery to produce real-looking — but totally fake — overhead maps of cities.
The study, led by Bo Zhao from the University of Washington, is not intended to alarm anyone but rather to show the risks and opportunities involved in applying this rather infamous technology to cartography. In fact their approach has as much in common with “style transfer” techniques — redrawing images in impressionistic, crayon, or other arbitrary styles — as with deepfakes as they are commonly understood.
The team trained a machine learning system on satellite images of three different cities: Seattle, nearby Tacoma and Beijing. Each has its own distinctive look, just as a painter or medium does. For instance, Seattle tends to have larger overhanging greenery and narrower streets, while Beijing is more monochrome and — in the images used for the study — the taller buildings cast long, dark shadows. The system learned to associate details of a street map (like Google or Apple’s) with those of the satellite view.
The resulting machine learning agent, when given a street map, returns a realistic-looking faux satellite image of what that area would look like if it were in any of those cities. In the following image, the map corresponds to the top right satellite image of Tacoma, while the lower versions show how it might look in Seattle and Beijing.
A close inspection will show that the fake maps aren’t as sharp as the real one, and there are probably some logical inconsistencies like streets that go nowhere and the like. But at a glance the Seattle and Beijing images are perfectly plausible.
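The core idea of imposing one city’s visual “style” onto another’s layout can be approximated crudely without any neural network at all. The sketch below is a simple per-channel color-statistics transfer on made-up arrays, not the learned image-to-image translation the researchers actually used, but it conveys the flavor of restyling one image with another’s statistics:

```python
import numpy as np

def match_color_stats(content, style):
    """Crude 'style transfer': impose the style image's per-channel
    mean and standard deviation onto the content image."""
    c = content.astype(float)
    s = style.astype(float)
    out = np.empty_like(c)
    for ch in range(c.shape[-1]):
        c_mu, c_sd = c[..., ch].mean(), c[..., ch].std() + 1e-8
        s_mu, s_sd = s[..., ch].mean(), s[..., ch].std()
        out[..., ch] = (c[..., ch] - c_mu) / c_sd * s_sd + s_mu
    return out.clip(0, 255)

rng = np.random.default_rng(1)
tacoma = rng.uniform(0, 255, size=(64, 64, 3))    # stand-in "content" image
beijing = rng.uniform(60, 120, size=(64, 64, 3))  # stand-in "style" image

fake = match_color_stats(tacoma, beijing)
```

A GAN-based system learns far more than color statistics (textures, shadows, building shapes), which is why its output can fool a glance where this trick could not.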
One only has to think for a few minutes to conceive of uses for fake maps like this, both legitimate and otherwise. The researchers suggest that the technique could be used to simulate imagery of places for which no satellite imagery is available — like one of these cities in the days before such things were possible, or for a planned expansion or zoning change. The system doesn’t have to imitate another place altogether — it could be trained on a more densely populated part of the same city, or one with wider streets.
It could conceivably even be used, as this rather more whimsical project was, to make realistic-looking modern maps from ancient hand-drawn ones.
Should technology like this be bent to less constructive purposes, the paper also looks at ways to detect such simulated imagery using careful examination of colors and features.
The work challenges the general assumption of the “absolute reliability of satellite images or other geospatial data,” said Zhao in a UW news article, and certainly, as with other media, that kind of thinking has to go by the wayside as new threats appear. You can read the full paper in the journal Cartography and Geographic Information Science.
Open source software is at the core of… well, practically everything online. But while much of it is diligently maintained, it doesn’t always receive the kind of scrutiny that something so foundational ought to. $1.3 million worth of grants were announced today, split among 13 projects looking to ensure open source software and development is being done equitably, sustainably, and responsibly.
The research projects will look into a number of questions about the way open source digital infrastructure is being used, maintained, and otherwise affected. For instance, many municipalities rely on and create this sort of infrastructure constantly as the need for government software solutions grows, but what are the processes by which this is done? Which approaches or frameworks succeed, and why?
And what about the private companies that contribute to major open-source projects, often without consulting one another — how do they communicate and share priorities and dependencies? How could that be improved, and with what costs and benefits?
These and other questions aren’t the type that any single organization or local government is likely to take on spontaneously, and of course the costs of such studies aren’t trivial. But they were deemed interesting enough (and promising enough to generate new approaches and products) by a team of experts who sorted through about 250 applications over the last year.
The grantmaking operation is funded and organized by the Ford Foundation, Alfred P. Sloan Foundation, Open Society Foundations, Omidyar Network, and the Mozilla Open Source Support Program in collaboration with the Open Collective Foundation.
“There’s a dearth of funding for looking at the needs and potential applications of free and open source infrastructure. The public interest issues behind open source have been the missing piece,” said Michael Brennan, who’s leading the grant program at the Ford Foundation.
“The president of the foundation [Darren Walker] once said, ‘a just society relies on a just Internet,’ ” he quoted. “So our question is how do we create that just Internet? How do we create and sustain an equitable Internet that serves everyone equally? We actually have a lot more questions than answers, and few people are funding research into those questions.”
Even finding the right questions is part of the question, of course, but in basic research that’s expected. Early work in a field can seem frustratingly general or inconclusive because it’s as much about establishing the scope and general direction of the work as it is about suggesting actual courses of action.
“The final portfolio wasn’t just about the ‘objectively best’ ones, but how do we find a diversity of approaches and ideas, and tackle different aspects of this work, and also be representative of the diverse and global nature of the project?” Brennan said. “This year we also accepted proposals for both research and implementation. We want to see that the research is informing the building of that equitable and sustainable infrastructure.”
You can read the full research abstracts here, but these are the short versions, with the proposer’s name:
- How are COVID data infrastructures created and transformed by builders and maintainers from the open source community? – Megan Finn (University of Washington, University of Texas, Northeastern University)
- How is digital infrastructure a critical response to fight climate change? – Narrira Lemos de Souza
- How do perceptions of unfairness when contributing to an open source project affect the sustainability of critical open source digital infrastructure projects? – Atul Pokharel (NYU)
- Supporting projects to implement research-informed best practices at the time of need on governance, sustainability, and inclusion. – Danielle Robinson (Code for Science & Society)
- Assessing Partnerships for Municipal Digital Infrastructure – Anthony Townsend (Cornell Tech)
- Implement recommendations for funders of open source infrastructure with guides, programming, and models – Eileen Wagner, Molly Wilson, Julia Kloiber, Elisa Lindinger, and Georgia Bullen (Simply Secure & Superrr)
- How can we build a “Creative Commons” for API Terms of Service, as a contract to automatically read, control, and enforce API Terms of Service between infrastructure and applications? – Mehdi Medjaoui (APIdays, LesMainteneurs, Inno3)
- Indian case study of governance, implementation, and private sector role of open source infrastructure projects – Digital Asia Hub
- Will cross-company visibility into shared free and open source dependencies lead to cross-company collaboration and efforts to sustain shared dependencies? – Duane O’Brien
- How do open source tools contribute towards creating a multilingual internet? – Anushah Hossain (UC Berkeley)
- How digital infrastructure projects could embrace cooperatives as a sustainable model for working – Jorge Benet (Cooperativa Tierra Común)
- How do technical decision-makers assess the security ramifications of open source software components before adopting them in their projects and where can systemic interventions to the FOSS ecosystem be targeted to collectively improve its security? – Divyank Katira (Centre for Internet & Society in Bangalore)
- How can African participation in the development, maintenance, and application of the global open source digital infrastructure be enhanced? – Alex Comninos (Research ICT Africa (RIA) and the University of Cape Town)
The projects will receive their grants soon, and later in the year (or whenever they’re ready) the organizers will coordinate some kind of event at which they can present their results. Brennan made it clear that the funders take no stake in the projects and aren’t retaining or publishing the research themselves; they’re just coordinating and offering support where it makes sense.
$1.3 million is an interesting number. For some, it’s peanuts. A startup might burn through that cash in a month or two. But in an academic context, a hundred grand can be the difference between work getting done or being abandoned. The hope is that small injections at the base layer produce a better environment for the type of support the Ford Foundation and others provide as part of their other philanthropic and grantmaking efforts.
Google is introducing features that will allow users to take vital health measurements using just the camera they already have on their smartphone, expanding health and fitness features typically only available on dedicated wearables to a whole new group of people. Beginning next month, and initially exclusive to Google Pixel phones (with plans to offer it on other Android devices in the future), users will be able to measure both their heart rate and their respiratory rate using just their device’s camera.
Typically, taking these measurements has required specialized hardware, including red or green light-based heart rate monitors like those found on the Apple Watch or on fitness trackers like those made by Google-acquired Fitbit. Google’s hardware and software teams, including the Google Health unit led by Director of Health Technologies Shwetak Patel, have managed to develop computer vision-based methods for taking these measurements using only smartphone cameras, which the company says can produce results comparable to clinical-grade measurement hardware. It has produced a study to validate these results, which it’s making available in pre-print format while it seeks peer review through an academic journal.
For respiratory rate, the technology relies on a technique known as ‘optical flow,’ which monitors movements in a person’s chest as they breathe and uses that to determine their breathing rate. In its clinical validation study, which covered both typical individuals in good health and people with existing respiratory conditions, Google’s data indicates that it’s accurate to within one breath per minute across all participants.
For heart rate, Google is initially using the camera to detect “subtle color changes” in a user’s fingertip, which provide an indicator of when oxygenated blood flows from the heart through to the rest of the body. The company’s validation data (again, still subject to external review) has shown accuracy within a 2% margin of error, on average, across people with a range of different skin types. Google is also working on making this same technology work using color changes in a person’s face, it says, though that work is still in the exploratory phase.
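The fingertip technique is a camera-based cousin of photoplethysmography, and the basic signal processing behind both measurements can be sketched simply. The toy example below is not Google’s implementation (which is unpublished); it simulates the average brightness of a fingertip over 20 seconds of 30 fps video and reads the pulse rate off the dominant frequency:

```python
import numpy as np

def dominant_rate_per_minute(signal, fps):
    """Estimate a periodic rate (pulse or breathing) from a 1-D camera
    signal by finding its strongest frequency component."""
    x = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    peak_hz = freqs[1:][spectrum[1:].argmax()]  # skip the DC bin
    return peak_hz * 60.0

fps = 30.0
t = np.arange(0, 20, 1.0 / fps)  # 20 seconds of "video"
# Simulated mean green-channel brightness of a fingertip: a 1.2 Hz pulse plus noise
green = 0.5 * np.sin(2 * np.pi * 1.2 * t) \
        + 0.05 * np.random.default_rng(2).normal(size=t.size)

bpm = dominant_rate_per_minute(green, fps)
print(round(bpm))  # 72
```

The same frequency-analysis idea applies to the chest-motion signal for breathing rate, just in a lower frequency band; the hard engineering is in extracting a clean signal from a real camera in the first place.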
Google is going to make these measurement features available to users within the next month, it says, via the Google Fit app, and initially on currently available Pixel devices made by the company itself. The plan is then to expand the features to different Android devices running Android 6 or later, sometime “in the coming months.”
“My team has been working on ways that we can unlock the potential of everyday smart devices,” Patel said in a press briefing regarding the new features. “This would include smart devices in the home, or a mobile phone, and how we leverage the sensors that are starting to become more and more ubiquitous within those devices, to support health and wellness.”
Patel, who is also a computer science professor at the University of Washington and who has been recognized with the ACM Prize in Computing for his work in digital health, said that the availability of powerful sensors in ubiquitous consumer devices, combined with advances in AI, has meant that daily health monitoring can be much more accessible than ever before.
“I really think that’s going to be a really important area moving forward given that if you think about health care, the journey just doesn’t end at the hospital, the four walls of the hospital,” he said. “It’s really this continuous journey, as you’re living your daily life, and being able to give you feedback and be able to measure your general wellness is an important thing.”
It’s worth noting that Google is explicit about these features being intended for a person’s own tracking of their general wellbeing – meaning it’s not meant as a diagnostic or medical tool. That’s pretty standard for these kinds of features, since few of these companies want to take on the task of getting full FDA medical-grade device certification for tools that are meant for general consumer use. To that end, Google Fit also doesn’t provide any guidance or advice based on the results of these measurements; instead, the app provides a general disclaimer that the results aren’t intended for medical use, and also offers up some very high-level description of why you’d even want to track these stats at all.
Many of the existing dedicated wellness and health tracking products on the market, like the Oura ring, for instance, provide more guidance and actionable insight based on the measurements they take. Google seems intent on steering well clear of that line with these features, instead leaving the use of this information fully in the hands of users. That said, it could be a valuable resource to share with your physician, particularly if you’re already concerned about potential health issues, in place of less convenient and less accessible forms of continuous health monitoring.
Patel said that Google is interested in potentially exploring how sensor fusion could further enhance tracking capabilities on existing devices, and in response to a question about potentially offering this on iPhones, he said that while the focus is currently on Android, the ultimate goal is indeed to get it “to as many people as possible.”
Sometimes it’s just not worth it to try to top Mother Nature. Such seems to have been the judgment by engineers at the University of Washington, who, deploring the absence of chemical sensors as fine as a moth’s antennas, opted to repurpose moth biology rather than invent new human technology. Behold the “Smellicopter.”
Mounted on a tiny drone platform with collision avoidance and other logic built in, the device is a prototype of what could be a very promising fusion of artificial and natural ingenuity.
“Nature really blows our human-made odor sensors out of the water,” admits UW grad student Melanie Anderson, lead author of the paper describing the Smellicopter, in a university news release. And in many industrial applications, sensitivity is of paramount importance.
If, for instance, you had one sensor that could detect toxic particles at a fraction of the concentration of that detectable by another, it would be a no-brainer to use the more sensitive of the two.
On the other hand, it’s no cakewalk training moths to fly toward toxic plumes of gas and report back their findings. So the team (carefully) removed a common hawk moth’s antenna and mounted it on board. By passing a light current through it, the platform can monitor the antenna’s general status, which changes when it is exposed to certain chemicals — such as those a moth might want to follow, a flower’s scent perhaps.
See it in action below:
In tests, the cybernetic moth-machine construct performed better than a traditional sensor of comparable size and power. The cells of the antenna, excited by the particles wafting over them, created a fast, reliable, and accurate signal for those chemicals they are built to detect. “Reprogramming” those sensitivities would be non-trivial, but far from impossible.
The little drone itself has a clever bit of engineering to keep the antenna pointed upwind. While perhaps pressure sensors and gyros might have worked to keep the craft pointing in the right direction, the team used the simple approach of a pair of large, light fins mounted on the back that have the effect of automatically turning the drone upwind, like a weather vane. If something smells good that way, off it goes.
It’s very much a prototype, but this sort of simplicity and sensitivity are no doubt attractive enough to potential customers like heavy industry and the military that the team will have offers coming in soon. You can read the paper describing the design of the Smellicopter in the journal IOP Bioinspiration & Biomimetics.
Lunar exploration startup Astrobotic is working on developing ultra-fast wireless charging technology for its CubeRover shoebox-sized lunar robotic explorers. The project, which is funded by NASA’s Tipping Point program with a $5.8 million award, will tap Seattle-based wireless charging startup WiBotic for expertise in high-speed, short-range wireless power, and brings in Bosch to assist with developing the AI-based data analysis that will help the robots find their way to docking stations for a wireless power-up.
Existing lunar rovers are typically powered by sunlight, but they’re also very large (roughly car-sized or larger), giving them plenty of surface area to soak up rays via solar panels. Astrobotic’s rovers, which will initially weigh under five pounds, won’t have much area to collect the sun’s power, and will instead have to rely on secondary power sources to keep enough energy for their exploratory operations.
That’s where WiBotic comes in. Working together with the University of Washington, the startup will be developing a “lightweight, ultra-fast proximity charging solution, comprised of a base station and power receiver” specifically for use in space-based applications. But finding these stations will be its own special challenge – particularly in a lunar context, where things like GPS don’t come into play. Instead, Bosch will leverage data collected from sensors on board the robot to generate a sensor-fusion result that can provide it with autonomous navigation capabilities. That work could be instrumental in helping future rovers navigate not only to power stations, but also to various destinations on the lunar surface as robotic science and exploration missions ramp up.
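Sensor fusion of this kind can be illustrated with a minimal example. The sketch below is a one-dimensional complementary filter on made-up data, not Bosch’s actual navigation stack: it fuses a smooth but drifting odometry estimate with a noisy but absolute position fix, getting a better estimate than either source alone:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100

# Ground-truth 1-D rover position: 0.1 m per step
true_pos = 0.1 * np.arange(N)

# Two imperfect on-board measurements (purely illustrative):
camera = true_pos + rng.normal(0, 0.5, N)                            # noisy absolute fix
odometry = np.cumsum(np.full(N, 0.1) + rng.normal(0, 0.01, N))       # smooth but drifts

# Complementary filter: trust odometry steps short-term,
# correct accumulated drift with the noisy absolute fix
alpha = 0.95
fused = np.empty(N)
fused[0] = camera[0]
for k in range(1, N):
    step = odometry[k] - odometry[k - 1]
    fused[k] = alpha * (fused[k - 1] + step) + (1 - alpha) * camera[k]

def rmse(x):
    return float(np.sqrt(((x - true_pos) ** 2).mean()))

print(rmse(camera), rmse(fused))  # fused should have the lower error
```

Real systems fuse many more signals (IMU, wheel encoders, vision) with Kalman-style filters, but the principle of combining complementary error characteristics is the same.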
The goal is to have a demonstration rover charging system ready to show off sometime in 2023, and the partners will be working together with NASA’s Glenn Research Center to test the technology in the facility’s thermal vacuum chamber test lab.
Microsoft has added to the slowly growing pile of technologies aimed at spotting synthetic media (aka deepfakes) with the launch of a tool for analyzing videos and still photos to generate a manipulation score.
The tool, called Video Authenticator, provides what Microsoft calls “a percentage chance, or confidence score” that the media has been artificially manipulated.
“In the case of a video, it can provide this percentage in real-time on each frame as the video plays,” it writes in a blog post announcing the tech. “It works by detecting the blending boundary of the deepfake and subtle fading or greyscale elements that might not be detectable by the human eye.”
If a piece of online content looks real but ‘smells’ wrong, chances are it’s a high-tech manipulation trying to pass as real — perhaps with a malicious intent to misinform people.
And while plenty of deepfakes are created with a very different intent — to be funny or entertaining — taken out of context such synthetic media can still take on a life of its own as it spreads, meaning it can also end up tricking unsuspecting viewers.
While AI tech is used to generate realistic deepfakes, identifying visual disinformation using technology is still a hard problem — and a critically thinking mind remains the best tool for spotting high tech BS.
Nonetheless, technologists continue to work on deepfake spotters — including this latest offering from Microsoft.
Its blog post warns, though, that the tech may offer only passing utility in the AI-fuelled disinformation arms race: “The fact that [deepfakes are] generated by AI that can continue to learn makes it inevitable that they will beat conventional detection technology. However, in the short run, such as the upcoming U.S. election, advanced detection technologies can be a useful tool to help discerning users identify deepfakes.”
This summer a competition kicked off by Facebook to develop a deepfake detector served up results that were better than guessing — but only just, in the case of a dataset the researchers hadn’t had prior access to.
Microsoft, meanwhile, says its Video Authenticator tool was created using a public dataset from FaceForensics++ and tested on the DeepFake Detection Challenge Dataset, which it notes are “both leading models for training and testing deepfake detection technologies”.
It’s partnering with the San Francisco-based AI Foundation to make the tool available to organizations involved in the democratic process this year — including news outlets and political campaigns.
“Video Authenticator will initially be available only through RD2020 [Reality Defender 2020], which will guide organizations through the limitations and ethical considerations inherent in any deepfake detection technology. Campaigns and journalists interested in learning more can contact RD2020 here,” Microsoft adds.
The tool has been developed by its R&D division, Microsoft Research, in coordination with its Responsible AI team and an internal advisory body, the AI, Ethics and Effects in Engineering and Research Committee — as part of a wider program Microsoft is running aimed at defending democracy from threats posed by disinformation.
“We expect that methods for generating synthetic media will continue to grow in sophistication,” it continues. “As all AI detection methods have rates of failure, we have to understand and be ready to respond to deepfakes that slip through detection methods. Thus, in the longer term, we must seek stronger methods for maintaining and certifying the authenticity of news articles and other media. There are few tools today to help assure readers that the media they’re seeing online came from a trusted source and that it wasn’t altered.”
On the latter front, Microsoft has also announced a system that will enable content producers to add digital hashes and certificates to media that remain in their metadata as the content travels online — providing a reference point for authenticity.
The second component of the system is a reader tool, which can be deployed as a browser extension, for checking certificates and matching the hashes to offer the viewer what Microsoft calls “a high degree of accuracy” that a particular piece of content is authentic/hasn’t been changed.
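The hash-and-certificate pattern is straightforward to sketch. The example below is illustrative only: Microsoft’s system uses public-key certificates and its own metadata format, whereas this stand-in signs a content hash with a shared secret so the round trip can be shown in a few lines. All names here are made up:

```python
import hashlib
import hmac

PRODUCER_KEY = b"demo-signing-key"  # stand-in for a real certificate key pair

def certify(media: bytes) -> dict:
    """Producer side: hash the content and sign the hash.
    (Real systems use public-key signatures, not a shared secret.)"""
    digest = hashlib.sha256(media).hexdigest()
    signature = hmac.new(PRODUCER_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"sha256": digest, "signature": signature, "producer": "Example News"}

def verify(media: bytes, cert: dict) -> bool:
    """Reader side: recompute the hash and check it against the signed certificate."""
    digest = hashlib.sha256(media).hexdigest()
    expected = hmac.new(PRODUCER_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return digest == cert["sha256"] and hmac.compare_digest(expected, cert["signature"])

video = b"original frames..."
cert = certify(video)
print(verify(video, cert))            # True: content untouched
print(verify(video + b"edit", cert))  # False: content was altered
```

The certificate travels with the media as metadata; any edit to the content changes the hash, so the reader-side check fails.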
The certification will also provide the viewer with details about who produced the media.
Microsoft is hoping this digital watermarking authenticity system will end up underpinning a Trusted News Initiative announced last year by UK publicly funded broadcaster, the BBC — specifically for a verification component, called Project Origin, which is led by a coalition of the BBC, CBC/Radio-Canada, Microsoft and The New York Times.
It says the digital watermarking tech will be tested by Project Origin with the aim of developing it into a standard that can be adopted broadly.
“The Trusted News Initiative, which includes a range of publishers and social media companies, has also agreed to engage with this technology. In the months ahead, we hope to broaden work in this area to even more technology companies, news publishers and social media companies,” Microsoft adds.
While work on technologies to identify deepfakes continues, its blog post also emphasizes the importance of media literacy — flagging a partnership with the University of Washington, Sensity and USA Today aimed at boosting critical thinking ahead of the US election.
This partnership has launched a Spot the Deepfake Quiz for voters in the US to “learn about synthetic media, develop critical media literacy skills and gain awareness of the impact of synthetic media on democracy”, as it puts it.
The interactive quiz will be distributed across web and social media properties owned by USA Today, Microsoft and the University of Washington and through social media advertising, per the blog post.
The tech giant also notes that it’s supporting a public service announcement (PSA) campaign in the US encouraging people to take a “reflective pause” and check to make sure information comes from a reputable news organization before they share or promote it on social media ahead of the upcoming election.
“The PSA campaign will help people better understand the harm misinformation and disinformation have on our democracy and the importance of taking the time to identify, share and consume reliable information. The ads will run across radio stations in the United States in September and October,” it adds.
The roots of the nation’s current inability to control the pandemic can be traced to mid-April, when the White House embraced overly rosy projections to proclaim victory and move on.
In Washington State and Italy, the first confirmed cases were not linked to the outbreaks that followed, the analysis found. The epidemics were seeded later.
The president has served up a series of false and misleading statements as he seeks to counter uncomfortable facts about the spread of Covid-19 and questions about his push to reopen the economy.
Historians interested in the way events and people were chronicled in the old days once had to sort through card catalogs for old papers, then microfiche scans, then digital listings — but modern advances can index them down to each individual word and photo. A new effort from the Library of Congress has digitized and organized photos and illustrations from centuries of news using state of the art machine learning.
Led by Ben Lee, a researcher from the University of Washington occupying the Library’s “Innovator in Residence” position, the Newspaper Navigator collects and surfaces data from images from some 16 million pages of newspapers throughout American history.
Lee and his colleagues were inspired by work already being done in Chronicling America, an ongoing digitization effort for old newspapers and other print materials. While that work used optical character recognition to scan the contents of all the papers, there was also a crowd-sourced project in which people identified and outlined images for further analysis. Volunteers drew boxes around images relating to World War I, then transcribed the captions and categorized the pictures.
This limited effort set the team thinking.
“I loved it because it emphasized the visual nature of the pages — seeing the visual diversity of the content coming out of the project, I just thought it was so cool, and I wondered what it would be like to chronicle content like this from all over America,” Lee told TechCrunch.
He also realized that what the volunteers had created was in fact an ideal set of training data for a machine learning system. “The question was, could we use this stuff to create an object detection model to go through every newspaper, to throw open the treasure chest?”
The answer, happily, was yes. Using the initial human-powered work of outlining images and captions as training data, they built an AI agent that could do so on its own. After the usual tweaking and optimizing, they set it loose on the full Chronicling America database of newspaper scans.
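In machine-learning terms, each volunteer-drawn box became a labeled training example for an object detection model. The team's actual model and data schema aren't described here, so the record shape below is an illustrative guess:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One volunteer-drawn box: pixel coordinates plus a category label."""
    x0: int
    y0: int
    x1: int
    y1: int
    category: str  # e.g. "photograph", "illustration", "cartoon", "map"

def to_training_record(page_id: str, annotations: list[Annotation]) -> dict:
    """Bundle a scanned page with its boxes in the (image, boxes, labels)
    shape that standard object detection frameworks expect."""
    return {
        "image": page_id,
        "boxes": [[a.x0, a.y0, a.x1, a.y1] for a in annotations],
        "labels": [a.category for a in annotations],
    }

# A hypothetical page with one volunteer-annotated photograph
record = to_training_record("chronam_page_001",
                            [Annotation(40, 120, 300, 480, "photograph")])
```

Once enough pages are in this form, they can be fed to an off-the-shelf detector, which is what lets the trained model outline images on the millions of pages no volunteer ever saw.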
“It ran for 19 days nonstop — definitely the largest computing job I’ve ever run,” said Lee. But the results are remarkable: millions of images spanning three centuries (from 1789 to 1963) and organized with metadata pulled from their own captions. The team describes their work in a paper you can read here.
Assuming the captions are at all accurate, these images — until recently only accessible by trudging through the archives date by date and document by document — can be searched for their contents like any other corpus.
Looking for pictures of the president in 1870? No need to browse dozens of papers looking for potential hits and double-checking the contents in the caption — just search Newspaper Navigator for “president 1870.” Or if you want editorial cartoons from the World War II era, you can just get all illustrations from a date range. (The team has already zipped up the photos into yearly packages and plans other collections.)
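A query like “president 1870” amounts to filtering the extracted captions and publication dates. A toy sketch of that search, with made-up field names (the real dataset's schema may differ):

```python
def search(records, query_terms, year=None):
    """Return records whose caption contains every query term,
    optionally restricted to a publication year."""
    hits = []
    for r in records:
        caption = r["caption"].lower()
        if all(term.lower() in caption for term in query_terms):
            if year is None or r["year"] == year:
                hits.append(r)
    return hits

# Hypothetical extracted-caption records
records = [
    {"caption": "President Grant reviews the troops", "year": 1870},
    {"caption": "Spring hat advertisement", "year": 1870},
    {"caption": "President Wilson at the parade", "year": 1918},
]
hits = search(records, ["president"], year=1870)  # matches only the Grant caption
```

The point is that the captions, once machine-extracted, behave like any other searchable text corpus.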
Here are a few examples of newspaper pages with the machine learning system’s determinations overlaid on them (warning: plenty of hat ads and racism):
That’s fun for a few minutes for casual browsers, but the key thing is what it opens up for researchers — and other sets of documents. The team is throwing a data jam today to celebrate the release of the dataset and tools, during which they hope to both discover and enable new applications.
“Hopefully it will be a great way to get people together to think of creative ways the dataset can be used,” said Lee. “The idea I’m really excited by from a machine learning perspective is trying to build out a user interface where people can build their own dataset. Political cartoons or fashion ads — just let users define the content they’re interested in and train a classifier based on that.”
In other words, Newspaper Navigator’s AI agent could be the parent for a whole brood of more specific ones that could be used to scan and digitize other collections. That’s actually the plan within the Library of Congress, where the digital collections team has been delighted by the possibilities brought up by Newspaper Navigator and machine learning in general.
“One of the things we’re interested in is how computation can expand the way we’re enabling search and discovery,” said Kate Zwaard. “Because we have OCR, you can find things it would have taken months or weeks to find. The Library’s book collection has all these beautiful plates and illustrations. But if you want to know like, what pictures are there of the Madonna and child, some are categorized, but others are inside books that aren’t catalogued.”
That could change in a hurry with an image-and-caption AI systematically poring over them.
Newspaper Navigator, the code behind it, and all the images and results from it are completely public domain, free to use or modify for any purpose. You can dive into the code at the project’s GitHub.
Ali Amin-Javaheri grew up in the chemicals business.
His father had worked for Iran’s state-owned chemical company, and when the family fled the country in the 1980s during the Iran-Iraq war, they first settled in Houston, where employers welcomed the senior Amin-Javaheri’s experience.
Houston in the ’80s was dominated by the petrochemicals industry, and by the time the family later relocated to Washington State, Amin-Javaheri was already deeply steeped in a world of covalent bonds, chemical cracking, and the molecular coupling and decoupling of matter.
For the former Texas chemical kid, moving to tech-heavy, rain-soaked Washington, dominated at the time by Microsoft, was a bit of a shock, the founder recalled. But it was the 2000s and everyone was in tech, so Amin-Javaheri figured that would be his path too.
Those two worlds collided for the young University of Washington graduate in his very first job — his only job before launching his first startup — as a programmer and developer at Chempoint.
“Completely through happenstance, I was walking around a certain part of Seattle and I walked by this building and it had all these logos outside the office. I saw this logo for a company called Chempoint and I was instantly intrigued,” Amin-Javaheri said. “I walked up to the receptionist and asked what they were doing.”
In the summer of 2001, Amazon was an online bookseller a little over seven years old, the dot-com boom hadn’t gone completely bust quite yet and business-to-business marketplaces were a hot investment.
“It was a startup with just a handful of folks,” said Amin-Javaheri. “There wasn’t a business model in place, but the intent was to build a marketplace for chemicals… The dot-com boom was happening and everything was moving online, and the chemicals industry likely would as well.”
Nearly two decades later, Chempoint is one of the last remaining companies in a market that once boasted at least fifteen competitors — and the chemicals industry still doesn’t have a true online marketplace. Until (potentially) now, with the launch of Amin-Javaheri’s first startup — Knowde.
For the vast majority of Americans, the chemicals industry remains a ubiquitous abstraction. Consumers have a direct relationship with the energy business through the movements of prices at the pump, but the ways in which barrels of oil get converted into the plastics, coatings, films, flavors, fillings, soaps, toothpastes, enamels and unguents that touch everyone’s daily life are a little bit less obvious.
It’s a massive industry. The U.S. accounted for 17% of the global chemicals market in 2017 and that percentage amounted to a staggering $765 billion in sales. Worldwide there are thousands of chemicals companies selling hundreds of different specialty chemicals each and all contributing to a total market worth trillions of dollars.
“The market is $5 trillion,” said Amin-Javaheri. “Just to be super clear about that… it’s $5 trillion worth of transactions happening every year.”
It’s no secret that venture capitalists love marketplaces. Replacing physical middlemen with electronic ones offers efficiencies and economies of scale that have a cold logic and avoid the messiness of human contact. For the past twenty years, entrepreneurs have cropped up to build systems that could connect buyers on one side with sellers on the other — and the chemicals industry has been investors’ holy grail since Chempoint made its pitch to the market in 2001.
“The chemicals industry is the most interesting of all of them. It’s the biggest. It’s also the most fragmented,” said Sequoia partner Shaun Maguire. “There were three companies in the world that all did about $90 billion in sales and none of those three companies did more than 1.6% of sales of the entire industry.”
Those kinds of numbers would make any investor’s jaw drop, and several firms made a pitch for Knowde’s hotly contested financing round. Maguire first heard that the startup was looking for funds to build the first true marketplace for the chemicals industry through a finance associate at Sequoia, Spencer Hemphill.
Hemphill knew an early Knowde investor named Ian Rountree at Cantos Ventures and had heard Rountree talk about the new company. He flagged the potential deal to Maguire and another Sequoia partner. It took only one hour for Maguire to be blown away by Amin-Javaheri’s pedigree in the industry and his vision for Knowde.
From that initial meeting in September to the close of the company’s $14 million Series A round on March 11 (the day the markets suffered their worst COVID-19-related losses), Maguire was tracking the company’s progress. Other firms in the running for the Knowde deal included big names like General Catalyst, according to people with knowledge of the process.
Sequoia wound up leading the Series A deal for Knowde, which also included previous investors Refactor Capital, 8VC, and Cantos Ventures.
The tipping point for Maguire was the rapid adoption and buy-in from the industry when Knowde flipped the switch on sales in early January.
For at least the past fifty years, the modern chemicals industry has been defined — and in some ways constrained — by its sales pitches. Specialty manufacturers have hundreds of chemicals in their portfolios, but the knowledge of what those chemicals can do is often locked inside research labs. The companies rely on distributors, middlemen and internal sales teams to get the word out, according to Maguire and Amin-Javaheri.
“The way that things are done is still through field sales teams and product catalogs and brochures and face-to-face meetings and all that stuff,” said Amin-Javaheri. “This industry has not evolved as quickly as the rest of the world… And we always knew that something has got to give.”
One selling point for Knowde is that it breaks that logjam, according to investors like Maguire.
“One of the references said that they had a bunch of legacy flavors from the seventies,” Maguire said. “It was a Madagascar Vanilla that none of their sales people had tried to sell for 25 years… By putting them on Knowde the sales numbers had gone up over 1,000%… That company does over $5 billion a year in sales through flavors.”
The change happened as the old guard of executives began aging out of the business, according to Amin-Javaheri. “Between 2002 and 2012, nothing happened… There was no VC money thrown at any type of chemical company, and then it started changing a little bit,” he said. “The first domino was the changing age demographic… these consumer product companies kept getting younger.”
Amin-Javaheri’s previous company grew to $400 million in revenue selling technology and services to the chemicals industry. It was back-end software and customer relationship tools that the industry had never had, and needed if it were to begin joining the digital world. Knowde, according to Amin-Javaheri, is the next phase of that transition.
“Our plan is to connect the chemical producers directly with the buyers,” Amin-Javaheri said. “And provide all the plumbing and storefronts necessary to manage these things themselves.”
All that Knowde needed to do was collate the disparate data about what chemicals small manufacturers were making and had in stock, and begin listing that information online. That transparency used to be difficult to achieve, since companies viewed their product catalogs as an extension of their intellectual property — almost a trade secret, according to Amin-Javaheri.
Once companies began listing products online, Amin-Javaheri and his team could go to work creating a single, searchable taxonomy that would allow outsiders to find the materials they needed without having to worry about differences in descriptions.
Knowde has broken down the chemicals industry into ten verticals, including food, pharmaceuticals, personal care, household goods and industrial chemicals. The company currently operates in three of those verticals and plans to extend into all ten within the year.
Amin-Javaheri knows that he’s not going to get a meaningful chunk of business from huge chemical manufacturers like BASF or Dow Chemical that pump out thousands of tons of commodity chemicals; those deals represent only $2 trillion of the total addressable market.
That means another $3 trillion in sales is up for grabs for the company Amin-Javaheri founded with his partner Woyzeck Krupa.
While the opportunity is huge, the company — like every other new business launching in 2020 — is still trying to do business in the middle of the sharpest economic collapse since the Great Depression. However, Amin-Javaheri thinks the new economic reality could actually work in Knowde’s favor.
“It’s going to be one more trigger event for these chemical companies that they have to go online,” he said. The personal relationships that drove much of the sales for the chemicals business before have dried up. No more conferences and events means no more opportunities to glad-hand, backslap, and chat over drinks at the hotel bar. So these companies need to find a new way to sell.
Maguire sees another benefit to the movement of chemical catalogs into an online marketplace, and that’s internal transparency within chemical companies.
“Even the biggest companies in the world do not have an internal search feature, even for their own chemicals,” said Maguire. “I talked to two of the biggest companies in the world. In the case of one chemist, who is a friend of mine: if you are trying to formulate some new concoction, how do you find what chemicals you have in the company? If it’s in my division, it’s pretty easy… If I need chemicals from another division… there’s no way to search it right now.”
With high school disrupted, a growing number of schools are waiving standardized testing requirements for 2021 applicants.
After this wave of infections, brace yourself for more waves.
Hospitals in hot spots like Seattle and New York are postponing elective surgeries, hoping to conserve resources as the outbreak escalates. More serious surgeries are also getting a second look.
A series of missed chances by the federal government to ensure more widespread testing came during the early days of the outbreak, when containment would have been easier.
A project funded by the Gates Foundation will soon begin issuing at-home testing kits for COVID-19, the disease caused by the novel coronavirus, according to a report in the Seattle Times.
The test, based on a nose swab, should return results within two days, and results will be shared with health officials, who can then notify people who test positive. Individuals who have been infected will be encouraged to answer an online questionnaire to give health officials information about their movements, so that those officials can identify and notify other people who may need to be tested or quarantined, according to the Seattle Times report.
“Although there’s a lot to be worked out, this has enormous potential to turn the tide of the epidemic,” Scott Dowell, who leads the coronavirus response effort from the Bill & Melinda Gates Foundation, told the Seattle Times.
There’s no clear timeline for the project’s launch as the Foundation looks to finalize the supporting software and draft a final questionnaire for people who request the tests. The Foundation estimates that it could run up to 400 tests per day, according to Dowell.
The Gates Foundation isn’t the only entity moving quickly to develop at-home test kits. In a Twitter thread on Saturday, serial healthcare entrepreneur Jonathan Rothenberg outlined a similar approach, and is apparently now in discussions with a manufacturer about how to bring it to market.
Seattle and the surrounding area has been the epicenter for the coronavirus outbreak in the U.S. The state has confirmed 71 cases and 15 deaths from the disease as of Saturday. At least one health expert estimates that Seattle could have as many as 600 cases, based on computational modeling.
“One of the most important things from our perspective, having watched and worked on this in other parts of the world, is the identification of people who are positive for the virus, so they can be safely isolated and cared for, and the identification of their contacts, who can then be quarantined,” Dowell told the Seattle Times.
The project to develop at-home testing evolved from a two-year-old University of Washington research project intended to track the spread of diseases like influenza, according to the Times’ reporting.
All told, the Gates Foundation has poured about $20 million into the effort. The foundation has also committed $5 million to the local response efforts to combat the disease in the area — including the expansion of testing and analysis.