Singapore-based retail analytics company Trax raises $640M Series E led by SoftBank Vision Fund 2 and BlackRock

Trax’s co-founders, Joel Bar-El (left) and Dror Feldheim (right), and Trax’s CEO, Justin Behar (center)

COVID-19 forced many retailers and brands to adopt new technologies. Retail analytics unicorn Trax expects that this openness to tech innovation will continue even after the pandemic. The Singapore-based company announced today that it has raised $640 million in Series E funding to expand its products, which combine computer vision and cloud-based software to help brick-and-mortar stores manage their inventory, merchandising and operations. The round included primary and secondary capital, and was led by SoftBank Vision Fund 2 and returning investor BlackRock. Other participants included new investors OMERS and Sony Innovation Fund by IGV.

Before this round, Trax had raised $360 million in primary funds. J.P. Morgan acted as a placement agent to Trax on its Series E, which brings its total funding so far to $1.02 billion. Trax did not disclose a new valuation, but reportedly hit unicorn status in 2019. Reports emerged last year that it is considering a public offering, but chief executive officer Justin Behar had no comment when asked by TechCrunch if Trax is planning for an IPO.

Founded in 2010 and headquartered in Singapore, Trax also has offices in Brazil, the United States, China, the United Kingdom, Israel, Mexico, Japan, Hungary, France, Russia and Australia. The company says it serves customers in more than 90 countries.

Behar told TechCrunch that the new funding will be used to “invest heavily in global [go-to-market] strategies and technology for our flagship Retail Watch solution, as we look for ways to make it easier for retailers and brands to continue their digitization journey. More specifically, we will use the capital to accelerate growth and triple-down on continued innovation across our core vision, machine learning, IoT and marketplace technologies.”

Launched last year, Retail Watch uses a combination of computer vision, machine learning and hardware like cameras and autonomous robots, to gather real-time data about the shelf availability of products. It sends alerts if stock is running low, corrects pricing errors and checks if planograms, or product display plans for visual merchandising, are being followed. Retail Watch currently focuses on center shelves, where packaged goods are usually stocked, but will expand into categories like fresh food and produce.
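Trax hasn't published how Retail Watch is implemented, but the kind of shelf audit described here (low-stock alerts, price checks, planogram compliance) can be sketched in a few lines of Python. Everything below is an illustrative assumption: the detector output format, the SKU names and the thresholds are hypothetical, not Trax's.

```python
# Illustrative sketch only: Trax has not published Retail Watch's internals.
# Assumes a hypothetical detector that returns (sku, facing_count, shelf_price)
# tuples for one shelf image, and a planogram describing what should be there.

from dataclasses import dataclass

@dataclass
class PlanogramSlot:
    sku: str
    min_facings: int      # facings below this trigger a low-stock alert
    listed_price: float   # expected shelf-tag price

def audit_shelf(detections, planogram):
    """Compare detected facings/prices against the planogram and emit alerts."""
    seen = {sku: (count, price) for sku, count, price in detections}
    alerts = []
    for slot in planogram:
        count, price = seen.get(slot.sku, (0, None))
        if count == 0:
            alerts.append(f"OUT OF STOCK: {slot.sku}")
        elif count < slot.min_facings:
            alerts.append(f"LOW STOCK: {slot.sku} ({count}/{slot.min_facings} facings)")
        if price is not None and abs(price - slot.listed_price) > 0.01:
            alerts.append(f"PRICE ERROR: {slot.sku} tagged {price}, expected {slot.listed_price}")
    return alerts

# Example: one detected SKU is running low and another is mispriced.
planogram = [PlanogramSlot("cola-330ml", 4, 1.29), PlanogramSlot("chips-150g", 3, 2.49)]
detections = [("cola-330ml", 2, 1.29), ("chips-150g", 3, 2.99)]
print(audit_shelf(detections, planogram))
```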

The funding will also be used to expand Trax’s Dynamic Merchandising, a partnership with on-demand work platform Flexforce, and Shopkick, the shopping rewards app Trax acquired in 2019, into new markets over the next one to two years.

“Finally, we see many opportunities to help retailers along their digitization journey and will be expanding into new use cases with products we develop internally and via potential acquisitions,” Behar said.

Early in the pandemic, retailers had to cope with surge buying as customers emptied shelves while preparing to stay at home. As the pandemic continued, buying patterns shifted dramatically, and in April 2020 Forrester forecast that COVID-19 would cause global retail sales to decline by an average of 9.6%, resulting in a loss of $2.1 trillion, and that it would take about four years for retailers to surpass pre-pandemic levels.

In a more recent report, Forrester found that, despite spending cuts, nearly 40% of retailers and wholesalers immediately increased their tech investment, in some cases implementing projects in weeks that would otherwise have taken years.

Behar said “the pandemic made it clear the retail industry was not prepared for a sudden change in demand, as consumers faced empty shelves and out-of-stocks for extended periods in key categories. These extreme shifts in consumer behavior, coupled with global supply chain disruptions, labor shortages, changing channel dynamics (such as e-commerce) and decrease in brand loyalty forced brands and retailers to develop new strategies to meet the evolving needs of their customers.”

He expects that willingness to adopt new technologies will continue after the pandemic. For example, to get shoppers back into brick-and-mortar stores, retailers might try things like in-store navigation, improved browsing, loyalty programs and new checkout and payment systems.

Trax’s Retail Watch, Dynamic Merchandising and Dynamic Workforce Management solutions were in development before the pandemic, though “it has certainly expedited the need for innovative digital solutions to longstanding retail pain points,” Behar added.

For example, Retail Watch supports online ordering features, like showing what products are available to online shoppers and helping store associates fulfill orders, while Dynamic Merchandising lets brands find on-demand workers for in-store execution issues—for example, if new stock needs to be delivered to a location immediately.

Other tech companies focused on retail analytics include Quant Retail, Pensa Systems and Bossa Nova Robotics. Behar said Trax differentiates with a cloud-based platform that is “extensible, flexible and scalable and combines multiple integrated technologies and data-collection methods, optimized to fit each store, such as IoT-enabled shelf-edge cameras, dome cameras, autonomous robots and images taken from smartphones, to enable complete and accurate store coverage.”

Its proprietary computer vision technology was also designed specifically for use in retail stores, and identifies individual SKUs on shelves, regardless of category. For example, Behar said it can distinguish between near-identical or multiple products, deal with visual obstructions like odd angles or products obscured by another item, and recognize issues with price tags.

“Like many innovative solutions, our most meaningful competition comes from the legacy systems deeply entrenched in the world of retail and the fear of change,” he added. “While we do see an acceleration of interest and adoption of digital innovation as a result of the ‘COVID effect,’ this is by far our biggest challenge.”

In a press statement, SoftBank Investment Advisers director Chris Lee said, “Through its innovative AI platform and image recognition technologies, we believe Trax is optimizing retail stores by enabling [consumer packaged goods] brands and retailers to execute better inventory strategies using data and analytics. We are excited to partner with the Trax team to help expand their product offerings and enter new markets.”

#asia, #brick-and-mortar, #computer-vision, #fundings-exits, #retail-analytics, #singapore, #southeast-asia, #startups, #tc, #trax, #trax-retail

Hong Kong-based viAct raises $2M for its automated construction monitoring platform

Hong Kong-based viAct helps construction sites perform around-the-clock monitoring with an AI-based cloud platform that combines computer vision, edge devices and a mobile app. The startup announced today it has raised a $2 million seed round, co-led by SOSV and Vectr Ventures. The funding included participation from Alibaba Hong Kong Entrepreneurs Fund, Artesian Ventures and ParticleX.

Founded in 2016, viAct currently serves more than 30 construction industry clients in Asia and Europe. Its new funding will be used for research and development, product development and expansion into Southeast Asian countries.

The platform uses computer vision to detect potential safety hazards, construction progress and the location of machinery and materials. Real-time alerts are sent to a mobile app with a simple interface, designed for engineers who are often “working in a noisy and dynamic environment that makes it hard to look at detailed dashboards,” co-founder and chief operating officer Hugo Cheuk told TechCrunch.

As companies signed up for viAct to monitor sites while complying with COVID-19 social distancing measures, the company provided training over Zoom to help teams onboard more quickly.

Cheuk said the company’s initial markets in Southeast Asia will include Indonesia and Vietnam because government planning for smart cities and new infrastructure means new construction projects there will increase over the next five to 10 years. It will also enter Singapore because developers are willing to adopt AI-based technology.

In a press statement, SOSV partner and Chinaccelerator managing director Oscar Ramos said, “COVID has accelerated digital transformation and traditional industries like construction are going through an even faster process of transformation that is critical for survival. The viAct team has not only created a product that drives value for the industry but has also been able to earn the trust of their customers and accelerate adoption.”

#asia, #building, #computer-vision, #construction, #construction-tech, #fundings-exits, #hong-kong, #startups, #tc, #viact

Quest for prosthetic retinas progresses towards human trials, with a VR assist

An artificial retina would be an enormous boon to the many people with visual impairments, and the possibility is creeping closer to reality year by year. One of the latest advancements takes a different and very promising approach, using tiny dots that convert light to electricity, and virtual reality has helped show that it could be a viable path forward.

These photovoltaic retinal prostheses come from the École polytechnique fédérale de Lausanne, where Diego Ghezzi has been working on the idea for several years now.

Early retinal prosthetics were created decades ago, and the basic idea is as follows. A camera outside the body (on a pair of glasses, for instance) sends a signal over a wire to a tiny microelectrode array, which consists of many tiny electrodes that pierce the non-functioning retinal surface and stimulate the working cells directly.

The problems with this are mainly that powering and sending data to the array requires a wire running from outside the eye in — generally speaking a “don’t” when it comes to prosthetics, and the body in general. The array itself is also limited in the number of electrodes it can have by the size of each, meaning for many years the effective resolution in the best case scenario was on the order of a few dozen or hundred “pixels.” (The concept doesn’t translate directly because of the way the visual system works.)

Ghezzi’s approach obviates both these problems with the use of photovoltaic materials, which turn light into an electric current. It’s not so different from what happens in a digital camera, except instead of recording the charge as an image, it sends the current into the retina like the powered electrodes did. There’s no need for a wire to relay power or data to the implant, because both are provided by the light shining on it.

Researcher Diego Ghezzi holds a contact lens with photovoltaic dots on it.

Image Credits: Alain Herzog / EPFL

In the case of the EPFL prosthesis, there are thousands of tiny photovoltaic dots, which would in theory be illuminated by a device outside the eye sending light in according to what it detects from a camera. Of course, it’s still an incredibly difficult thing to engineer. The other part of the setup would be a pair of glasses or goggles that both capture an image and project it through the eye onto the implant.

We first heard of this approach back in 2018, and things have changed somewhat since then, as a new paper documents.

“We increased the number of pixels from about 2,300 to 10,500,” explained Ghezzi in an email to TechCrunch. “So now it is difficult to see them individually and they look like a continuous film.”

Of course when those dots are pressed right up against the retina it’s a different story. After all, that’s only 100×100 pixels or so if it were a square — not exactly high definition. But the idea isn’t to replicate human vision, which may be an impossible task to begin with, let alone realistic for anyone’s first shot.

“Technically it is possible to make pixels smaller and denser,” Ghezzi explained. “The problem is that the current generated decreases with the pixel area.”

Image showing a close-up of the photovoltaic dots on the retinal implant, labeled as being about 80 microns across each.

Current decreases with pixel size, and pixel size isn’t exactly large to begin with.

Image Credits: Ghezzi et al
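A rough back-of-the-envelope calculation shows why the pixel-size trade-off Ghezzi describes matters. Assuming, as a simplification, that photocurrent scales with a dot's light-collecting area, shrinking the roughly 80-micron dots would cut the current each one produces sharply:

```python
# Back-of-the-envelope illustration of the trade-off Ghezzi describes:
# if photocurrent scales roughly with a pixel's light-collecting area, halving
# the pixel diameter cuts the current per pixel to about a quarter.
import math

def dot_area_um2(diameter_um):
    return math.pi * (diameter_um / 2) ** 2

area_80 = dot_area_um2(80)   # dots in the implant are roughly 80 microns across
area_40 = dot_area_um2(40)   # hypothetical smaller, denser dot

print(f"80 um dot area: {area_80:8.0f} um^2")
print(f"40 um dot area: {area_40:8.0f} um^2")
print(f"relative current at 40 um: ~{area_40 / area_80:.2f}x of the 80 um dot")
```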

So the more you add, the tougher it is to make it work, and there’s also the risk (which they tested) that two adjacent dots will stimulate the same network in the retina. But too few and the image created may not be intelligible to the user. 10,500 sounds like a lot, and it may be enough — but the simple fact is that there’s no data to support that. To start on that the team turned to what may seem like an unlikely medium: VR.

Because the team can’t exactly do a “test” installation of an experimental retinal implant on people to see if it works, they needed another way to tell whether the dimensions and resolution of the device would be sufficient for certain everyday tasks like recognizing objects and letters.

A digitally rendered street scene and distorted monochrome versions below showing various ways of representing it via virtual phosphors.

Image Credits: Jacob Thomas Thorn et al

To do this, they put people in VR environments that were dark except for little simulated “phosphors,” the pinpricks of light they expect to create by stimulating the retina via the implant; Ghezzi likened what people would see to a constellation of bright, shifting stars. They varied the number of phosphors, the area they appear over, and the length of their illumination or “tail” when the image shifted, asking participants how well they could perceive things like a word or scene.
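The study's own rendering pipeline isn't public, but the core idea of the simulation (scatter a given number of phosphenes over a given visual angle) is easy to sketch. The parameters below are placeholders, not the values used in the paper:

```python
# Minimal sketch of the kind of phosphene field the EPFL team simulated in VR;
# the actual study parameters and rendering pipeline are theirs, not shown here.
import numpy as np

rng = np.random.default_rng(0)

def phosphene_field(n_phosphenes, visual_angle_deg):
    """Random phosphene positions (degrees from fixation) inside a circular field."""
    r = visual_angle_deg / 2 * np.sqrt(rng.random(n_phosphenes))  # uniform over the disk
    theta = rng.random(n_phosphenes) * 2 * np.pi
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# The study varied phosphene count and the visual angle they covered, e.g.:
sparse_narrow = phosphene_field(2300, visual_angle_deg=10)
dense_wide = phosphene_field(10500, visual_angle_deg=45)
print(sparse_narrow.shape, dense_wide.shape)
```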

The word "AGREE" rendered in various ways with virtual phosphors.

Image Credits: Jacob Thomas Thorn et al

Their primary finding was that the most important factor was visual angle — the overall size of the area where the image appears. Even a clear image is difficult to understand if it only takes up the very center of your vision, so even if overall clarity suffers it’s better to have a wide field of vision. The robust analysis of the visual system in the brain intuits things like edges and motion even from sparse inputs.

This demonstration showed that the implant’s parameters are theoretically sound and the team can start working towards human trials. That’s not something that can happen in a hurry, and while this approach is very promising compared with earlier, wired ones, it will still be several years, even in the best-case scenario, before it could be made widely available. Still, the very prospect of a working retinal implant of this type is an exciting one and we’ll be following it closely.

#accessibility, #blindness, #computer-vision, #disabilities, #epfl, #gadgets, #hardware, #science, #tc

Who Is Making Sure the A.I. Machines Aren’t Racist?

When Google forced out two well-known artificial intelligence experts, a long-simmering research controversy burst into the open.

#amazon-com-inc, #artificial-intelligence, #buolamwini-joy, #clarifai-inc, #computer-vision, #computers-and-the-internet, #discrimination, #facial-recognition-software, #gebru-timnit, #google-inc, #microsoft-corp, #mitchell-margaret-researcher, #research

SuperAnnotate, a computer vision platform, partners with open-source OpenCV to spread visual ML

SuperAnnotate, a NoCode computer vision platform, is partnering with OpenCV, a non-profit organization that has built a large collection of open-source computer vision algorithms. The move means startups and entrepreneurs will be able to build their own AI models and allow cameras to detect objects using machine learning. SuperAnnotate has raised $3M to date from investors including Point Nine Capital, Fathom Capital and Berkeley SkyDeck Fund.

The AI-powered computer vision platform for data scientists and annotation teams will provide OpenCV AI Kit (OAK) users with access to its platform, and will launch a computer vision course on building AI models. SuperAnnotate will also set up the AI Kit’s camera to detect objects using machine learning, and OAK users will get $200 of credit to set up their systems on its platform.

The OAK is a multi-camera device that can run computer vision and 3D perception tasks such as identifying objects, counting people and measuring distances. Since launching, around 11,000 of these cameras have been distributed.

The AI Kit has so far been used to build drone and security applications, agricultural vision sensors and even COVID-related detection devices (for example, to identify whether someone is wearing a mask or not).

Tigran Petrosyan, co-founder and CEO at SuperAnnotate, said in a statement: “Computer vision and smart camera applications are gaining momentum, yet not many have the relevant AI expertise to implement those. With OAK Kit and SuperAnnotate, one can finally build their smart camera system, even without coding experience.”

Competitors to SuperAnnotate include Dataloop, Labelbox, Appen and Hive.

#articles, #artificial-intelligence, #computer-science, #computer-vision, #computing, #europe, #machine-learning, #opencv, #point-nine-capital, #tc

Amazon expands its biometric-based Amazon One palm reader system to more retail stores

Last fall, Amazon introduced a new biometric device, Amazon One, that allowed customers to pay at Amazon Go stores using their palm. Today, the company says the device is being rolled out to additional Amazon stores in Seattle — an expansion that will make the system available across eight total Amazon physical retail stores, including Amazon Go convenience stores, Amazon Go Grocery, Amazon Books, and Amazon 4-star stores.

Starting today, the Amazon One system is being added as an entry option at the Amazon Go location at Madison & Minor in Seattle. In the next few weeks, it will also roll out to two more Amazon Go stores, at 5th & Marion and Terry & Stewart, the company says. That brings the system to eight Seattle locations, and sets the stage for a broader U.S. expansion in the months ahead.

As described, the Amazon One system uses computer vision technology to create a unique palm print for each customer, which Amazon then associates with the credit card the customer inserts upon initial setup. While the customer doesn’t have to have an Amazon account to use the service, if they do associate their account information, they’ll be able to see their shopping history on the Amazon website.
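Amazon hasn't disclosed how Amazon One represents palms internally. Purely as an illustration of the general enrollment-and-matching pattern such biometric systems use, here is a minimal sketch with a stand-in embedding function and an assumed similarity threshold; none of it reflects Amazon's actual implementation:

```python
# Purely illustrative: Amazon has not disclosed how Amazon One represents palms.
# This sketch shows the general shape of biometric enrollment/matching with an
# embedding model (here a stand-in function) and a similarity threshold.
import numpy as np

def palm_embedding(palm_image: np.ndarray) -> np.ndarray:
    """Stand-in for a learned encoder that maps a palm image to a unit vector."""
    # A real system would run a trained model; we just hash pixels to seed an RNG.
    rng = np.random.default_rng(abs(hash(palm_image.tobytes())) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

enrolled = {}  # customer_id -> (embedding, payment_token)

def enroll(customer_id, palm_image, payment_token):
    enrolled[customer_id] = (palm_embedding(palm_image), payment_token)

def identify(palm_image, threshold=0.9):
    probe = palm_embedding(palm_image)
    best_id, best_sim = None, -1.0
    for cid, (emb, _) in enrolled.items():
        sim = float(probe @ emb)   # cosine similarity (vectors are unit-norm)
        if sim > best_sim:
            best_id, best_sim = cid, sim
    return best_id if best_sim >= threshold else None

enroll("alice", np.zeros((64, 64), np.uint8), "tok_123")
print(identify(np.zeros((64, 64), np.uint8)))   # "alice": identical image matches itself
```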

Amazon says images of the palm print are encrypted and secured in the cloud, where customers’ palm signatures are created. At the time of its initial launch, Amazon argued that palm prints were a more private form of biometric authentication than some other methods, because you can’t determine a customer’s identity based only on the image of their palm.

But Amazon isn’t just storing palm images, of course. It’s matching them to customer accounts and credit cards, effectively building a database of customer biometrics. It can also then use the data collected, like shopping history, to introduce personalized offers and recommendations over time.

The system raises questions about Amazon’s larger plans, as the company’s historical use of biometrics has been fairly controversial. Amazon sold biometric facial recognition services to law enforcement in the U.S. Its facial recognition technology was the subject of a data privacy lawsuit. Its Ring camera company continues to work in partnership with police. In terms of user data privacy, Amazon hasn’t been careful either — for example, by continuing to store Alexa voice data even when users deleted audio files. 

What’s more, the company doesn’t just envision Amazon One as a means of entry into its own stores — they’re just a test market. In time, Amazon wants to make the technology available to third parties as well, including stadiums, office buildings and other non-Amazon retailers.

The timing of the Amazon One launch in the middle of a pandemic has helped spur customer adoption, as it allows for a contactless way to associate your credit card with your future purchases. Upon subsequent re-entry, you just hold your hand above the reader to be scanned again and let into the store.

These systems, however, can disadvantage a lower-socioeconomic group of customers, who prefer to pay using cash. They have to wait for special assistance in these otherwise cashless, checkout-free stores.

Amazon says the system will continue to roll out to more locations in the future.

#amazon, #amazon-one, #biometrics, #computer-vision, #ecommerce, #palm-reader, #retail, #shopping, #technology

Here’s a Way to Learn if Facial Recognition Systems Used Your Photos

An online tool targets only a small slice of what’s out there, but may open some eyes to how widely artificial intelligence research fed on personal images.

#computer-vision, #computers-and-the-internet, #facial-recognition-software, #flickr, #megvii-technology-ltd, #photography, #privacy, #tabriz-parisa, #university-of-washington

Cadeera is doing AI visual search for home decor

In recent years we’ve seen a whole bunch of visual/style fashion-focused search engines cropping up, tailored to helping people find the perfect threads to buy online by applying computer vision and other AI technologies to perform smarter-than-keywords visual search which can easily match and surface specific shapes and styles. Startups like Donde Search, Glisten and Stye.ai to name a few.

Early stage London-based Cadeera, which is in the midst of raising a seed round, wants to apply a similar AI visual search approach but for interior decor. All through the pandemic it’s been working on a prototype with the aim of making ecommerce discovery of taste-driven items like sofas, armchairs and coffee tables a whole lot more inspirational.

Founder and CEO Sebastian Spiegler, an early (former) SwiftKey employee with a PhD in machine learning and natural language processing, walked TechCrunch through a demo of the current prototype.

The software offers a multi-step UX geared towards first identifying a person’s decor style preferences — which it does by getting them to give a verdict on a number of look book images of rooms staged in different interior decor styles (via a Tinder-style swipe left or right).

It then uses these taste signals to start suggesting specific items to buy (e.g. armchairs, sofas etc) that fit the styles they’ve liked. The user can continue to influence selections by asking to see other similar items (‘more like this’), or see less similar items to broaden the range of stuff they’re shown — injecting a little serendipity into their search. 

The platform also lets users search by uploading an image — with Cadeera then parsing its database to surface similar looking items which are available for sale.
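Cadeera's models aren't public, but image-based retrieval of this kind generally boils down to embedding catalog photos and ranking them by similarity to the query. A minimal sketch, with a trivial stand-in for the real vision encoder:

```python
# Hedged sketch of image-similarity retrieval of the kind described; Cadeera's
# actual models and feature space are not public. `embed_image` is a stand-in
# for whatever vision encoder produces a feature vector per product photo.
import numpy as np

def embed_image(image: np.ndarray) -> np.ndarray:
    """Stand-in encoder: a real system would use a trained vision model."""
    v = image.astype(np.float32).mean(axis=(0, 1))  # trivial color-statistics feature
    return v / (np.linalg.norm(v) + 1e-8)

def build_index(catalog_images):
    """catalog_images: dict of item_id -> image array."""
    return {item_id: embed_image(img) for item_id, img in catalog_images.items()}

def most_similar(query_image, index, k=5):
    q = embed_image(query_image)
    scored = sorted(index.items(), key=lambda kv: -float(q @ kv[1]))
    return [item_id for item_id, _ in scored[:k]]

catalog = {f"sofa-{i}": np.random.rand(64, 64, 3) for i in range(10)}
index = build_index(catalog)
print(most_similar(np.random.rand(64, 64, 3), index, k=3))
```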

It has an AR component on its product roadmap, too — which will eventually let users visualize a potential purchase in situ in their home. Voice search will also be supported.

“Keyword search is fundamentally broken,” argues Spiegler. “Imagine you’re refurbishing or renovating your home and you say I’m looking for something, I’ve seen it somewhere, I only know when I see it, and I don’t really know what I want yet — so the [challenge we’re addressing is this] whole process of figuring out what you want.”

“The mission is understanding personal preferences. If you don’t know yourself what you’re looking for we’re basically helping you with visual clues and with personalization and with inspiration pieces — which can be content, images and then at some point community as well — to figure out what you want. And for the retailer it helps them to understand what their clients want.”

“It increases trust, you’re more sure about your purchases, you’re less likely to return something — which is a huge cost to retailers. And, at the same time, you may also buy more because you more easily find things you can buy,” he adds.

Ecommerce has had a massive boost from the pandemic which continues to drive shopping online. But the flip side of that is bricks-and-mortar retailers have been hit hard.

The situation may be especially difficult for furniture retailers that may well have been operating showrooms before COVID-19 — relying upon customers being able to browse in-person to drive discovery and sales — so they are likely to be looking for smart tools that can help them transition to and/or increase online sales.

And sector-specific visual search engines do seem likely to see uplift as part of the wider pandemic-driven ecommerce shift.

“The reason why I want to start with interior design/home decor and furniture is that it’s a clearly underserved market. There’s no-one out there, in my view, that has cracked the way to search and find things more easily,” Spiegler tells TechCrunch. “In fashion there are quite a few companies out there. And I feel like we can master furniture and home decor and then move into other sectors. But for me the opportunity is here.”

“We can take a lot of the ideas from the fashion sector and apply it to furniture,” he adds. “I feel like there’s a huge gap — and no-one has looked at it sufficiently.”

The size of the opportunity Cadeera is targeting is a $10BN-$20BN market globally, per Spiegler. 

The startup’s initial business model is b2b — with the plan being to start selling its SaaS to ecommerce retailers to integrate the visual search tools directly into their own websites.

Spiegler says they’re working with a “big” UK-based vintage platform — and aiming to get something launched to the market within the next six to nine months with one to two customers. 

They will also — as a next order of business — offer apps for ecommerce platforms such as WooCommerce, BigCommerce and Shopify to integrate a set of their search tools. (Larger retailers will get more customization of the platform, though.)

On the question of whether Cadeera might develop a b2c offer by launching a direct consumer app itself, Spiegler admits that is an “end goal”.

“This is the million dollar question — my end-goal, my target is building a consumer app. Building a central place where all your shopping preferences are stored — kind of a mix of Instagram where you see inspiration and Pinterest where you can keep what you looked at and then get relevant recommendations,” he says.

“This is basically the idea of a product search engine we want to build. But what I’m showing you are the steps to get there… and we hopefully end in the place where we have a community, we have a b2c app. But the way I look at it is we start through b2b and then at some point switch the direction and open it up by providing a single entry point for the consumer.”

But, for now, the b2b route means Cadeera can work closely with retailers in the meanwhile — increasing its understanding of retail market dynamics and getting access to key data needed to power its platform, such as style look books and item databases.

“What we end up with is a large inventory data-set/database, a design knowledge base and imagery and style meta information. And on top of that we do object detection, object recognition, recommendation, so the whole shebang in AI — for the purpose of personalization, exploration, search and suggestion/recommendation,” he goes on, sketching the various tech components involved.

“On the other side we provide an API so you can integrate it as well. And if you need, we can also provide a responsive UX/UI.”

“Beyond all of that we are creating an interesting data asset where we understand what the user wants — so we have user profiles, and in the future those user profiles can be cross-platform. So if you purchase something at one ecommerce site or one retailer you can then go to another retailer and we can make relevant recommendations based on what you purchased somewhere else,” he adds. “So your whole purchasing history, your style preferences and interaction data will allow you to get the most relevant recommendations.”

While the usual tech giant suspects still dominate general markets for search (Google) and ecommerce (Amazon), Cadeera isn’t concerned about competition from the biggest global platforms — given they are not focused on tailoring tools for a specific furniture/home decor niche.

He also points out that Amazon continues to do a very poor job on recommendations on its own site, despite having heaps of data.

“I’ve been asking — and I’ve been asked as well — so many times why is Amazon doing such a poor job on recommendations and in search. The true answer is I don’t know! They have probably the best data set… but the recommendations are poor,” he says. “What we’re doing here is trying to reinvent a whole product. Search should work… and the inspiration part, for things that are more opaque, is something important that is missing with anything I’ve seen so far.”

And while Facebook did acquire a home decor-focused visual search service (called GrokStyle) back in 2019, Spiegler suggests it’s most likely to integrate their tech (which included AR for visualization) into its own marketplace — whereas he’s convinced most retailers will want to be able to remain independent of the Facebook walled garden.

“GrokStyle will become part of Facebook marketplace but if you’re a retailer the big question is how much do you want to integrate into Facebook, how much do you want to be dependent on Facebook? And I think that’s a big question for a lot of retailers. Do you want to be dependent on Google? Do you want to be dependent on Amazon? Do you want to be dependent on Facebook?” he says. “My guess is no. Because you basically want to stay as far away as possible because they’re going to eat up your lunch.”

#ai-visual-search, #cadeera, #computer-vision, #covid-19, #ecommerce, #saas, #startups, #tc

Veo raises $25M for AI-based cameras that record and analyze football and other team sports

Sports have been among some of the most popular and lucrative media plays in the world, luring broadcasters, advertisers and consumers to fork out huge sums to secure the chance to watch (and sponsor) their favorite teams and athletes.

That content, unsurprisingly, also typically costs a ton of money to produce, narrowing the production and distribution funnel even more. But today, a startup that’s cracked open that model with an autonomous, AI-based camera that lets any team record, edit and distribute their games is announcing a round of funding to build out its business targeting the long tail of sporting teams and fixtures.

Veo Technologies, a Copenhagen startup that has designed a video camera and cloud-based subscription service to record games, automatically pick out highlights and host that footage on a platform where its customers can access and share it, has picked up €20 million (around $24.5 million) in a Series B round of funding.

The funding is being led by Danish investor Chr. Augustinus Fabrikker, with participation from US-based Courtside VC, France’s Ventech and Denmark’s SEED Capital. Veo’s CEO and co-founder Henrik Teisbæk said in an interview that the startup is not disclosing its valuation, but a source close to funding tells me that it’s well over $100 million.

Teisbæk said that the plan is to use the funds to continue expanding the company’s business on two levels. First, Veo will be digging into expanding its US operations, with an office in Miami.

Second, it plans to continue enhancing the scope of its technology: The company started out optimising its computer vision software to record and track matches in the world’s most popular team sport, football (soccer to US readers). Customers buy the cameras — which retail for $800 — and the corresponding (mandatory) subscriptions — $1,200 annually — both to record games for spectators and to use the footage for practical purposes like training and recruitment videos. The key is that the cameras can be set up and left to run on their own. Once in place, they record most of a soccer field (or whatever playing space is being used) at a wide angle and then zoom and edit down from that footage.

Veo on grass
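Veo doesn't detail its editing pipeline, but the "record wide, zoom digitally" idea can be illustrated with a simple crop around wherever the tracking model says the action is. The frame size and centre coordinates below are placeholders, and the hard part (finding that centre) is assumed away:

```python
# Sketch of the "record wide, then zoom digitally" idea the article describes.
# Where the action actually is (the crop centre) would come from Veo's tracking
# models; here it is simply passed in.
import numpy as np

def virtual_zoom(frame: np.ndarray, centre_xy, out_w=1280, out_h=720):
    """Crop a fixed-size window around centre_xy, clamped to the frame edges."""
    h, w = frame.shape[:2]
    cx = int(np.clip(centre_xy[0], out_w // 2, w - out_w // 2))
    cy = int(np.clip(centre_xy[1], out_h // 2, h - out_h // 2))
    return frame[cy - out_h // 2: cy + out_h // 2,
                 cx - out_w // 2: cx + out_w // 2]

wide_frame = np.zeros((2160, 3840, 3), dtype=np.uint8)   # e.g. a 4K panoramic frame
clip_frame = virtual_zoom(wide_frame, centre_xy=(3000, 1100))
print(clip_frame.shape)   # (720, 1280, 3)
```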

Now, Veo is building the computer vision algorithms to expand that proposition into a plethora of other team-based sports including rugby, basketball and hockey, and it is ramping up the kinds of analytics that it can provide around the clips that it generates as well as the wider match itself.

Even with the slowdown in a lot of sporting activity this year due to Covid — in the UK for example, we’re in a lockdown again where team sports below professional leagues, excepting teams for disabled people, have been prohibited — Veo has seen a lot of growth.

The startup currently works with some 5,000 clubs globally ranging from professional sports teams through to amateur clubs for children, and it has recorded and tracked 200,000 games since opening for business in 2018, with a large proportion of that volume in the last year and in the US.

For a point of reference, in 2019, when we covered a $6 million round for Veo, the startup had racked up 1,000 clubs and 25,000 games, pointing to customer growth of 400% in that period.

The Covid-19 pandemic has indeed altered the playing field — literally and figuratively — for sports in the past year. Spectators, athletes, and supporting staff need to be just as mindful as anyone else when it comes to spreading the coronavirus.

That’s not just led to a change in how many games are being played, but also for attendance: witness the huge lengths that the NBA went to last year to create an extensive isolation bubble in Orlando, Florida, to play out the season, with no actual fans in physical seats watching games, but all games and fans virtually streamed into the events as they happened.

That NBA effort, needless to say, came at a huge financial cost, one that any lesser league would never be able to carry, and so that predicament has led to an interesting use case for Veo.

Pre-pandemic, the Danish startup was quietly building its business around catering to the long tail of sporting organizations who — even in the best of times — would be hard pressed to find the funds to buy cameras and/or hire videographers to record games, not just an essential part of how people can enjoy a sporting event, but useful for helping with team development.

“There is a perception that football is already being recorded and broadcast, but in the UK (for example) it’s only the Premier League,” Teisbæk said. “If you go down one or two steps from that, nothing is being recorded.” Before Veo, to record a football game, he added, “you need a guy sitting on a scaffold, and time and money to then cut that down to highlights. It’s just too cumbersome. But video is the best tool there is to develop talent. Kids are visual learners. And it’s a great way to get recruited sending videos to colleges.”

Those use cases then expanded with the pandemic, he said. “Under coronavirus rules, parents cannot go out and watch their kids, and so video becomes a tool to follow those matches.”

‘We’re a Shopify, not an Amazon’

The business model for Veo up to now has largely been around what Teisbæk described as “the long tail theory”, which in the case of sports works out, he said, as “There won’t be many viewers for each match, but there are millions of matches out there.” But if you consider how a lot of high school sports will attract locals beyond those currently attached to a school — you have alumni supporters and fans, as well as local businesses and neighborhoods — even that long tail audience might be bigger than one might imagine.

Veo’s long-tail focus has inevitably meant that its target users are in the wide array of amateur or semi-pro clubs and the people associated with them, but interestingly it has also spilled into big names, too.

Veo’s cameras are being used by professional soccer clubs in the Premier League, Spain’s La Liga, Italy’s Serie A and France’s Ligue 1, as well as several clubs in the MLS such as Inter Miami, Austin FC, Atlanta United and FC Cincinnati. Teisbæk noted that while this might never be for primary coverage, it’s there to supplement for training and also be used in the academies attached to those organizations.

The plan longer term, he said, is not to build its own media empire with the trove of content it has amassed, but to be an enabler for creating that content for its customers, who can in turn use it as they wish. It’s a “Shopify, not an Amazon,” said Teisbæk.

“We are not building the next ESPN, but we are helping the clubs unlock these connections that are already in place by way of our technology,” he said. “We want to help them capture and stream their matches and their play for the audience that is there today.”

That may be how he views the opportunity, but some investors are already eyeing up the bigger picture.

Vasu Kulkarni, a partner at Courtside VC — a firm that has focused (as its name might imply) on backing a lot of different sports-related businesses, with The Athletic, Beam (acquired by Microsoft), and many others in its portfolio — said that he’d been looking to back a company like Veo, building a smart, tech-enabled way to record and parse sports in a more cost-effective way.

“I spent close to four years trying to find a company trying to do that,” he said.

“I’ve always been a believer in sports content captured at the long tail,” he said. Coincidentally, he himself started a company called Krossover in his dorm room to help somewhat with tracking and recording sports training. Krossover eventually was acquired by Hudl, which Veo sees as a competitor.

“You’ll never have the NBA finals recorded on Veo, there is just too much at stake, but when you start to look at all the areas where there isn’t enough mass media value to hire people, to produce and livestream, you get to the point where computer vision and AI are going to be doing the filming to get rid of the cost.”

He said that the economics are important here: the camera needs to be less than $1,000 (which it is) and produce something demonstrably better than “a parent with a Best Buy camcorder that was picked up for $100.”

Kulkarni thinks that longer term there could definitely be an opportunity to consider how to help clubs bring that content to a wider audience, especially using highlights and focusing on the best of the best in amateur games — which of course are the precursors to some of those players one day being world-famous elite athletes. (Think of how exciting it is to see the footage of Michael Jordan playing as a young student for some context here.) “AI will be able to pull out the best 10-15 plays and stitch them together for highlight reels,” he said, something that could feasibly find a market with sports fans wider than just the parents of the actual players.

All of that then feeds a bigger market for what has started to feel like an insatiable appetite for sports, one that, if anything, has found even more audience at a time when many are spending more time at home and watching video overall. “The more video you get from the sport, the better the sport gets, for players and fans,” Teisbæk said.

#ai, #artificial-intelligence, #computer-vision, #europe, #football, #sports, #streaming, #tc, #veo, #video

AWS announces Panorama, a device that adds machine learning technology to any camera

AWS has launched a new hardware device, the AWS Panorama Appliance, which, alongside the AWS Panorama SDK, will transform existing on-premises cameras into computer vision enabled super-powered surveillance devices.

Pitching the hardware as a new way for customers to inspect parts on manufacturing lines, ensure that safety protocols are being followed, or analyze traffic in retail stores, the new automation service is part of the theme of this AWS re:Invent event — automate everything.

Along with computer vision models that companies can develop using Amazon SageMaker, the new Panorama Appliance can run those models on video feeds from networked or network-enabled cameras.
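AWS hadn't yet released the Panorama SDK at the time of writing, so the snippet below is not Panorama code; it just sketches the underlying pattern of pulling frames from a networked (RTSP) camera and running a detection model on each one. The stream URL and `run_model` are placeholders:

```python
# Not the Panorama SDK: a generic sketch of the pattern it targets, i.e.
# pulling frames from a networked (RTSP) camera and running a vision model
# on each one. The RTSP URL and `run_model` are placeholders.
import cv2

def run_model(frame):
    """Stand-in for a SageMaker-trained detector; returns a list of detections."""
    return []

def monitor(rtsp_url="rtsp://camera.local/stream"):
    cap = cv2.VideoCapture(rtsp_url)
    if not cap.isOpened():
        raise RuntimeError(f"could not open stream: {rtsp_url}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break                      # stream dropped
            detections = run_model(frame)
            if detections:
                print(f"{len(detections)} detections in frame")
    finally:
        cap.release()
```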

Soon, AWS expects to have the Panorama SDK that can be used by device manufacturers to build Panorama-enabled devices.

Amazon has already pitched surveillance technologies to developers and the enterprise before. Back in 2017, the company unveiled DeepLens, which it began selling one year later. It was a way for developers to build prototype machine learning models and for Amazon to get comfortable with different ways of commercializing computer vision capabilities.

As we wrote in 2018:

DeepLens is deeply integrated with the rest of AWS’s services. Those include the AWS IoT service Greengrass, which you use to deploy models to DeepLens, for example, but also SageMaker, Amazon’s newest tool for building machine learning models… Indeed, if all you want to do is run one of the pre-built samples that AWS provides, it shouldn’t take you more than 10 minutes to set up … DeepLens and deploy one of these models to the camera. Those project templates include an object detection model that can distinguish between 20 objects (though it had some issues with toy dogs, as you can see in the image above), a style transfer example to render the camera image in the style of van Gogh, a face detection model and a model that can distinguish between cats and dogs and one that can recognize about 30 different actions (like playing guitar, for example). The DeepLens team is also adding a model for tracking head poses. Oh, and there’s also a hot dog detection model.

Amazon has had a lot of experience (and controversy) when it comes to the development of machine learning technologies for video. The company’s Rekognition software sparked protests and pushback which led to a moratorium on the use of the technology.

And the company has tried to incorporate more machine learning capabilities into its consumer facing Ring cameras as well.

Still, enterprises continue to clamor for new machine learning-enabled video recognition technologies for security, safety, and quality control. Indeed, as the COVID-19 pandemic drags on, new protocols around building use and occupancy are being adopted to not only adapt to the current epidemic, but plan ahead for spaces and protocols that can help mitigate the severity of the next one.

#amazon, #amazon-sagemaker, #amazon-web-services, #aws, #aws-reinvent-2020, #cloud-computing, #cloud-infrastructure, #computer-vision, #computing, #deeplens, #machine-learning, #retail-stores, #surveillance-technologies, #tc

Intel and Nvidia Chips Power a Chinese Surveillance System

Intel and Nvidia chips power a supercomputing center that tracks people in a place where government suppresses minorities, raising questions about the tech industry’s responsibility.

#artificial-intelligence, #cameras, #china, #computer-chips, #computer-vision, #computers-and-the-internet, #facial-recognition-software, #human-rights-and-human-rights-violations, #intel-corporation, #muslims-and-islam, #nvidia-corporation, #surveillance-of-citizens-by-government, #uighurs-chinese-ethnic-group, #xinjiang-china

Xesto is a foot scanning app that simplifies shoe gifting

You wait ages for foot scanning startups to help with the tricky fit issue that troubles online shoe shopping and then two come along at once: Launching today in time for Black Friday sprees is Xesto — which like Neatsy, which we wrote about earlier today, also makes use of the iPhone’s TrueDepth camera to generate individual 3D foot models for shoe size recommendations.

The Canadian startup hasn’t always been focused on feet. It has a long-standing research collaboration with the University of Toronto, alma mater of its CEO and co-founder Sophie Howe (its other co-founder and chief scientist, Afiny Akdemir, is also pursuing a Math PhD there) — and was actually founded back in 2015 to explore business ideas in human computer interaction.

But Howe tells us it moved into mobile sizing shortly after the 2017 launch of the iPhone X — which added a 3D depth camera to Apple’s smartphone. Since then Apple has added the sensor to additional iPhone models, pushing it within reach of a larger swathe of iOS users. So you can see why startups are spying a virtual fit opportunity here.

“This summer I had an aha! moment when my boyfriend saw a pair of fancy shoes on a deep discount online and thought they would be a great gift. He couldn’t remember my foot length at the time, and knew I didn’t own that brand so he couldn’t have gone through my closet to find my size,” says Howe. “I realized in that moment shoes as gifts are uncommon because they’re so hard to get correct because of size, and no one likes returning and exchanging gifts. When I’ve bought shoes for him in the past, I’ve had to ruin the surprise by calling him – and I’m not the only one. I realized in talking with friends this was a feature they all wanted without even knowing it… Shoes have such a cult status in wardrobes and it is time to unlock their gifting potential!”

Howe slid into this TechCrunch writer’s DMs with the eye-catching claim that Xesto’s foot-scanning technology is more accurate than Neatsy’s — sending a Xesto scan of her foot compared to Neatsy’s measure of it to back up the boast. (Aka: “We are under 1.5 mm accuracy. We compared against Neatsy right now and they are about 1.5 cm off of the true size of the app,” as she put it.)

Another big difference is Xesto isn’t selling any shoes itself. Nor is it interested in just sneakers; it’s shoe-type agnostic. If you can put it on your feet, it wants to help you find the right fit, is the idea.

Right now the app is focused on the foot scanning process and the resulting 3D foot models — showing shoppers their feet in a 3D point cloud view, another photorealistic view as well as providing granular foot measurements.

There’s also a neat feature that lets you share your foot scans so, for example, a person who doesn’t have their own depth sensing iPhone could ask to borrow a friend’s to capture and takeaway scans of their own feet.

Helping people who want to be bought (correctly fitting) shoes as gifts is the main reason they’ve added foot scan sharing, per Howe — who notes shoppers can create and store multiple foot profiles on an account “for ease of group shopping”.

“Xesto is solving two problems: Buying shoes [online] for yourself, and buying shoes for someone else,” she tells TechCrunch. “Problem 1: When you buy shoes online, you might be unfamiliar with your size in the brand or model. If you’ve never bought from a brand before, it is very risky to make a purchase because there is very limited context in selecting your size. With many brands you translate your size yourself.

“Problem 2: People don’t only buy shoes for themselves. We enable gift and family purchasing (within a household or remote!) by sharing profiles.”

Xesto is doing its size predictions based on comparing a user’s (<1.5mm accurate) foot measurements to brands’ official sizing guidelines — with more than 150 shoe brands currently supported.

Howe says it plans to incorporate customer feedback into these predictions — including by analyzing online reviews where people tend to specify if a particular shoe sizes larger or smaller than expected. So it’s hoping to be able to keep honing the model’s accuracy.

“What we do is remove the uncertainty of finding your size by taking your 3D foot dimensions and correlate that to the brands sizes (or shoe model, if we have them),” she says. “We use the brands size guides and customer feedback to make the size recommendations. We have over 150 brands currently supported and are continuously adding more brands and models. We also recommend if you have extra wide feet you read reviews to see if you need to size up (until we have all that data robustly gathered).”
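As a rough illustration of the size-lookup step Howe describes, here is how a measured foot length might be matched against a brand's size guide. The chart values are invented for the example and aren't any real brand's numbers:

```python
# Hedged sketch of the size-lookup step Howe describes: match a measured foot
# length against a brand's official size guide. The chart values below are
# invented for illustration, not any real brand's guide.

SIZE_GUIDE_MM = {   # hypothetical "BrandX" chart: size -> max foot length (mm)
    "US 7": 250, "US 8": 258, "US 9": 267, "US 10": 275, "US 11": 284,
}

def recommend_size(foot_length_mm, guide=SIZE_GUIDE_MM):
    """Smallest listed size that still accommodates the measured foot length."""
    for size, max_len in sorted(guide.items(), key=lambda kv: kv[1]):
        if foot_length_mm <= max_len:
            return size
    return None   # foot longer than anything in the chart

print(recommend_size(262.4))   # -> "US 9" with the hypothetical chart above
```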

Asked about the competitive landscape, given all this foot scanning action, Howe admits there’s a number of approaches trying to help with virtual shoe fit — such as comparative brand sizing recommendations or even foot scanning with pieces of paper. But she argues Xesto has an edge because of the high level of detail of its 3D scans — and on account of its social sharing feature. Aka this is an app to make foot scans you can send your bestie for shopping keepsies.

“What we do that is unique is only use 3D depth data and computer vision to create a 3D scan of the foot with under 1.5mm accuracy (unmatched as far as we’ve seen) in only a few minutes,” she argues. “We don’t ask you any information about your feet, or to use a reference object. We make size recommendations based on your feet alone, then let you share them seamlessly with loved ones. Size sharing is a unique feature we haven’t seen in the sizing space that we’re incredibly excited about (not only because we will get more shoes as gifts :D).”

Xesto’s iOS app is free for shoppers to download. It’s also entirely free to create and share your foot scan in glorious 3D point cloud — and will remain so according to Howe. The team’s monetization plan is focused on building out partnerships with retailers, which is on the slate for 2021.

“Right now we’re not taking any revenue but next year we will be announcing partnerships where we work directly within brands ecosystems,” she says, adding: “[We wanted to offer] the app to customers in time for Black Friday and the holiday shopping season. In 2021, we are launching some exciting initiatives in partnership with brands. But the app will always be free for shoppers!”

Since being founded around five years ago, Howe says Xesto has raised a pre-seed round from angel investors and secured national advanced research grants, as well as taking in some revenue over its lifetime. The team has one patent granted and one pending for their technologies, she adds.

#artificial-intelligence, #computer-vision, #ecommerce, #online-shopping, #startups, #virtual-fit, #xesto

Neatsy wants to reduce sneaker returns with 3D foot scans

U.S.-based startup Neatsy AI is using the iPhone’s depth-sensing FaceID selfie camera as a foot scanner to capture 3D models for predicting a comfortable sneaker fit.

Its app, currently soft launched for iOS but due to launch officially next month, asks the user a few basic questions about sneaker fit preference before walking through a set of steps to capture a 3D scan of their feet using the iPhone’s front-facing camera. The scan is used to offer personalized fit predictions for a selection of sneakers offered for sale in-app — displaying an individualized fit score (out of five) in green text next to each sneaker model.

Shopping for shoes online can lead to high return rates once buyers actually get to slip on their chosen pair, since shoe sizing isn’t standardized across different brands. That’s the problem Neatsy wants its AI to tackle by incorporating another more individual fit signal into the process.

The startup, which was founded in March 2019, has raised $400K in pre-seed funding from angel investors to get its iOS app to market. The app is currently available in the US, UK, Germany, France, Italy, Spain, Netherlands, Canada and Russia. 

Neatsy analyzes app users’ foot scans using a machine learning model it’s devised to predict a comfy fit across a range of major sneaker brands — currently including Puma, Nike, Jordan Air and Adidas — based on scanning the insoles of sneakers, per CEO and founder Artem Semyanov.

He says they’re also factoring in the material shoes are made of and will be honing the algorithm on an ongoing basis based on fit feedback from users. (The startup says it’s secured a US patent for its 3D scanning tech for shoe recommendations.)
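Neatsy's model is learned from insole scans, materials and user feedback, none of which is public, so the following is only a toy illustration of how a fit score out of five might be derived from foot-versus-insole measurements. The comfort margins are assumptions:

```python
# Illustrative only: Neatsy's actual model is learned from insole scans, shoe
# materials and user feedback. This sketch just shows how a per-model fit score
# out of five might be derived from foot-vs-insole measurements.

def fit_score(foot_len_mm, foot_width_mm, insole_len_mm, insole_width_mm):
    """Score 0-5: penalise length/width mismatch beyond a small comfort margin."""
    len_slack = insole_len_mm - foot_len_mm        # want some toe room
    width_slack = insole_width_mm - foot_width_mm  # want a small positive margin
    score = 5.0
    score -= min(abs(len_slack - 8) / 4, 3)        # ideal ~8 mm toe room (assumed)
    score -= min(max(-width_slack, 0) / 2, 2)      # penalise shoes narrower than the foot
    return round(max(score, 0), 1)

print(fit_score(265, 98, insole_len_mm=274, insole_width_mm=100))  # ~4.8 of 5
```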

The team tested the algorithm’s efficiency via some commercial pilots this summer — and say they were able to demonstrate a 2.7x reduction in sneaker return rates based on size, and a 1.9x decrease in returns overall, for a focus group with 140 respondents.

Handling returns is clearly a major cost for online retailers — Neatsy estimates that sneaker returns specifically rack up $30BN annually for ecommerce outlets, factoring in logistics costs and other factors like damaged boxes and missing sneakers.

“All in all, shoe ecommerce returns vary among products and shops between 30% and 50%. The most common reasons for this category are fit & size mismatch,” says Semyanov, who headed up the machine learning team at Prism Labs prior to founding Neatsy.

“According to Zappos, customers who purchase its most expensive footwear ultimately return ~50% of everything they buy. 70% online shoppers make returns each year. Statista estimates return deliveries will cost businesses $550 billion by 2020,” he tells us responding to questions via email.

“A 2019 survey from UPS found that, for 73% of shoppers, the overall returns experience impacts how likely they are to purchase from a given retailer again, and 68% say the experience impacts their overall perceptions of the retailer. That’s the drama here!

“Retailers are forced to accept steep costs of returns because otherwise, customers won’t buy. Vs us who want to treat the main reasons of returns rather than treating the symptoms.”

While ecommerce giants like Amazon address this issue by focusing on logistics to reducing friction in the delivery process, speeding up deliveries and returns so customers spend less time waiting to get the right stuff, scores of startups have been trying to tackle size and fit with a variety of digital (and/or less high tech) tools over the past five+ years — from 3D body models to ‘smart’ sizing suits or even brand- and garment-specific sizing tape (Nudea‘s fit tape for bras) — though no one has managed to come up with a single solution that works for everything and everyone. And a number of these startups have deadpooled or been acquired by ecommerce platforms without a whole lot to show for it.

While Neatsy is attempting to tackle what plenty of other founders have tried to do on the fit front, it is at least targeting a specific niche (sneakers) — a relatively narrow focus that may help it hone a useful tool.

It’s also able to lean on mainstream availability of the iPhone’s sensing hardware to get a leg up. (Whereas a custom shoe design startup that’s been around for longer, Solely Original, has offered custom fit by charging a premium to send out an individual fit kit.)

But even zeroing in on sneaker comfort, Neatsy’s foot scanning process does require the user to correctly navigate quite a number of steps (see the full flow in the below video). Plus you need to have a pair of single-block colored socks handy (stripy sock lovers are in trouble). So it’s not a two second process, though the scan only has to be done once.

At the time of writing we hadn’t been able to test Neatsy’s scanning process for ourselves as it requires an iPhone with a FaceID depth-sensing camera. On this writer’s 2nd-gen iPhone SE, the app allowed me to swipe through each step of the scan instruction flow but then hung at what should have been the commencement of scanning — displaying a green outline template of a left foot against a black screen.

This is a bug the team said they’ll be fixing so the scanner gets turned off entirely for iPhone models that don’t have the necessary hardware. (Its App Store listing states it’s compatible with the iPhone SE (2nd generation), though it doesn’t specify that the foot scan feature isn’t.)

While the current version of Neatsy’s app is a direct to consumer ecommerce play, targeting select sneaker models at app savvy Gen Z/Millennials, it’s clearly intended as a shopfront for retailers to check out the technology.

When we ask about this, Semyanov confirms its longer-term ambition is for its custom fit model to become a standard piece of the ecommerce puzzle.

“Neatsy app is our fastest way to show the world our vision of what the future online shop should be,” he tells TechCrunch. “It attracts users to shops and we get revenue share when users buy sneakers via us. The app serves as a new low-return sales channel for a retailer and as a way to see the economic effect on returns by themselves.

“Speaking long term we think that our future is B2B and all ecommerce shops would eventually have a fitting tech, we bet it will be ours. It will be the same as having a credit card payment integration in your online shop.”

#artificial-intelligence, #computer-vision, #ecommerce, #iphone-faceid, #machine-learning, #neatsy, #prisma-labs


Computer vision startup Chooch.ai scores $20M Series A

Chooch.ai, a startup that hopes to bring computer vision more broadly to companies to help them identify and tag elements at high speed, announced a $20 million Series A today.

Vickers Venture Partners led the round with participation from 212, Streamlined Ventures, Alumni Ventures Group, Waterman Ventures and several other unnamed investors. Today’s investment brings the total raised to $25.8 million, according to the company.

“Basically we set out to copy human visual intelligence in machines. That’s really what this whole journey is about,” CEO and co-founder Emrah Gultekin explained. As the company describes it, “Chooch AI can rapidly ingest and process visual data from any spectrum, generating AI models in hours that can detect objects, actions, processes, coordinates, states, and more.”

Chooch is trying to differentiate itself from other AI startups by taking a broader approach that could work in any setting, rather than concentrating on specific vertical applications. Using the pandemic as an example, Gultekin says you could use his company’s software to identify everyone who is not wearing a mask in a building, or everyone who is not wearing a hard hat at a construction site.

 

With 22 employees spread across the U.S., India and Turkey, Chooch is building a diverse company just by virtue of its geography, but as it doubles the workforce in the coming year, it wants to continue to build on that.

“We’re immigrants. We’ve been through a lot of different things, and we recognize some of the issues and are very sensitive to them. One of our senior members is a person of color and we are very cognizant of the fact that we need to develop that part of our company,” he said. At a recent company meeting, he said that they were discussing how to build diversity into the policies and values of the company as they move forward.

The company currently has 18 enterprise clients and hopes to use the money to add engineers and data scientists, and to begin building out a worldwide sales team as it continues to develop the product and expand its go-to-market effort.

Gultekin says that the company’s unusual name comes from a mix of the words choose and search. He says that it is also an old Italian insult. “It means dummy or idiot, which is what artificial intelligence is today. It’s a poor reflection of humanity or human intelligence in humans,” he said. His startup aims to change that.

#artificial-intelligence, #computer-vision, #data-labeling-tools, #enterprise, #funding, #machine-learning, #recent-funding, #startups, #tagging, #tc, #vickers-venture-partners


Training Facial Recognition on Some New Furry Friends: Bears

In their spare time, two Silicon Valley developers aided conservationists in developing artificial intelligence to help keep track of individual bears.

#alaska, #artificial-intelligence, #bears, #british-columbia-canada, #computer-vision, #computers-and-the-internet, #conservation-of-resources, #ecology-and-evolution-journal, #facial-recognition-software, #katmai-national-park-alaska, #research, #your-feed-animals, #your-feed-science


Deep Science: Alzheimer’s screening, forest-mapping drones, machine learning in space, more

Research papers come out far too rapidly for anyone to read them all, especially in the field of machine learning, which now affects (and produces papers in) practically every industry and company. This column aims to collect the most relevant recent discoveries and papers — particularly in but not limited to artificial intelligence — and explain why they matter.

This week, a startup that’s using UAV drones for mapping forests, a look at how machine learning can map social media networks and predict Alzheimer’s, improving computer vision for space-based sensors and other news regarding recent technological advances.

Predicting Alzheimer’s through speech patterns

Machine learning tools are being used to aid diagnosis in many ways, since they’re sensitive to patterns that humans find difficult to detect. IBM researchers have potentially found such patterns in speech that are predictive of the speaker developing Alzheimer’s disease.

The system only needs a couple of minutes of ordinary speech in a clinical setting. The team used a large set of data (the Framingham Heart Study) going back to 1948, allowing patterns of speech to be identified in people who would later develop Alzheimer’s. The accuracy rate is about 71%, or 0.74 area under the curve for those of you who are more statistically informed. That’s far from a sure thing, but current basic tests are barely better than a coin flip in predicting the disease this far ahead of time.
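
For readers less familiar with that metric, area under the ROC curve summarizes how well a model’s risk scores separate the two groups: 0.5 is a coin flip, 1.0 is perfect. Here is a minimal sketch of computing it with scikit-learn; the labels and scores are invented for illustration and have nothing to do with the Framingham data or IBM’s model.

```python
# Minimal sketch of the metric cited above (AUC), using scikit-learn.
# The labels and scores are invented for illustration only -- they are not
# from the Framingham study or IBM's model.
from sklearn.metrics import roc_auc_score

# 1 = participant later developed Alzheimer's, 0 = did not
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
# Hypothetical risk scores produced by a speech model for the same people
y_score = [0.10, 0.40, 0.80, 0.30, 0.60, 0.90, 0.20, 0.55, 0.35, 0.15]

print(f"AUC: {roc_auc_score(y_true, y_score):.2f}")  # 0.5 = coin flip, 1.0 = perfect
```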

This is very important because the earlier Alzheimer’s can be detected, the better it can be managed. There’s no cure, but there are promising treatments and practices that can delay or mitigate the worst symptoms. A non-invasive, quick test of well people like this one could be a powerful new screening tool and is also, of course, an excellent demonstration of the usefulness of this field of tech.

(Don’t read the paper expecting to find exact symptoms or anything like that — the array of speech features aren’t really the kind of thing you can look out for in everyday life.)

So-cell networks

Making sure your deep learning network generalizes to data outside its training environment is a key part of any serious ML research. But few attempt to set a model loose on data that’s completely foreign to it. Perhaps they should!

Researchers from Uppsala University in Sweden took a model used to identify groups and connections in social media, and applied it (not unmodified, of course) to tissue scans. The tissue had been treated so that the resultant images produced thousands of tiny dots representing mRNA.

Normally the different groups of cells, representing types and areas of tissue, would need to be manually identified and labeled. But the graph neural network, created to identify social groups based on similarities like common interests in a virtual space, proved it could perform a similar task on cells. (See the image at top.)

“We’re using the latest AI methods — specifically, graph neural networks, developed to analyze social networks — and adapting them to understand biological patterns and successive variation in tissue samples. The cells are comparable to social groupings that can be defined according to the activities they share in their social networks,” said Uppsala’s Carolina Wählby.

It’s an interesting illustration not just of the flexibility of neural networks, but of how structures and architectures repeat at all scales and in all contexts. As without, so within, if you will.
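
The underlying move — treat each detected dot as a node, connect nearby dots, and let a community-finding algorithm group them — can be roughed out without any neural network at all. The toy sketch below is not the Uppsala group’s method, just an illustration of why social-network-style grouping transfers to spatial data; the coordinates are synthetic.

```python
# Toy analogue of the idea above: build a spatial neighbour graph over detected
# dots and let a community-finding algorithm group them. Synthetic coordinates,
# plain modularity clustering -- not the Uppsala group's graph neural network.
import numpy as np
import networkx as nx
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Two fake spatial clusters of "mRNA dots" in a 2D tissue section
points = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])

tree = cKDTree(points)
graph = nx.Graph()
for i, p in enumerate(points):
    _, neighbours = tree.query(p, k=6)          # self + 5 nearest neighbours
    graph.add_edges_from((i, int(j)) for j in neighbours[1:])

communities = nx.algorithms.community.greedy_modularity_communities(graph)
print(f"Found {len(communities)} groups of dots")
```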

Drones in nature

The vast forests of our national parks and timber farms have countless trees, but you can’t put “countless” on the paperwork. Someone has to make an actual estimate of how well various regions are growing, the density and types of trees, the range of disease or wildfire, and so on. This process is only partly automated, as aerial photography and scans only reveal so much, while on-the-ground observation is detailed but extremely slow and limited.

Treeswift aims to take a middle path by equipping drones with the sensors they need to both navigate and accurately measure the forest. By flying through much faster than a walking person, they can count trees, watch for problems and generally collect a ton of useful data. The company is still very early-stage, having spun out of the University of Pennsylvania and acquired an SBIR grant from the NSF.

“Companies are looking more and more to forest resources to combat climate change but you don’t have a supply of people who are growing to meet that need,” Steven Chen, co-founder and CEO of Treeswift and a doctoral student in Computer and Information Science (CIS) at Penn Engineering, said in a Penn news story. “I want to help make each forester do what they do with greater efficiency. These robots will not replace human jobs. Instead, they’re providing new tools to the people who have the insight and the passion to manage our forests.”

Another area where drones are making lots of interesting moves is underwater. Oceangoing autonomous submersibles are helping map the sea floor, track ice shelves and follow whales. But they all have a bit of an Achilles’ heel in that they need to periodically be picked up, charged and their data retrieved.

Purdue engineering professor Nina Mahmoudian has created a docking system by which submersibles can easily and automatically connect for power and data exchange.

A yellow marine robot (left, underwater) finds its way to a mobile docking station to recharge and upload data before continuing a task. (Purdue University photo/Jared Pike)

The craft needs a special nosecone, which can find and plug into a station that establishes a safe connection. The station can be an autonomous watercraft itself, or a permanent feature somewhere — what matters is that the smaller craft can make a pit stop to recharge and debrief before moving on. If it’s lost (a real danger at sea), its data won’t be lost with it.

You can see the setup in action below:

https://youtu.be/kS0-qc_r0

Sound in theory

Drones may soon become fixtures of city life as well, though we’re probably some ways from the automated private helicopters some seem to think are just around the corner. But living under a drone highway means constant noise — so people are always looking for ways to reduce turbulence and resultant sound from wings and propellers.

Computer model of a plane with simulated turbulence around it.

It looks like it’s on fire, but that’s turbulence.

Researchers at the King Abdullah University of Science and Technology found a new, more efficient way to simulate the airflow in these situations; fluid dynamics is essentially as complex as you make it, so the trick is to apply your computing power to the right parts of the problem. They were able to render only the flow near the surface of the theoretical aircraft in high resolution, finding that past a certain distance there was little point in knowing exactly what was happening. Improvements to models of reality don’t always need to be better in every way — after all, the results are what matter.

Machine learning in space

Computer vision algorithms have come a long way, and as their efficiency improves they are beginning to be deployed at the edge rather than at data centers. In fact it’s become fairly common for camera-bearing objects like phones and IoT devices to do some local ML work on the image. But in space it’s another story.

Image Credits: Cosine

Performing ML work in space was until fairly recently simply too expensive power-wise to even consider. That’s power that could be used to capture another image, transmit the data to the surface, etc. HyperScout 2 is exploring the possibility of ML work in space, and its satellite has begun applying computer vision techniques immediately to the images it collects before sending them down. (“Here’s a cloud — here’s Portugal — here’s a volcano…”)

For now there’s little practical benefit, but object detection can be combined with other functions easily to create new use cases, from saving power when no objects of interest are present, to passing metadata to other tools that may work better if informed.

In with the old, out with the new

Machine learning models are great at making educated guesses, and in disciplines where there’s a large backlog of unsorted or poorly documented data, it can be very useful to let an AI make a first pass so that graduate students can use their time more productively. The Library of Congress is doing it with old newspapers, and now Carnegie Mellon University’s libraries are getting into the spirit.

CMU’s million-item photo archive is in the process of being digitized, but to make it useful to historians and curious browsers it needs to be organized and tagged — so computer vision algorithms are being put to work grouping similar images, identifying objects and locations, and doing other valuable basic cataloguing tasks.

“Even a partly successful project would greatly improve the collection metadata, and could provide a possible solution for metadata generation if the archives were ever funded to digitize the entire collection,” said CMU’s Matt Lincoln.

A very different project, yet one that seems somehow connected, is this work by a student at the Escola Politécnica da Universidade de Pernambuco in Brazil, who had the bright idea to try sprucing up some old maps with machine learning.

The tool they used takes old line-drawing maps and attempts to create a sort of satellite image based on them using a Generative Adversarial Network; GANs pit a generator against a discriminator, with the generator trying to produce content the discriminator can’t tell apart from the real thing.

Image Credits: Escola Politécnica da Universidade de Pernambuco

Well, the results aren’t what you might call completely convincing, but it’s still promising. Such maps are rarely accurate but that doesn’t mean they’re completely abstract — recreating them in the context of modern mapping techniques is a fun idea that might help these locations seem less distant.

#artificial-intelligence, #computer-vision, #deep-science, #health, #lab-wrap, #machine-learning, #tc


mmhmm, Phil Libin’s new startup, acquires Memix to add enhanced filters to its video presentation toolkit

Virtual meetings are a fundamental part of how we interact with each other these days, but even when (if!?) we find better ways to mitigate the effects of Covid-19, many think that they will be here to stay. That means there is an opportunity out there to improve how they work — because let’s face it, Zoom Fatigue is real and I for one am not super excited anymore to be a part of your Team.

mmhmm, the video presentation startup from former Evernote CEO Phil Libin with ambitions to change the conversation (literally and figuratively) about what we can do with the medium — its first efforts have included things like the ability to manipulate presentation material around your video in real time to mimic newscasts — is today announcing an acquisition as it continues to home in on a wider launch of its product, currently in a closed beta.

It has acquired Memix, an outfit out of San Francisco that has built a series of filters you can apply to videos — either pre-recorded or streaming — to change the lighting, details in the background, or across the whole of the screen, and an app that works across various video platforms to apply those filters.

Like mmhmm, Memix is today focused on building tools that you use on existing video platforms — not building a video player itself. Memix today comes in the form of a virtual camera, accessible via Windows apps for Zoom, WebEx and Microsoft Teams; or web apps like Facebook Messenger, Houseparty and others that run on Chrome, Edge and Firefox.

Libin said in an interview that the plan will be to keep that virtual camera operating as is while it works on integrating the filters and Memix’s technology into mmhmm, while also laying the groundwork for building more on top of the platform.

Libin’s view is that while there are already a lot of video products and users in the market today, we are just at the start of it all, with technology and our expectations changing rapidly. We are shifting, he said, from wanting to reproduce existing experiences (like meetings) to creating completely new ones that might actually be better.

“There is a profound change in the world that we are just at the beginning of,” he said in an interview. “The main thing is that everything is hybrid. If you imagine all the experiences we can have, from in person to online, or recorded to live, up to now almost everything in life fit neatly into one of those quadrants. The boundaries were fixed. Now all these boundaries have melted away, we can rebuild every experience to be natively hybrid. This is a monumental change.”

That is a concept that the Memix founders have not just been thinking about, but also building the software to make it a reality.

“There is a lot to do,” said Pol Jeremias-Vila, one of the co-founders. “One of our ideas was to try to provide people who do streaming professionally an alternative to the really complicated set-ups you currently use,” which can involve expensive cameras, lights, microphones, stands and more. “Can we bring that to a user just with a couple of clicks? What can be done to put the same kind of tech you get with all that hardware into the hands of a massive audience?”

Memix’s team of two — co-founders Inigo Quilez and Jeremias-Vila, Spaniards who met not in Spain but the Bay Area — are not coming on board full-time, but they will be helping with the transition and integration of the tech.

Libin said that he first became aware of Quilez from a YouTube video he’d posted on “The principles of painting with maths”, but that doesn’t give a lot away about the two co-founders. They are in reality graphic engineering whizzes, with Jeremias-Vila currently the lead graphics software engineer at Pixar, and Quilez until last year a product manager and lead engineer at Facebook, where he created, among other things, the Quill VR animation and production tool for Oculus.

Because working the kind of hours that people put in at tech companies wasn’t quite enough time to work on graphics applications, the pair started another effort called Beauty Pi (not to be confused with Beauty Pie), which has become a home for various collaborations between the two that had nothing to do with their day jobs. Memix had been bootstrapped by the pair as a project built out of that. And other efforts have included Shadertoy, a community and platform for creating shaders (programs that compute shading and other visual effects in 3D scenes).

That background of Memix points to an interesting opportunity in the world of video right now. In part because of all the focus (sorry not sorry!) on video as a medium during the current pandemic circumstances, but also because of advances in broadband, devices, apps and video technology, we’re seeing a huge proliferation of startups building interesting variations and improvements on the basic concept of video streaming.

Just in the area of videoconferencing alone, some of the hopefuls have included Headroom, which launched the other week with a really interesting AI-based approach to helping its users get more meaningful notes from meetings, and using computer vision to help presenters “read the room” better by detecting if people are getting bored, annoyed and more.

Vowel is also bringing a new set of tools not just to annotate meetings and their corresponding transcriptions in a better way, but to then be able to search across all your sessions to follow up items and dig into what people said over multiple events.

And Descript, which originally built a tool to edit audio tracks, earlier this week launched a video component, letting users edit visuals and what you say in those moving pictures, by cutting, pasting and rewriting a word-based document transcribing the sound from that video. All of these have obvious B2B angles, like mmhmm, and they are just the tip of the iceberg.

Indeed, the huge amount of IP out there is interesting in itself. Yet the jury is still out on where all of it would best live and thrive as the space continues to evolve, with more defined business models (and leading companies) only now emerging.

That presents an interesting opportunity not just for the biggies like Zoom, Google and Microsoft, but also players who are building entirely new platforms from the ground up.

mmhmm is a notable company in that context. Not only does it have the reputation and inspiration of Libin behind it — a force powerful enough that even his foray into the ill-fated world of chatbots got headlines — but it’s also backed by the likes of Sequoia, which led a $21 million round earlier this month.

Libin said he doesn’t like to think of his startup as a consolidator, or the industry in a consolidation play, as that implies a degree of maturity in an area that he still feels is just getting started.

“We’re looking at this not so much consolidation, which to me means marketshare,” he said. “Our main criteria is that we wanted to work with teams that we are in love with.”

#computer-vision, #exit, #filters, #graphics, #media, #memix, #mmhmm, #phil-libin, #startups, #tc, #video, #videoconferencing, #virtual-reality


Tiliter bags $7.5M for its ‘plug and play’ cashierless checkout tech

Tiliter, an Australian startup that’s using computer vision to power cashierless checkout tech that replaces the need for barcodes on products, has closed a $7.5 million Series A round of funding led by Investec Emerging Companies.

The 2017-founded company is using AI for retail product recognition — claiming advantages such as removing the need for retail staff to manually identify loose items that don’t have a barcode (e.g. fresh fruit or baked goods), as well as reductions in packaging waste.

It also argues the AI-based product recognition system reduces incorrect product selections (either intentional or accidental).

“Some objects simply don’t have barcodes which causes a slow and poor experience of manual identification,” says co-founder and CEO Martin Karafilis. “This is items like bulk items, fresh produce, bakery pieces, mix and match etc. Sometimes barcodes are not visible or can be damaged.

“Most importantly there is an enormous amount of plastic created in the world for barcodes and identification packaging. With this technology we are able to dramatically decrease and, in some cases, eliminate single use plastic for retailers.”

Currently the team is focused on the supermarket vertical — and claims over 99% accuracy in under one second for its product identification system.

It’s developed hardware that can be added to existing checkouts to run the computer vision system — with the aim of offering retailers a “plug and play” cashierless solution.

Marketing text on its website adds of its AI software: “We use our own data and don’t collect any in-store. It works with bags, and can tell even the hardest sub-categories apart such as Truss, Roma, and Gourmet tomatoes or Red Delicious, Royal Gala and Pink Lady apples. It can also differentiate between organic and non-organic produce [by detecting certain identification indicators that retailers may use for organic items].”

“We use our pre-trained software,” says Karafilis when asked whether there’s a need for a training period to adapt the system to a retailer’s inventory. “We have focused on creating a versatile and scalable software solution that works for all retailers out of the box. In the instance an item isn’t in the software it can be collected by the supermarket in approx 20min and has self-learning capabilities.”

As well as a claim of easy installation, given the hardware can bolt onto existing retail IT, Tiliter touts lower cost than “currently offered autonomous store solutions”. (Amazon is one notable competitor on that front.)

It sells the hardware outright, charging a yearly subscription fee for the software (this includes a pledge of 24/7 global service and support).

“We provide proprietary hardware (camera and processor) that can be retrofitted to any existing checkout, scale or point of sale system at a low cost integrating our vision software with the point of sale,” says Karafilis, adding that the pandemic is driving demand for easy to implement cashierless tech.

As an example, the startup cites a 300% increase in ‘scan and go’ adoption in the US over the past year due to COVID-19, adding that further global growth is expected.

It’s not breaking out customer numbers at this stage — but early adopters for its AI-powered product recognition system include Woolworths in Australia with over 20 live stores; Countdown in New Zealand, and several retail chains in the US such as New York City’s Westside Market.

The Series A funding will go on accelerating expansion across Europe and the US — with “many” supermarkets set to adopt its tech over the coming months.

#ai, #artificial-intelligence, #asia, #cashierless-tech, #computer-vision, #investec-emerging-companies, #retail, #tiliter


Pimloc gets $1.8M for its AI-based visual search and redaction tool

UK-based Pimloc has closed a £1.4 million (~$1.8M) seed funding round led by Amadeus Capital Partners. Existing investor Speedinvest and other unnamed shareholders also participated in the round.

The 2016-founded computer vision startup launched an AI-powered photo classifier service called Pholio in 2017 — pitching the service as a way for smartphone users to reclaim agency over their digital memories without having to hand their data over to cloud giants like Google.

It has since pivoted to position Pholio as a “specialist search and discovery platform” for large image and video collections and live streams (such as those owned by art galleries or broadcasters) — and also launched a second tool powered by its deep learning platform. This product, Secure Redact, offers privacy-focused content moderation tools — enabling its users to find and redact personal data in visual content.

An example use-case it gives is for law enforcement to anonymize bodycam footage so it can be repurposed for training videos or prepared for submitting as evidence.

“Pimloc has been working with diverse image and video content for several years, supporting businesses with a host of classification, moderation and data protection challenges (image libraries, art galleries, broadcasters and CCTV providers),” CEO Simon Randall tells TechCrunch.

“Through our work on the visual privacy side we identified a critical gap in the market for services that allow businesses and governments to manage visual data protection at scale on security footage. Pimloc has worked in this area for a couple of years building capability and product, as a result Pimloc has now focussed the business solely around this mission.”

Secure Redact has two components: A first (automated) step that detects personal data (e.g. faces, heads, bodies) within video content. On top of that is what Randall calls a layer of “intelligent tools” — letting users quickly review and edit results.

“All detections and tracks are auditable and editable by users prior to accepting and redacting,” he explains, adding: “Personal data extends wider than just faces into other objects and scene content including ID cards, tattoos, phone screens (body worn cameras have a habit of picking up messages on the wearer’s phone screen as they are typing, or sensitive notes on their laptop or notebook).”
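
At its simplest, that automated first pass boils down to running a detector over every frame and obscuring whatever it returns. The sketch below is a generic illustration of that pattern using an off-the-shelf Haar cascade for faces only — far cruder than what Secure Redact would need for real ‘in the wild’ security footage, and not Pimloc’s actual models; the input filename is hypothetical.

```python
# Generic sketch of frame-level redaction: detect a class of personal data
# (faces, via a bundled OpenCV Haar cascade) and blur each detection.
# Illustrative only -- not Pimloc's detectors; "bodycam.mp4" is hypothetical.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def redact(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        # Replace each detected region with a heavy blur
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(frame[y:y + h, x:x + w], (51, 51), 0)
    return frame

cap = cv2.VideoCapture("bodycam.mp4")
writer = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if writer is None:
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter("redacted.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30.0, (w, h))
    writer.write(redact(frame))
cap.release()
if writer is not None:
    writer.release()
```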

One specific user of the redaction tool he mentions is the University of Bristol. There a research group, led by Dr Dima Damen, an associate professor in computer vision, is participating in an international consortium of 12 universities which is aiming to amass the largest dataset on egocentric vision — and needs to be able to anonymise the video data set before making it available for academic/open source use.

On the legal side, Randall says Pimloc offers a range of data processing models — thereby catering to differences in how/where data can be processed. “Some customers are happy for Pimloc to act as data processor and use the Secure Redact SaaS solution — they manage their account, they upload footage, and can review/edit/update detections prior to redaction and usage. Some customers run the Secure Redact system on their servers where they are both data controller and processor,” he notes.

“We have over 100 users signed up for the SaaS service covering mobility, entertainment, insurance, health and security. We are also in the process of setting up a host of on-premise implementations,” he adds.

Asked which sectors Pimloc sees driving the most growth for its platform in the coming years, he lists the following: smart cities/mobility platforms (with safety/analytics demand coming from the likes of councils, retailers, AVs); the insurance industry, which he notes is “capturing and using an increasing amount of visual data for claims and risk monitoring” and thus “looking at responsible systems for data management and processing”; video/telehealth, with traditional consultations moving into video and driving demand for visual diagnosis; and law enforcement, where security goals need to be supported by “visual privacy designed in by default” (at least where forces are subject to European data protection law).

On the competitive front, he notes that startups are increasingly focusing on specialist application areas for AI — arguing they have an opportunity to build compelling end-to-end propositions which are harder for larger tech companies to focus on.

For Pimloc specifically, he argues it has an edge in its particular security-focused niche — given “deep expertise” and specific domain experience.

“There are low barriers to entry to create a low quality product but very high technical barriers to create a service that is good enough to use at scale with real ‘in the wild’ footage,” he argues, adding: “The generalist services of the larger tech players do not match up with the domain-specific provisions of Pimloc/Secure Redact. Video security footage is a difficult domain for AI; systems trained on lifestyle/celebrity or other general data sets perform poorly on real security footage.”

Commenting on the seed funding in a statement, Alex van Someren, MD of Amadeus Capital Partners, said: “There is a critical need for privacy by design and large-scale solutions, as video grows as a data source for mobility, insurance, commerce and smart cities, while our reliance on video for remote working increases. We are very excited about the potential of Pimloc’s products to meet this challenge.”

“Consumers around the world are rightfully concerned with how enterprises are handling the growing volume of visual data being captured 24/7. We believe Pimloc has developed an industry leading approach to visual security and privacy that will allow businesses and governments to manage the usage of visual data whilst protecting consumers. We are excited to support their vision as they expand into the wider Enterprise and SaaS markets,” added Rick Hao, principal at Speedinvest, in another supporting statement.

#ai, #amadeus-capital-partners, #artificial-intelligence, #computer-vision, #pimloc, #privacy, #recent-funding, #startups, #visual-search


Microsoft and partners aim to shrink the ‘data desert’ limiting accessible AI

AI-based tools like computer vision and voice interfaces have the potential to be life-changing for people with disabilities, but the truth is those AI models are usually built with very little data sourced from those people. Microsoft is working with several nonprofit partners to help make these tools reflect the needs and everyday realities of people living with conditions like blindness and limited mobility.

Consider a computer vision system that recognizes objects and can describe what is, for example, on a table. Chances are that algorithm was trained with data collected by able people, from their point of view — likely standing.

A person in a wheelchair looking to do the same thing might find the system isn’t nearly as effective from that lower angle. Similarly a blind person will not know to hold the camera in the right position for long enough for the algorithm to do its work, so they must do so by trial and error.

Or consider a face recognition algorithm that’s meant to tell when you’re paying attention to the screen for some metric or another. What’s the likelihood that among the faces used to train that system, any significant amount have things like a ventilator, or a puff-and-blow controller, or a headstrap obscuring part of it? These “confounders” can significantly affect accuracy if the system has never seen anything like them.

Facial recognition software that fails on people with dark skin, or has lower accuracy on women, is a common example of this sort of “garbage in, garbage out.” Less commonly discussed but no less important is the visual representation of people with disabilities, or of their point of view.

Microsoft today announced a handful of efforts co-led by advocacy organizations that hope to do something about this “data desert” limiting the inclusivity of AI.

The first is a collaboration with Team Gleason, an organization formed to improve awareness around the neuromotor degenerative disease amyotrophic lateral sclerosis, or ALS (it’s named after former NFL star Steve Gleason, who was diagnosed with the disease some years back).

Their concern is the one above regarding facial recognition. People living with ALS have a huge variety of symptoms and assistive technologies, and those can interfere with algorithms that have never seen them before. That becomes an issue if, for example, a company wanted to ship gaze tracking software that relied on face recognition, as Microsoft would surely like to do.

“Computer vision and machine learning don’t represent the use cases and looks of people with ALS and other conditions,” said Team Gleason’s Blair Casey. “Everybody’s situation is different and the way they use technology is different. People find the most creative ways to be efficient and comfortable.”

Project Insight is the name of a new joint effort with Microsoft that will collect face imagery of volunteer users with ALS as they go about their business. In time that face data will be integrated with Microsoft’s existing cognitive services, but also released freely so others can improve their own algorithms with it.

They aim to have a release in late 2021. If the timeframe seems a little long, Microsoft’s Mary Bellard, from the company’s AI for Accessibility effort, pointed out that they’re basically starting from scratch and getting it right is important.

“Research leads to insights, insights lead to models that engineers bring into products. But we have to have data to make it accurate enough to be in a product in the first place,” she said. “The data will be shared — for sure this is not about making any one product better, it’s about accelerating research around these complex opportunities. And that’s work we don’t want to do alone.”

Another opportunity for improvement is in sourcing images from users who don’t use an app the same way as most. Like the person with impaired vision or in a wheelchair mentioned above, there’s a want of data from their perspective. There are two efforts aiming to address this.

Images taken by people needing objects in them to be identified or located.

Image Credits: ORBIT

One, with City University of London, is the expansion and eventual public release of the Object Recognition for Blind Image Training project, which is assembling a dataset for identifying everyday objects — a can of pop, a keyring — using a smartphone camera. Unlike other datasets, though, this will be sourced entirely from blind users, meaning the algorithm will learn from the start to work with the kind of data it will be given later anyway.

AI captioned images

Image Credits: Microsoft

The other is an expansion of VizWiz to better encompass this kind of data. The tool is used by people who need help right away in telling, say, whether a cup of yogurt is expired or if there’s a car in the driveway. Microsoft worked with the app’s creator, Danna Gurari, to improve the app’s existing database of tens of thousands of images with associated questions and captions. They’re also working to alert a user when their image is too dark or blurry to analyze or submit.
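
The ‘too dark or blurry’ check is the sort of thing that can be approximated with very simple image statistics — mean brightness for darkness, variance of the Laplacian for blur. The sketch below is a common heuristic, not necessarily what VizWiz itself implements, and the thresholds and filename are made up.

```python
# Simple heuristics for "too dark" / "too blurry" -- a common approach, not
# necessarily what the VizWiz app itself implements. Thresholds and the
# filename are made up for illustration.
import cv2

def quality_warnings(path, dark_threshold=40.0, blur_threshold=100.0):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    warnings = []
    if gray.mean() < dark_threshold:                            # low average brightness (0-255)
        warnings.append("image looks too dark")
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold:  # weak edge response
        warnings.append("image looks blurry")
    return warnings

print(quality_warnings("yogurt_lid.jpg"))
```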

Inclusivity is complex because it’s about people and systems that, perhaps without even realizing it, define “normal” and then don’t work outside of those norms. If AI is going to be inclusive, “normal” needs to be redefined and that’s going to take a lot of hard work. Until recently, people weren’t even talking about it. But that’s changing.

“This is stuff the ALS community wanted years ago,” said Casey. “This is technology that exists — it’s sitting on a shelf. Let’s put it to use. When we talk about it, people will do more, and that’s something the community needs as a whole.”

#accessibility, #als, #artificial-intelligence, #computer-vision, #disabilities, #face-recognition, #facial-recognition, #microsoft, #tc, #team-gleason


Headroom, which uses AI to supercharge videoconferencing, raises $5M

Videoconferencing has become a cornerstone of how many of us work these days — so much so that one leading service, Zoom, has graduated into verb status because of how much it’s getting used.

But does that mean videoconferencing works as well as it should? Today, a new startup called Headroom is coming out of stealth, tapping into a battery of AI tools — computer vision, natural language processing and more — on the belief that the answer to that question is a clear — no bad WiFi interruption here — “no.”

Headroom not only hosts videoconferences, but then provides transcripts, summaries with highlights, gesture recognition, optimised video quality, and more, and today it’s announcing that it has raised a seed round of $5 million as it gears up to launch its freemium service into the world.

You can sign up to the waitlist to pilot it, and get other updates here.

The funding is coming from Anna Patterson of Gradient Ventures (Google’s AI venture fund); Evan Nisselson of LDV Capital (a specialist VC backing companies building visual technologies); Yahoo founder Jerry Yang, now of AME Cloud Ventures; Ash Patel of Morado Ventures; Anthony Goldbloom, the cofounder and CEO of Kaggle.com; and Serge Belongie, Cornell Tech associate dean and Professor of Computer Vision and Machine Learning.

It’s an interesting group of backers, but that might be because the founders themselves have a pretty illustrious background with years of experience using some of the most cutting-edge visual technologies to build other consumer and enterprise services.

Julian Green — a British transplant — was most recently at Google, where he ran the company’s computer vision products, including the Cloud Vision API that was launched under his watch. He came to Google by way of its acquisition of his previous startup Jetpac, which used deep learning and other AI tools to analyze photos to make travel recommendations. In a previous life, he was one of the co-founders of Houzz, another kind of platform that hinges on visual interactivity.

Russian-born Andrew Rabinovich, meanwhile, spent the last five years at Magic Leap, where he was the head of AI, and before that, the director of deep learning and the head of engineering. Before that, he too was at Google, as a software engineer specializing in computer vision and machine learning.

You might think that leaving their jobs to build an improved videoconferencing service was an opportunistic move, given the huge surge of use that the medium has had this year. Green, however, tells me that they came up with the idea and started building it at the end of 2019, when the term “Covid-19” didn’t even exist.

“But it certainly has made this a more interesting area,” he quipped, adding that it did make raising money significantly easier, too. (The round closed in July, he said.)

Given that Magic Leap had long been in limbo — AR and VR have proven to be incredibly tough to build businesses around, especially in the short- to medium-term, even for a startup with hundreds of millions of dollars in VC backing — and could have probably used some more interesting ideas to pivot to; and that Google is Google, with everything tech having an endpoint in Mountain View, it’s also curious that the pair decided to strike out on their own to build Headroom rather than pitch building the tech at their respective previous employers.

Green said the reasons were two-fold. The first has to do with the efficiency of building something when you are small. “I enjoy moving at startup speed,” he said.

And the second has to do with the challenges of building things on legacy platforms versus fresh, from the ground up.

“Google can do anything it wants,” he replied when I asked why he didn’t think of bringing these ideas to the team working on Meet (or Hangouts if you’re a non-business user). “But to run real-time AI on video conferencing, you need to build for that from the start. We started with that assumption,” he said.

All the same, the reasons why Headroom is interesting are also likely going to be the ones that will pose big challenges for it. The new ubiquity (and our present lives working at home) might make us more open to using video calling, but for better or worse, we’re all also now pretty used to what we already use. And many companies have now paid up as premium users of one service or another, so they may be reluctant to try out new and less-tested platforms.

But as we’ve seen in tech so many times, sometimes it pays to be a late mover, and the early movers are not always the winners.

The first iteration of Headroom will include features that will automatically take transcripts of the whole conversation, with the ability to use the video replay to edit the transcript if something has gone awry; offer a summary of the key points that are made during the call; and identify gestures to help shift the conversation.

And Green tells me that they are already also working on features that will be added into future iterations. When the videoconference uses supplementary presentation materials, those can also be processed by the engine for highlights and transcription too.

And another feature will optimize the pixels that you see for much better video quality, which should come in especially handy when you or the person/people you are talking to are on poor connections.

“You can understand where and what the pixels are in a video conference and send the right ones,” he explained. “Most of what you see of me and my background is not changing, so those don’t need to be sent all the time.”
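
One way to act on that observation is plain frame differencing: compare each frame to the previous one and only re-encode the blocks that actually changed. The sketch below is a simplification for illustration — Headroom hasn’t described its pipeline in this detail — and the block size and threshold are arbitrary.

```python
# Illustrative frame differencing: find which fixed-size blocks of the current
# frame changed versus the previous one, so only those need to be re-encoded
# and sent. A simplification -- Headroom hasn't published its actual pipeline.
import cv2

def changed_blocks(prev_frame, curr_frame, block=32, threshold=8.0):
    """Return (x, y) origins of blocks whose content changed between frames."""
    diff = cv2.absdiff(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY))
    h, w = diff.shape
    changed = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if diff[y:y + block, x:x + block].mean() > threshold:
                changed.append((x, y))
    return changed  # everything else can be reused from the previous frame
```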

All of this taps into some of the more interesting aspects of sophisticated computer vision and natural language algorithms. Creating a summary, for example, relies on technology that is able to suss out not just what you are saying, but what are the most important parts of what you or someone else is saying.

And if you’ve ever been on a videocall and found it hard to make it clear you’ve wanted to say something, without straight-out interrupting the speaker, you’ll understand why gestures might be very useful.

But they can also come in handy if a speaker wants to know if he or she is losing the attention of the audience: the same tech that Headroom is using to detect gestures for people keen to speak up can also be used to detect when they are getting bored or annoyed and pass that information on to the person doing the talking.

“It’s about helping with EQ,” he said, with what I’m sure was a little bit of his tongue in his cheek, but then again we were on a Google Meet, and I may have misread that.

And that brings us to why Headroom is tapping into an interesting opportunity. At their best, when they work, tools like these not only supercharge videoconferences, but they have the potential to solve some of the problems you may have come up against in face-to-face meetings, too. Building software that actually might be better than the “real thing” is one way of making sure that it can have staying power beyond the demands of our current circumstances (which hopefully won’t be permanent circumstances).

#artificial-intelligence, #computer-vision, #enterprise, #google, #headroom, #natural-language-processing, #nlp, #recent-funding, #startups, #tc, #videoconferencing


There’s a way to pick the absolute best images for your content: Apply AI

Most marketers believe there’s a lot of value in having relevant, engaging images featured in content.

But selecting the “right” images for blog posts, social media posts or video thumbnails has historically been a subjective process. Social media and SEO gurus have a slew of advice on picking the right images, but this advice typically lacks real empirical data.

This got me thinking: Is there a data-driven — or even better, an AI-driven — process for gaining deeper insight into which images are more likely to perform well (aka more likely to garner human attention and sharing behavior)?

The technique for finding optimal photos

In July of 2019, a fascinating new machine learning paper called “Intrinsic Image Popularity Assessment” was published. This new model has found a reliable way to predict an image’s likely “popularity” (estimation of likelihood the image will get a like on Instagram).

It also showed an ability to outperform humans, with a 76.65% accuracy on predicting how many likes an Instagram photo would garner versus a human accuracy of 72.40%.

I used the model and source code from this paper to come up with how marketers can improve their chances of selecting images that will have the best impact on their content.

Finding the best screen caps to use for a video

One of the most important aspects of video optimization is the choice of the video’s thumbnail.

According to Google, 90% of the top-performing videos on the platform use a custom-selected image. Click-through rates, and ultimately view counts, can be greatly influenced by how eye-catching a video title and thumbnail are to a searcher.

In recent years, Google has applied AI to automate video thumbnail extraction, attempting to help users find thumbnails from their videos that are more likely to attract attention and click-throughs.

Unfortunately, with only three provided options to choose from, it’s unlikely the thumbnails Google currently recommends are the best thumbnails for any given video.

That’s where AI comes in.

With some simple code, it’s possible to run the “intrinsic popularity score” (as derived by a model similar to the one discussed in this article) against all of the individual frames of a video, providing a much wider range of options.

The code to do this is available here. This script downloads a YouTube video, splits it into frames as .jpg images, and runs the model on each image, providing a predicted popularity score for each frame image.
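
As a rough picture of the flow that script follows — the real code is linked above — here is a hedged sketch that assumes the video has already been downloaded to a local file and uses a placeholder scoring function in place of the paper’s trained model:

```python
# Rough sketch of the linked script's flow: sample frames from an already
# downloaded video file and score each one. predict_popularity is a placeholder
# for the paper's trained model, which this sketch does not include.
import cv2

def predict_popularity(image) -> float:
    raise NotImplementedError("load the Intrinsic Image Popularity model here")

def score_video_frames(video_path, every_n_frames=10):
    cap = cv2.VideoCapture(video_path)
    scores = []                      # (frame index, predicted popularity)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            cv2.imwrite(f"frame_{index:06d}.jpg", frame)   # keep the candidate image
            scores.append((index, predict_popularity(frame)))
        index += 1
    cap.release()
    return sorted(scores, key=lambda s: s[1], reverse=True)  # best thumbnails first
```
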
Caveat: It is important to remember that this model was trained and tested on Instagram images. Given the similarity in behavior for clicking on an Instagram photo or a YouTube thumbnail, we feel it’s likely (though never tested) that if a thumbnail is predicted to do well as an Instagram photo, it will similarly do well as a YouTube video thumbnail.

Let’s look at an example of how this works.

 

Current thumbnail. Image Credits: YouTube

 

We had the intrinsic popularity model look at three frames per second of this 23-minute video. It took about 20 minutes. The following were my favorites from the 20 images that had the highest overall scores.

#advertising-tech, #artificial-intelligence, #column, #computer-vision, #content-marketing, #growth-marketing, #machine-learning, #photography, #tc


On lying AIs

A yellow-eyed cat tilts its eyes at the camera, gazing up from a grey bedspread. ‘London Trip’ is the AI’s title for this photo-montage ‘Memory’ plucked from the depths of my iPhone camera roll. It’s selected a sad score of plinking piano and sweeping violin. The algorithm has calculated it must tug at the heart strings.

Cut to a crop of a desk with a 2FA device resting on a laptop case. It’s not at all photogenic. On to a shot of a sofa in a living room. It’s empty. The camera inclines toward a radio on a sidetable. Should we be worried for the invisible occupant? The staging invites cryptic questions.

Cut to an outdoor scene: A massive tree spreading above a wrought iron park fence. Another overcast day in the city. Beside it an eccentric shock of orange. A piece of public art? A glass-blown installation? There’s no time to investigate or interrogate. The AI is moving on. There’s more data clogging its banks. 

Cut to a conference speaker. White, male, besuited, he’s gesticulating against a navy wall stamped with some kind of insignia. The photo is low quality, snapped in haste from the audience, details too fuzzy to pick out. Still, the camera lingers, panning across the tedious vista. A wider angle shows conference signage for something called ‘Health X’. This long distant press event rings a dim bell. Another unlovely crop: My voice recorder beside a brick wall next to an iced coffee. I guess I’m working from a coffee shop.

On we go. A snap through a window-frame of a well kept garden, a bird-bath sprouting from low bushes. Another shot of the shrubbery shows a ladder laid out along a brick wall. I think it looks like a church garden in Southwark but I honestly can’t tell. No matter. The AI has lost interest. Now it’s obsessing over a billboard of a Google Play ad: “All the tracks you own and millions more to discover — Try it now for free,” the text reads above a weathered JCDecaux brand stamp.

There’s no time to consider what any of this means because suddenly it’s nighttime. It must be; my bedside lamp is lit. Or is it? Now we’re back on the living room sofa with daylight and a book called ‘Nikolski’ (which is also, as it happens, about separation and connection and random artefacts — although its artful narrative succeeds in serendipity).

Cut to a handful of berries in a cup. Cut to an exotic-looking wallflower which I know grows in the neighbourhood. The score is really soaring now. A lilting female vocal lands on cue to accompany a solitary selfie.

I am looking unimpressed. I have so many questions. 

The AI isn’t quite finished. For the finale: A poorly framed crop of a garden fence and a patio of pot plants, washing weeping behind the foliage. The music is fading, the machine is almost done constructing its London trip. The last shot gets thrust into view: Someone’s hand clasping a half-drunk punch. 

Go home algorithm, you’re drunk.

Footnote: Apple says on-device machine learning powers iOS’ “intelligent photos experience”, which “analyzes every photo in a user’s photo library using on-device machine learning [to] deliver a personalized experience for each user” — with the advanced processing slated to include scene classification, composition analysis, people and pets identification, quality analysis and identification of facial expressions.

#artificial-intelligence, #computer-vision, #ios, #iphone, #machine-learning, #photos


A Car Insurance Claim Estimate Before the Tow Truck Is Called

Using algorithms, tech companies are helping insurers speed up the process after an accident, make it more accurate and keep estimators out of the field — a plus during a pandemic.

#artificial-intelligence, #automobile-insurance-and-liability, #computer-vision, #dalyac-alex, #mobile-applications, #tractable-ltd, #traffic-accidents-and-safety


‘Impossible Objects’ That Reveal a Hidden Power

The artist Trevor Paglen peers into the history of photography and its relationship to state surveillance.

#art, #carnegie-museum-of-art, #computer-security, #computer-vision, #opposing-geometries-exhibit, #paglen-trevor, #photography, #pittsburgh-pa, #privacy, #surveillance-of-citizens-by-government


Deep Science: Dog detectors, Mars mappers and AI-scrambling sweaters

Research papers come out at far too rapid a rate for anyone to read them all, especially in the field of machine learning, which now affects (and produces papers in) practically every industry and company. This column aims to collect the most relevant recent discoveries and papers, particularly in but not limited to artificial intelligence, and explain why they matter.

This week in Deep Science spans the stars all the way down to human anatomy, with research concerning exoplanets and Mars exploration, as well as understanding the subtlest habits and most hidden parts of the body.

Let’s proceed in order of distance from Earth. First is the confirmation of 50 new exoplanets by researchers at the University of Warwick. It’s important to distinguish this process from discovering exoplanets among the huge volumes of data collected by various satellites. These planets were flagged as candidates but no one has had the chance to say whether the data is conclusive. The team built on previous work that ranked planet candidates from least to most likely, creating a machine learning agent that could make precise statistical assessments and say with conviction, here is a planet.

“A prime example when the additional computational complexity of probabilistic methods pays off significantly,” said the university’s Theo Damoulas. It’s an excellent example of a field where marquee announcements, like the Google-powered discovery of Kepler-90i, represent only the earliest results rather than a final destination, emphasizing the need for further study.

In our own solar system, we are getting to know our neighbor Mars quite well, though even the Perseverance rover, currently hurtling through the void in the direction of the red planet, is like its predecessors a very resource-limited platform. With a small power budget and years-old radiation-hardened CPUs, there’s only so much in the way of image analysis and other AI-type work it can do locally. But scientists are preparing for when a new generation of more powerful, efficient chips makes it to Mars.

#artificial-intelligence, #communications-satellites, #computer-vision, #emerging-technologies, #gadgets, #lab-wrap, #lidar, #machine-learning, #robotics, #science, #science-and-technology, #space, #tc


Report: Apple quietly acquired Israel’s Camerai, formerly Tipit, a specialist in AR and camera tech

Apple is well known for picking up smaller startups on the hush-hush to augment its business, and today news leaked out about the latest of these… nearly two years after the fact. Sometime between 2018 and 2019, the iPhone giant reportedly acquired and shut down Camerai, an augmented reality and computer vision company based out of Israel, which used to be called Tipit.

The news was first reported earlier today by Israeli newspaper Calcalist, and we have reached out to ask Apple directly about it. In the meantime, Jonathan (Yehonatan) Rimon, who had been Camerai’s CEO and co-founded the company with Moty Kosharovsky, Erez Tal, and Aaron Wetzler, declined to comment one way or the other on the report to us when we contacted him directly about it. A separate source confirmed the story to us. We’ll update as we learn more.

Calcalist said that the startup sold for several tens of millions of dollars. From being founded in 2015, Camerai had raised around $5 million — including a $2.5 million round in 2017 and another unreported $2.5 million in 2018 — with investors including the Atooro Fund and another called the SKO Fund.

It seems that the acquisition came on the heels of multiple approaches from a number of companies at a time when AR was arguably at a peak of hype and many big tech companies wanted a piece of the action. (Recall that 2018 was the year when Magic Leap raised nearly $1 billion in a single round of funding.) Back in 2018, we heard rumors that those approaching and looking at the startup included Apple, Samsung, and Alibaba.

The Calcalist report said that Camerai employees joined Apple’s computer vision team, and that the company’s technology has been incorporated into Apple products already. It’s not clear specifically where and when, but recall that both iOS 13 and iOS 14 have featured big software updates to the camera.

Camerai had built an SDK and, specifically, a range of software-based AR tools to help edit and use camera-made images in more sophisticated ways.

Its tech included the ability to detect different objects in the picture, and outline them with precision to alter them cosmetically; the ability to outline and apply filters across the whole image; a “skeleton tracking” neural network API that could detect and draw body joints in real time overlaid on a picture of a human; and its own version of selective focus for enhanced portrait modes (remember this was 2018 and this was not standard on phones at the time). Camerai’s site is shut down, but here are some screenshots of how it all looked, pulled from the Internet Archive:

Camerai’s acquisition underscores a couple of interesting, and ongoing, trends.

The first of these is in the development of smartphone technology, particularly around cameras. Some of the more interesting innovations in smartphone camera technology have come not out of improvements in hardware, but software, where the application of breakthroughs in artificial intelligence can mean that an existing combination of sensor, lens, and on-phone and cloud processors produce a better and more technically dynamic picture than before.

At a time when smartphone replacement cycles have slowed considerably and hardware innovation is also slowing, bolting on talent and tech created outside the phone companies is one way to gain a competitive edge.

(Separately, I wonder whether making cutting-edge technology software-based also means there could be scope in the future for paid updates to older phone models, which could mean more incremental revenue from consumers who don’t want to invest in completely new devices.)

The second trend that this deal underscores is how Israel remains fertile ground for bigger companies on the hunt to pick up and bolt on technology, and that the secretive approach is likely to remain for some time to come.

“In Israel there are over 350 global corporate companies, from 30 countries, who search for local innovation. Some of them like Apple, MS, Google, even have local R&D [operations],” said Avihai Michaeli, a Tel Aviv-based senior investment banker and startup advisor. “Those global companies look mainly for tech which could serve as its competitive edge. It is not the first time that an acquired startup is asked not to publish it was acquired, nor talk about it.”

Other acquisitions that Apple has made in Israel have included camera module maker LinX, semiconductor startup Anobit, and 3D sensor company PrimeSense.

We’ll update this post as we learn more.

#apple, #ar, #computer-vision, #europe, #ma, #mobile


Instrumental raises $20M to scale its AI-powered manufacturing tech

This morning Instrumental, a startup that uses vision-powered AI to detect manufacturing anomalies, announced that it has closed a $20 million Series B led by Canaan Partners. The company had previously raised $10.3 million across two rounds, including a $7.5 million Series A in mid-2017.

According to a release, other venture groups participated in the Series B, including Series A investors Root Ventures, Eclipse Ventures, and First Round Capital, which also led the company’s seed round. Stanford StartX also took part in the new investment.

Anna-Katrina Shedletsky, via the company.

Instrumental’s technology is a hybrid of hardware and software, with a focus on the latter.

TechCrunch caught up with the company’s founder and CEO Anna-Katrina Shedletsky to better understand its tech, and its business model. And we asked participating Canaan partner Hrach Simonian about the business metrics and milestones that led him to lead the deal.

Tech

Instrumental’s tech is a combination of cameras and code. The startup installs its hardware on manufacturing lines and employs learning software to ferret out anomalies using data from small sample sets. The company’s goal is to help other businesses that build physical products boost yields and save time.

“Our customers identify design, quality, and process issues weeks faster than their competitors and get to mass production with a higher quality product that can be built for significantly less money,” she said in an email to TechCrunch.

According to Shedletsky, who previously worked at Apple in design and manufacturing capacities, Instrumental uses commodity hardware. The startup’s software is what matters, allowing its camera units to train on as few as 30 sample units with simple labeling. Notably, given the limited internet connectivity at many manufacturing facilities in China where the company often works, its hardware can handle data processing on-site to prevent upload/download lag.
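
Instrumental has not published how its models work, but the “train on roughly 30 good units” framing maps onto a well-known pattern: embed each image with a pretrained network, then flag units whose embeddings fall outside the distribution of known-good samples. The sketch below is a minimal, hypothetical version of that pattern (the folder names, backbone, and one-class detector are assumptions), not Instrumental’s actual pipeline.

```python
# Minimal sketch of anomaly detection from a small set of "golden" units:
# embed images with a pretrained CNN, then flag outliers with a one-class SVM.
# Paths and parameters here are hypothetical placeholders.
import glob
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import OneClassSVM

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 512-d penultimate features
backbone.eval().to(device)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    return backbone(img).squeeze(0).cpu().numpy()

# ~30 images of known-good units from a pre-production build (hypothetical folder).
good = [embed(p) for p in glob.glob("golden_units/*.jpg")]
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(good)

# Score each unit coming off the line; a prediction of -1 means "flag for review".
for path in glob.glob("line_captures/*.jpg"):
    label = detector.predict([embed(path)])[0]
    print(path, "ANOMALY" if label == -1 else "ok")
```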

It’s not easy to get tech installed onto manufacturing lines, the company told TechCrunch, as it’s easy to get fired for stopping a production run. This can make it hard for companies like Instrumental to get their foot in the door.

Instrumental works around the problem by getting its tech installed on manufacturing lines when they are in pre-production development. If the startup can prove value there, its tech can be rolled out when the lines move from development to production. And, if Instrumental’s tech works with initial lines, it can be scaled across other manufacturing lines that are spun up, something called “replicating lines.”

Instrumental hardware unit, via the company.

The startup has two products: one for manufacturing lines in development, and one for those in production. Unlike enterprise software contracts that are often sold on a yearly cadence, Instrumental’s manufacturing deals can ramp up based on volume through a process its CEO called a “continuous sale.”

The model allows the company to charge more, and sooner, than a traditional enterprise software contract that must wait for its renewal period before renegotiation, boosting how quickly Instrumental can grow its business.

Money

Flush with $20 million, what does Instrumental have planned? Shedletsky told TechCrunch that her first goal is to expand its business in the electronics space, a part of the manufacturing world where the startup has seen initial customer traction.

To support that effort, Instrumental is building out its go-to-market function and continuing to work on its core technology, she said.

After living off its Series A for around twice as long as many venture-backed companies tend to, TechCrunch was curious how quickly Instrumental intends to deploy its larger Series B. According to its CEO, the startup plans on being principled, but not slow. She stressed that she’s working to build a long-term company, and that she wants to create something that is both sustainable, and large.

Lacking hard growth metrics, TechCrunch was curious what attracted Canaan to Instrumental at this juncture. According to Hrach Simonian, a general partner at the firm, “Instrumental’s tools are quickly becoming a business imperative,” something that can be seen in its “renewal rates with big customers,” which he described as “extraordinarily high.”

The sheer scale of the global electronics industry gives Instrumental a nearly infinite TAM to sell into. Let’s see how quickly the startup can grow.

#ai, #computer-vision, #fundings-exits, #instrumental, #startups, #tc


Imint: the Swedish firm that gives Chinese smartphones an edge in video production

If your phone takes amazing photos, chances are its camera has been augmented by artificial intelligence embedded in the operating system. Now videos are getting the same treatment.

In recent years, smartphone makers have been gradually transforming their cameras into devices that capture data for AI processing beyond what the lens and sensor pick up in a single shot. That effectively turns a smartphone into a professional camera on auto mode and lowers the bar for capturing compelling images and videos.

In an era of TikTok and vlogging, there’s a huge demand for easily producing professional-looking videos on the go. Like still images, videos shot on smartphones rely not just on the lens and sensor but also on enhancement algorithms. To some extent, those lines of code are more critical than the hardware, argued Andreas Lifvendahl, founder and chief executive of Swedish company Imint, whose software now enhances video production in roughly 250 million devices — most of which come from Chinese manufacturers.

“[Smartphone makers] source different kinds of camera solutions — motion sensors, gyroscopes, and so on. But the real differentiator, I would say, is more on the software side,” Lifvendahl told TechCrunch over a phone call.

Smart video recording

Imint started life in 2007 as a spin-off from an academic research team at Uppsala University in Sweden. It spent its first few years building software for aerial surveillance, as many cutting-edge innovations do, finding their first clients in the defense market. In 2013, Lifvendahl saw widespread smartphone adoption coming and, with it, a huge opportunity to bring the same technology used in defense drones to the handsets in people’s pockets.

“Smartphone companies were investing a lot in camera technology and that was a clever move,” he recalled. “It was very hard to find features with a direct relationship to consumers in daily use, and the camera was one of those because people wanted to document their life.”

“But they were missing the point by focusing on megapixels and still images. Consumers wanted to express themselves in a nice fashion of using videos,” the founder added.

Source: Imint’s video enhancement software, Vidhance

The next February, the Swedish founder attended Mobile World Congress in Barcelona to gauge vendor interest. Many exhibitors were, unsurprisingly, Chinese phone makers scouring the conference for partners. They were immediately intrigued by Imint’s solution, and Lifvendahl returned home to set about tweaking his software for smartphones.

“I’ve never met this sort of open attitude to have a look so quickly, a clear signal that something is happening here with smartphones and cameras, and especially videos,” Lifvendahl said.

Vidhance, Imint’s video enhancement software suite mainly for Android, was soon released. In search of growth capital, the founder took the startup public on the Stockholm Stock Exchange at the end of 2015. The next year, Imint landed its first major account with Huawei, the Chinese telecoms equipment giant that was playing aggressive catch-up on smartphones at the time.

“It was a turning point for us because once we could work with Huawei, all the other guys thought, ‘Okay, these guys know what they are doing,’” the founder recalled. “And from there, we just grew and grew.”

Working with Chinese clients

The hyper-competitive nature of Chinese phone makers means they are easily sold on new technology that can help them stand out. The flipside is the intensity that comes with competition. The Chinese tech industry is both well-respected — and notorious — for its fast pace. Slow movers can be crushed in a matter of a few months.

“In some aspects, it’s very U.S.-like. It’s very straight to the point and very opportunistic,” Lifvendahl reflected on his experience with Chinese clients. “You can get an offer even in the first or second meeting, like, ‘Okay, this is interesting, if you can show that this works in our next product launch, which is due in three months. Would you set up a contract now?’”

“That’s a good side,” he continued. “The drawback for a Swedish company is the demand they have on suppliers. They want us to go on-site and offer support, and that’s hard for a small Swedish company. So we need to be really efficient, making good tools and have good support systems.”

The fast pace also permeates the phone makers’ development cycles, which is not always good for innovation, Lifvendahl suggested. They react to market trends rather than thinking ahead of the curve, as Apple excels at doing, or conducting adequate market research.

Despite all the scrambling inside, Lifvendahl said he was surprised that Chinese manufacturers could “get such high-quality phones out.”

“They can launch one flagship, maybe take a weekend break, and then next Monday they are rushing for the next project, which is going to be released in three months. So there’s really no time to plan or prepare. You just dive into a project, so there would be a lot of loose ends that need to be tied up in four or five weeks. You are trying to tie hundreds of different pieces together with fifty different suppliers.”

High-end niche

Imint is one of those companies that thrive by finding a tough-to-crack niche. Competition certainly exists, often coming from large Japanese and Chinese companies. But there’s always a market for a smaller player who focuses on one thing and does it very well. The founder compares his company to a “little niche boutique in the corner, the hi-fi store with expensive speakers.” His competitors, on the other hand, are the Walmarts with thick catalogs of imaging software.

The focused strategy is what allows Imint’s software to enhance precision, reduce motion, track moving objects, auto-correct the horizon, reduce noise, and enhance other aspects of a video in real time — all through deep learning.
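
Vidhance itself is proprietary and, per Imint, built on deep learning, but the stabilization problem it tackles can be sketched with classical tools: track features between frames, estimate the inter-frame motion, then smooth the accumulated camera trajectory and warp each frame toward the smoothed path. The OpenCV example below (with a hypothetical input file) shows that baseline approach only, not Imint’s method.

```python
# Classical video stabilization sketch with OpenCV: track features between
# frames, estimate inter-frame motion, and smooth the accumulated trajectory.
import cv2
import numpy as np

cap = cv2.VideoCapture("shaky_input.mp4")  # hypothetical input clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

transforms = []  # per-frame (dx, dy, d_angle)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track corner features from the previous frame into the current one.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=30)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_prev = pts[status.flatten() == 1]
    good_next = nxt[status.flatten() == 1]
    # Estimate a rigid-ish transform (translation + rotation + scale).
    m, _ = cv2.estimateAffinePartial2D(good_prev, good_next)
    dx, dy = m[0, 2], m[1, 2]
    da = np.arctan2(m[1, 0], m[0, 0])
    transforms.append((dx, dy, da))
    prev_gray = gray

# Smooth the cumulative camera trajectory with a simple moving average;
# the difference between smoothed and raw paths is the per-frame correction
# to apply when re-warping each frame.
trajectory = np.cumsum(transforms, axis=0)
kernel = np.ones(15) / 15
smoothed = np.stack([np.convolve(trajectory[:, i], kernel, mode="same") for i in range(3)], axis=1)
corrections = smoothed - trajectory
print("computed stabilizing corrections for", len(corrections), "frames")
```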

About three-quarters of Imint’s revenues come from licensing its proprietary software that does these tricks. Some clients pay royalties on the number of devices shipped that use Vidhance, while others opt for a flat annual fee. The rest of the income comes from licensing its development tools or SDK, and maintenance fees.

Imint now supplies its software to 20 clients around the world, including the Chinese big four of Huawei, Xiaomi, Oppo and Vivo, as well as chip giants like Qualcomm and MediaTek. ByteDance also has a deal to bake Imint’s software into Smartisan phones; Smartisan sold its core technology to the TikTok parent last year. Imint is beginning to look beyond handsets to other devices that can benefit from high-quality footage, from action cameras and consumer drones to body cameras for law enforcement.

So far, the Swedish company has been immune from the U.S.-China trade tensions, but Lifvendahl worried that as the two superpowers move toward technological self-reliance, outsiders like Imint will have a harder time entering either market.

“We are in a small, neutral country but also are a small company, so we’re not a strategic threat to anyone. We come in and help solve a puzzle,” assured the founder.

#artificial-intelligence, #asia, #china, #computer-vision, #hardware, #imint, #sweden, #tc, #tiktok, #video, #vidhance


OpenCV AI Kit aims to do for computer vision what Raspberry Pi did for hobbyist hardware

A new gadget called the OpenCV AI Kit, or OAK, looks to replicate the success of Raspberry Pi and other minimal computing solutions, but for the growing fields of computer vision and 3D perception. Its new multi-camera PCBs pack a lot of capability into a small, open-source unit and are now seeking funding on Kickstarter.

The OAK devices use their cameras and onboard AI chip to perform a number of computer vision tasks, like identifying objects, counting people, finding distances to and between things in frame, and more. This info is sent out in polished, ready-to-use form.
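
To make that “polished, ready-to-use form” concrete: rather than raw frames, the device emits structured results such as a per-frame person count. The sketch below reproduces that kind of output on a host machine with OpenCV’s stock HOG pedestrian detector; it is not the OAK or DepthAI API, just an illustration of the task, with any webcam feed standing in for the camera module.

```python
# Rough host-side illustration of one task OAK handles onboard (counting
# people in frame), using OpenCV's built-in HOG pedestrian detector.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture(0)  # any camera feed stands in for the OAK module here
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # The "ready-to-use" equivalent: a structured count per frame.
    cv2.putText(frame, f"people: {len(boxes)}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("people count", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```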

Having a reliable, low-cost, low-power computer vision unit like this is a great boon for anyone looking to build a smart device or robot that might otherwise have required several discrete cameras and other chips (not to mention quite a bit of fiddling with software).

Image Credits: Luxonis

Like the Raspberry Pi, which has grown to become the first choice for hobbyist programmers dabbling in hardware, pretty much everything about these devices is open source under the permissive MIT license. And the project is officially affiliated with OpenCV, a widespread set of libraries and standards used in the computer vision world.

The actual device and onboard AI were created by Luxonis, which previously created the CommuteGuardian, a sort of smart brake light for bikes that tracks objects in real time so it can warn the rider. The team couldn’t find any hardware that fit the bill so they made their own, and then collaborated with OpenCV to make the OAK series as a follow-up.

There are actually two versions: The extra-small OAK-1 and triple-camera OAK-D. They share many components, but the OAK-D’s multiple camera units mean it can do true stereoscopic 3D vision rather than relying on other cues in the plain RGB image — these techniques are better now than ever but true stereo is still a big advantage. (The human vision system uses both, in case you’re wondering.)
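
For a sense of what true stereo buys, here is a minimal disparity computation on a rectified left/right image pair using OpenCV’s classical block matcher (the file names are hypothetical). The OAK-D performs this kind of depth estimation onboard with its own pipeline; this host-side sketch is only illustrative.

```python
# Compute a disparity map from a rectified left/right pair with OpenCV
# block matching. Larger disparity means a closer object; with a calibrated
# rig, depth = focal_length * baseline / disparity.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16; blockSize is an odd window size.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)

# Normalize purely for visualization and save the result.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```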

The two OAK devices, with the world’s ugliest quarter for scale.

The idea was to unitize the computer vision system so there’s no need to build or configure it, which could help get a lot of projects off the ground faster. You can use the baked-in object and depth detection out of the box, or pick and choose the metadata you want and use it to augment your own analysis of the 4K (plus two 720p) images that also come through.

A very low power draw helps, too. Computer vision tasks can be fairly demanding on processors and thus use a lot of power, which was why a device like XNOR’s ultra-efficient chip was so promising (and why that company got snapped up by Apple). The OAK devices don’t take things to XNOR extremes but with a maximum power draw of a handful of watts, they could run on normal-sized batteries for days or weeks on end depending on their task.

The specifics will no doubt be interesting to those who know the ins and outs of such things — ports and cables and GitHub repositories and so on — but I won’t duplicate them here, as they’re all listed in orderly fashion in the campaign copy. Here’s the quick version: