CDC to regain control of US hospital data after Trump-era seizure, chaos

An older man in a business suit listens to a woman in a business suit.

Former president Donald Trump, right, listens to Deborah Birx, former coronavirus response coordinator, as she speaks during a news conference in the White House in Washington, DC, on Thursday, April 23, 2020. (credit: Getty | Bloomberg)

This December, the US Centers for Disease Control and Prevention will finally regain control of national COVID-19 hospital data—which the agency abruptly lost early in the pandemic to an inexperienced private company with ties to then-President Donald Trump.

As SARS-CoV-2 raged in the summer of 2020, the Trump administration was busy sabotaging the once-premier public health agency. The administration’s meddling included stripping the CDC of its power to collect critical data on COVID-19 patients and pandemic resources in hospitals around the country.

According to multiple investigative reports at the time, then-White House Coronavirus Task Force Coordinator Deborah Birx was frustrated by the CDC’s slow and somewhat messy process of collecting and tidying the data submitted by thousands of hospitals. The data included stats on admissions, patient demographics, bed availability, ventilator use, discharges, and personal protective equipment (PPE) supplies.

#birx, #cdc, #covid-19, #data, #data-collection, #hhs, #hospital, #infectious-disease, #pandemic, #science, #teletracking, #trump

Discs vs. data: Are we helping the environment by streaming?

Image of a pile of DVDs (credit: Getty Images)

Earth Day was April 22nd, and its usual message—take care of our planet—has been given added urgency by the challenges highlighted in the latest IPCC report. This year, Ars is taking a look at the technologies we normally cover, from cars to chipmaking, and finding out how we can boost their sustainability and minimize their climate impact.

Gone are the days of going to Blockbuster to pick out a film for a night in. Physical media like CDs, DVDs, Blu-ray discs, Sony’s weird PlayStation Portable UMDs, and countless other formats have been thoroughly dethroned thanks to a barrage of streaming services like Netflix—itself ailing at the moment—Amazon Prime, and Spotify.

For the first time in the past 17 years, CDs saw an increase in sales—of 1.1 percent, or 40.59 million units in 2021, compared to 40.16 million units the year prior. In 2021, people purchased 1.2 billion pieces of physical video media, compared to 6.1 billion a decade prior. Meanwhile, according to the Recording Industry Association of America, revenue from music streaming grew 13.4 percent to $10.1 billion in 2020.

#carbon, #cds, #climate, #data, #dvds, #emissions, #engineering, #film, #life-cycle-analysis, #movies, #music, #science, #tv

Study: Reducing poverty and climate goals aren’t at odds

Image of slum housing.

Eliminating extreme poverty won’t necessarily boost emissions as much as people fear. (credit: Soltan Frédéric)

The United Nations’ first Sustainable Development Goal (SDG) aims to eradicate poverty around the world. If implemented, however, it might see people consume more—drive more often, buy more products—and, thus, produce more carbon emissions, fueling climate change. “With more money to spend, and therefore more consumption, there is usually a higher carbon footprint,” Benedikt Bruckner, a master’s student of energy and environmental sciences at the University of Groningen, told Ars.

But it doesn’t necessarily have to be that way, according to a new study put out by Bruckner, other researchers out of Groningen, and colleagues in the United States and China.

Published in Nature, the research makes use of high-level data about consumption patterns to show that reaching SDG 1, which aims to move every person out of extreme poverty (under $1.90 per day) and half of everyone above the poverty lines of their respective countries, won’t excessively fuel climate change.

#climate, #data, #poverty, #poverty-alleviation, #science, #world-bank

DataRobot CEO Dan Wright coming to TC Sessions: SaaS to discuss role of data in machine learning

Just about every company is sitting on vast amounts of data, which they can use to their advantage if they can just learn how to harness it. Data is actually the fuel for machine learning models, and with the proper tools, businesses can learn to process this data and build models to help them compete in a rapidly-changing marketplace, to react more quickly to shifting customer requirements and to find insights faster than any human ever possibly could.

Boston-based DataRobot, a late-stage startup that has built a platform to help companies navigate the machine learning model lifecycle, has been raising money by the bushel over the last several years including $206 million in September 2019 and another $300 million in July. DataRobot CEO Dan Wright will be joining us on a panel to discuss the role of data in business at TC Sessions: SaaS on October 27th.

The company covers the gamut of the machine learning lifecycle including preparing data, operationalizing it and finally building APIs to make it useful for the organization as it attempts to build a soup-to-nuts platform. DataRobot’s broad platform approach has appealed to investors.

As we wrote at the time of the $206 million round:

The company has been catching the attention of these investors by offering a machine learning platform aimed at analysts, developers and data scientists to help build predictive models much more quickly than it typically takes using traditional methodologies. Once built, the company provides a way to deliver the model in the form of an API, simplifying deployment.

DataRobot has raised a total of $1 billion on $6.3 billion post valuation, according to Pitchbook data, and it’s been putting that money to work to add to its platform of services. Most recently the company acquired Algorithmia, which helps manage machine learning models.

As the pandemic has pushed more business online, companies are always looking for an edge and one way to achieve that is by taking advantage of AI and machine learning. Wright will be joined on the data panel by Monte Carlo co-founder and CEO Barr Moses and AgentSync co-founder and CTO Jenn Knight to discuss the growing role of data in business operations.

In addition to our discussion with Wright, the conference will also include Microsoft’s Jared Spataro, Amplitude’s Olivia Rose, as well as investors Kobey Fuller and Laela Sturdy, among others. We hope you’ll join us. It’s going to be a thought-provoking lineup.

Buy your pass now to save up to $100. We can’t wait to see you in October!

#agentsync, #algorithmia, #artificial-intelligence, #boston, #ceo, #data, #datarobot, #enterprise, #jared-spataro, #jenn-knight, #machine-learning, #monte-carlo, #pitchbook-data, #tc, #tc-sessions-saas-2021

Companies betting on data must value people as much as AI

The Pareto principle, also known as the 80-20 rule, asserts that 80% of consequences come from 20% of causes, rendering the remainder way less impactful.

Those working with data may have heard a different rendition of the 80-20 rule: A data scientist spends 80% of their time at work cleaning up messy data as opposed to doing actual analysis or generating insights. Imagine a 30-minute drive expanded to two-and-a-half hours by traffic jams, and you’ll get the picture.

While most data scientists spend more than 20% of their time at work on actual analysis, they still have to waste countless hours turning a trove of messy data into a tidy dataset ready for analysis. This process can include removing duplicate data, making sure all entries are formatted correctly and doing other preparatory work.
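As a rough illustration of that preparatory work, the sketch below normalizes formatting and removes duplicate records using only Python's standard library. The field names and sample records are hypothetical, invented for this example, and matching duplicates by email address is just one assumed rule.

```python
def clean_records(records):
    """Normalize formatting, then drop duplicate entries (first one wins)."""
    seen = set()
    cleaned = []
    for record in records:
        # Normalize formatting: trim whitespace, standardize capitalization.
        normalized = {
            "name": record["name"].strip().title(),
            "email": record["email"].strip().lower(),
        }
        # Assumption: two records with the same email are duplicates.
        if normalized["email"] in seen:
            continue
        seen.add(normalized["email"])
        cleaned.append(normalized)
    return cleaned


# Hypothetical messy input: inconsistent case, stray spaces, one duplicate.
raw = [
    {"name": "  ada lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "alan turing", "email": "alan@example.com"},
]
cleaned = clean_records(raw)
print(cleaned)
```

Even a toy version like this shows why the work is tedious: every rule about what counts as "the same" record or "correct" formatting has to be decided and encoded by someone.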

On average, this workflow stage takes up about 45% of the total time, a recent Anaconda survey found. An earlier poll by CrowdFlower put the estimate at 60%, and many other surveys cite figures in this range.

None of this is to say data preparation is not important. “Garbage in, garbage out” is a well-known rule in computer science circles, and it applies to data science, too. In the best-case scenario, the script will just return an error, warning that it cannot calculate the average spending per client, because the entry for customer #1527 is formatted as text, not as a numeral. In the worst case, the company will act on insights that have little to do with reality.
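The best-case scenario can be sketched in a few lines of Python. The customer IDs and amounts below are hypothetical, and the check is one possible way a script might refuse a text entry rather than silently miscalculating.

```python
from statistics import mean

# Hypothetical spending records; customer 1527's amount was stored as text.
spending = {1526: 120.0, 1527: "85.50", 1528: 42.25}


def average_spending(amounts):
    """Average the amounts, refusing any entry that is not a number."""
    for customer_id, amount in amounts.items():
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise TypeError(
                f"entry for customer #{customer_id} is formatted as "
                f"{type(amount).__name__}, not as a number"
            )
    return mean(amounts.values())


try:
    average_spending(spending)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# After cleaning: convert the text entry to a number and retry.
fixed = {cid: float(amount) for cid, amount in spending.items()}
average = average_spending(fixed)
print(round(average, 2))
```

The fix itself is a one-liner, which is the point: the effort goes into finding such entries across a large dataset, not into repairing any single one.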

The real question to ask here is whether re-formatting the data for customer #1527 is really the best way to use the time of a well-paid expert. The average data scientist is paid between $95,000 and $120,000 per year, according to various estimates. Having the employee on such pay focus on mind-numbing, non-expert tasks is a waste both of their time and the company’s money. Besides, real-world data has a lifespan, and if a dataset for a time-sensitive project takes too long to collect and process, it can be outdated before any analysis is done.

What’s more, companies’ quests for data often include wasting the time of non-data-focused personnel, with employees asked to help fetch or produce data instead of working on their regular responsibilities. More than half of the data being collected by companies is often not used at all, suggesting that the time of everyone involved in the collection has been wasted to produce nothing but operational delay and the associated losses.

The data that has been collected, on the other hand, is often only used by a designated data science team that is too overworked to go through everything that is available.

All for data, and data for all

The issues outlined here all play into the fact that save for the data pioneers like Google and Facebook, companies are still wrapping their heads around how to re-imagine themselves for the data-driven era. Data is pulled into huge databases and data scientists are left with a lot of cleaning to do, while others, whose time was wasted on helping fetch the data, do not benefit from it too often.

The truth is, we are still early when it comes to data transformation. The success of tech giants that put data at the core of their business models set off a spark that is only starting to take off. And even though the results are mixed for now, this is a sign that companies have yet to master thinking with data.

Data holds much value, and businesses are very much aware of it, as showcased by the appetite for AI experts in non-tech companies. Companies just have to do it right, and one of the key tasks in this respect is to start focusing on people as much as we do on AIs.

Data can enhance the operations of virtually any component within the organizational structure of any business. As tempting as it may be to think of a future where there is a machine learning model for every business process, we do not need to tread that far right now. The goal for any company looking to tap data today comes down to getting it from point A to point B. Point A is the part in the workflow where data is being collected, and point B is the person who needs this data for decision-making.

Importantly, point B does not have to be a data scientist. It could be a manager trying to figure out the optimal workflow design, an engineer looking for flaws in a manufacturing process or a UI designer doing A/B testing on a specific feature. All of these people must have the data they need at hand all the time, ready to be processed for insights.

People can thrive with data just as well as models, especially if the company invests in them and makes sure to equip them with basic analysis skills. In this approach, accessibility must be the name of the game.

Skeptics may claim that big data is nothing but an overused corporate buzzword, but advanced analytics capacities can enhance the bottom line for any company as long as it comes with a clear plan and appropriate expectations. The first step is to focus on making data accessible and easy to use and not on hauling in as much data as possible.

In other words, an all-around data culture is just as important for an enterprise as the data infrastructure.

#artificial-intelligence, #big-data, #column, #computing, #data, #data-analysis, #data-management, #data-scientist, #databases, #engineer, #enterprise, #facebook, #information, #machine-learning, #opinion, #startups, #tc

Monte Carlo closes $60M Series C on the back of rapid ARR growth

Monte Carlo has returned to the capital well for the second time this year, adding $60 million to its accounts thanks to a new, ICONIQ Growth-led Series C. The company helps customers monitor their data inflows, finding issues and errors that could foul downstream data analysis.

The company raised a Series B earlier this year, a $25 million round led by GGV and Redpoint. Both Series B lead investors participated in Monte Carlo’s Series C, joined by Salesforce Ventures, a new investor in the company.

The company declined to share what it is now worth, but did tell TechCrunch that it is now worth 4x its previous valuation. Given that the company’s Series B likely sold for between 15% and 25% of the company, valuing it somewhere around the $100 million to $175 million range, we can infer a new valuation range of around $400 million to $700 million from its new investment. That’s back-of-the-envelope math, but it’s the best we can do.
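For readers who want to check that back-of-the-envelope math, here is a minimal sketch. The 15% to 25% stake range is the article's own assumption, and the function name is invented for illustration.

```python
def implied_valuation(round_size, stake_sold):
    """Post-money valuation implied by selling `stake_sold` of the company."""
    return round_size / stake_sold


series_b = 25_000_000  # Series B size reported above
multiple = 4           # "4x its previous valuation"

# Assumed range for the share of the company sold in the Series B.
low_b = implied_valuation(series_b, 0.25)   # larger stake, lower valuation
high_b = implied_valuation(series_b, 0.15)  # smaller stake, higher valuation

low_c, high_c = low_b * multiple, high_b * multiple
print(f"Series B: ${low_b / 1e6:.0f}M to ${high_b / 1e6:.0f}M")
print(f"Series C: ${low_c / 1e6:.0f}M to ${high_c / 1e6:.0f}M")
```

The upper bounds come out to roughly $167 million and $667 million, which sit a little below the rounded $175 million and $700 million figures used in the estimate.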

Image Credits: Monte Carlo. Shared as too often we cover enterprise software from a purely textual perspective. Here’s what it looks like!

In slightly more concrete terms, Monte Carlo told TechCrunch that it doubled its ARR in each of the last four quarters, or that it scaled its annual recurring revenue by around 8x over the one-year period between “summer 2020 [and] summer 2021.”

Given its growth rate, it’s not hard to see how the startup managed to put together another round of capital.

However, as is common with young tech companies raising rapid-fire rounds of capital these days, Monte Carlo did not need the money. According to Monte Carlo co-founder and CEO Barr Moses, she was on the receiving end of investor interest and still had the majority of her company’s Series B in the bank.

Monte Carlo has raised $101 million to date, and, given Moses’s comment, has north of $70 million in cash on hand.

Given the company’s dearth of impending cash-flow issues, we asked Moses why she raised more capital. She said that Monte Carlo is perceived as a market leader in its category and that she wants to build the biggest company that she can. Monte Carlo wants to pull on all available levers, she explained; with the new capital, it’s going to hire more folks, build more product, support more use cases and take on new markets.

The company’s market is growing in tandem with the larger world of big data and data-focused analysis. More simply, Monte Carlo sits upstream from data lakes and the analytical tools that data scientists use to extract insights from reams of information.

As TechCrunch noted at the time of the company’s Series B, by monitoring data ingestion for its customers, Monte Carlo can prevent crunched data from being skewed due to inbound issues. Its software can detect various divergences from historical patterns, for example, alerting customers to which of their inbound data sources could be providing faulty information.
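To make the idea of "divergence from historical patterns" concrete, here is a simplified sketch that flags a data source when its latest row count sits far from its historical average. This is a basic z-score check chosen for illustration, not Monte Carlo's actual method, and the feed names, counts, and threshold are all hypothetical.

```python
from statistics import mean, stdev


def find_divergent_sources(history, latest, threshold=3.0):
    """Return sources whose latest row count is far from their historical mean."""
    flagged = []
    for source, counts in history.items():
        mu = mean(counts)
        sigma = stdev(counts)
        if sigma == 0:
            # No historical variation: any change at all is a divergence.
            divergent = latest[source] != mu
        else:
            # Distance from the mean, measured in standard deviations.
            divergent = abs(latest[source] - mu) / sigma > threshold
        if divergent:
            flagged.append(source)
    return flagged


# Hypothetical daily row counts per inbound data source.
history = {
    "orders_feed": [1000, 1020, 980, 1010, 990],
    "clicks_feed": [5000, 5100, 4950, 5050, 4900],
}
latest = {"orders_feed": 1005, "clicks_feed": 0}
flagged = find_divergent_sources(history, latest)
print(flagged)
```

In this example only the feed that suddenly delivered zero rows is flagged, while the feed with a normal-looking count passes.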

If we think of data science as a steering wheel of sorts, data observability is the work of keeping the car’s windshield clear. That way when you steer, you aren’t accidentally driving in the wrong direction. Into a wall, for example. Sure, it’s not a perfect analogy, but it’s workable.

Widening our lens, Monte Carlo is a good example of the current startup fundraising market. When an upstart technology company can demonstrate rapid growth, investors are willing to fund it far in advance of its needs, hoping to secure a share of its future success as early as possible. This practice goes against old-fashioned venture capital thought, in which too much capital was believed to engender focus risks. Excess capital for investment has shaken up venture logic, or at least venture behavior.

Regardless, the market for analytical work in the data world has proved to be simply massive. Databricks’ latest valuation underscores that fact.

Monte Carlo is riding a similar wave. Expect to hear from the company again in early 2022 if it keeps its growth rate even close to what it managed in the last year.

#big-data, #data, #data-observability, #fundings-exits, #monte-carlo, #startups, #tc, #venture-capital

Don’t give your weed dealer all your data

Hello and welcome back to Equity, TechCrunch’s venture capital-focused podcast, where we unpack the numbers behind the headlines.

Our beloved Danny was back, joining Natasha and Alex and Grace and Chris to chat through yet another incredibly busy week. As a window into our process, every week we tell one another that the next week we’ll cut the show down to size. Then the week is so interesting that we end up cutting a lot of news, but also keeping a lot of news. The chaotic process is a work in progress, but it means that the end result is always what we decided we can’t not talk about.

Here’s what we got into:

Equity drops every Monday at 7:00 a.m. PDT, Wednesday, and Friday morning at 7:00 a.m. PDT, so subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts.

#climate-change, #cloud-100, #data, #disaster-tech, #dreamforce, #edtech, #equity, #equity-podcast, #felt, #figma, #fundings-exits, #gusto, #india, #ipcc-report, #mailchimp, #pave, #ransomware, #rapidsos, #salesforce, #startups, #surfside, #trendyol, #turkey, #upgrad

Florida is ablaze with COVID-19—and its case data reporting is a hot mess

A man in a suit gestures while speaking at a podium

Florida Gov. Ron DeSantis speaks during an event on August 10, 2021. (credit: Getty | Joe Raedle)

With the hypertransmissible delta variant on the rampage, Florida has become the epicenter of transmission in the US. The state is experiencing its largest surge of COVID-19 cases yet in the pandemic. Hospitalizations have reached record levels, and deaths are on the rise.

But instead of focusing on the response to the dire public health emergency, state officials appear to be squabbling over pandemic data and health measures.

On Monday night, Florida’s health department blasted media outlets for reporting the state’s most recent daily COVID-19 case counts, as relayed by the Centers for Disease Control and Prevention.

#cases, #cdc, #coronavirus, #covid-19, #data, #florida, #infectious-disease, #masks, #pandemic, #public-health, #ron-desantis, #science

Facebook adds Photobucket and Google Calendar to its data portability options

Facebook has today announced that it has added two new destinations for when you want to move your data from the social network. In a blog post, the company said that users will be able to move their images to Photobucket and event listings to Google Calendar. Product Manager Hadi Michel said that the tool has been “completely rebuilt” to be “simpler and more intuitive,” giving people more clarity on what they can share to which platforms. In addition, users can now launch multiple transfers, with better fine-grain control on what they’re choosing to export in any one transfer.

This is yet another feature piled on to the Data Transfer Project, an open-source project developed by Google, Facebook and Microsoft. Facebook users can already send their photos to Google’s own image-storage service, as well as Dropbox, Blogger, Google Documents and WordPress. This is, in part, a way to address the long-in-progress ACCESS Act, which would enable users to transfer their data to any competing platform. Facebook says that it calls on government to “make clearer rules about who is responsible for protecting that data as it is transferred to different services.”

Editor’s note: This post originally appeared on Engadget.

#column, #data, #data-portability, #data-transfer, #facebook, #google-calendar, #photobucket, #tc, #tceng

Monte Carlo’s Barr Moses will join us at TC Sessions: SaaS

Monte Carlo’s Barr Moses joins the data panel at TC Sessions: SaaS. See you there!

As the clock ticks down on TechCrunch’s upcoming SaaS-focused event, we’re excited to announce that Monte Carlo co-founder and CEO Barr Moses will join us. Specifically, the startup exec will be joining our data-focused panel.

What does Monte Carlo do? The startup works in the realm of data observability, making sure that companies’ data ingestion work is bringing in actual information, and not bunk.

When I covered Monte Carlo’s Series B earlier this year, Moses was kind enough to walk me through her company’s market. Which makes her a perfect fit for our data-focused panel.

We’re past the era in which saying “big data” could get you onto a stage. Today’s data gurus are now building lakehouses and going public for their work with hybrid structured-and-unstructured database tech. Meanwhile, Monte Carlo wants to make sure that companies around the world are alerted when some of their incoming data pipelines go off the rails. That way when the corporate world does run data analysis on their collected information, it isn’t skewed by zeroes and other effluent.

It’s a big enough problem, and a hot enough market, that Monte Carlo raised its Series A in September of 2020, and its Series B mere months later in February of 2021. That’s a rapid-fire pace of capital accumulation; investors are betting that Moses and her team are onto something pretty big. Notably, TechCrunch also published an article the other month that included an interview with her cofounder, Yotam Hadass.

Moses will join other tech folks at the event, including Javier Soltero, Google’s head of Workspace. Who else is coming? Databricks’ Ali Ghodsi, UiPath’s Daniel Dines, Puppet’s Abby Kearns, and investors Casey Aylward and Sarah Guo, among others. It’s going to be nerdy and kickass.

Register today with a $75 early-bird ticket and save $100 before tickets go up. TC Sessions: SaaS takes place on October 27 and will feature chats with the leading minds in SaaS, networking and startup demos.

#data, #databricks, #fundings-exits, #monte-carlo, #saas, #startups, #tc, #tc-sessions-saas-2021

Opaque raises $9.5M seed to secure sensitive data in the cloud

Opaque, a new startup born out of Berkeley’s RISELab, announced a $9.5 million seed round today to build a solution to access and work with sensitive data in the cloud in a secure way, even with multiple organizations involved. Intel Capital led today’s investment with participation by Race Capital, The House Fund and FactoryHQ.

The company helps customers work with secure data in the cloud while making sure the data they are working on is not being exposed to cloud providers, other research participants or anyone else, says company president Raluca Ada Popa.

“What we do is we use this very exciting hardware mechanism called Enclave, which [operates] deep down in the processor — it’s a physical black box — and only gets decrypted there. […] So even if somebody has administrative privileges in the cloud, they can only see encrypted data,” she explained.

Company co-founder Ion Stoica, who was a co-founder at Databricks, says the startup’s solution helps resolve two conflicting trends. On one hand, businesses increasingly want to make use of data, but at the same time are seeing a growing trend toward privacy. Opaque is designed to resolve this by giving customers access to their data in a safe and fully encrypted way.

The company describes the solution as “a novel combination of two key technologies layered on top of state-of-the-art cloud security—secure hardware enclaves and cryptographic fortification.” This enables customers to work with data — for example to build machine learning models — without exposing the data to others, yet while generating meaningful results.

Popa says this could be helpful for hospitals working together on cancer research, who want to find better treatment options without exposing a given hospital’s patient data to other hospitals, or banks looking for money laundering without exposing customer data to other banks, as a couple of examples.

Investors were likely attracted to the pedigree of Popa, a computer security and applied crypto professor at UC Berkeley, and Stoica, who is also a Berkeley professor and co-founded Databricks. Both helped found RISELab at Berkeley, where they developed the solution and spun it out as a company.

Mark Rostick, vice president and senior managing director at lead investor Intel Capital, says his firm has been working with the founders since the startup’s earliest days, recognizing the potential of this solution to help companies find complex solutions even when there are multiple organizations involved sharing sensitive data.

“Enterprises struggle to find value in data across silos due to confidentiality and other concerns. Confidential computing unlocks the full potential of data by allowing organizations to extract insights from sensitive data while also seamlessly moving data to the cloud without compromising security or privacy,” Rostick said in a statement.

He added, “Opaque bridges the gap between data security and cloud scale and economics, thus enabling inter-organizational and intra-organizational collaboration.”

#cloud, #cloverly, #data, #data-security, #encryption, #enterprise, #funding, #machine-learning, #recent-funding, #security, #startups, #tc

Tinybird turns raw data into realtime API at scale

Meet Tinybird, a new startup that helps developers build data products at scale without having to worry about infrastructure, query time and all those annoying issues that come up once you deal with huge data sets. The company ingests data at scale, lets you transform it using SQL and then exposes that data through API endpoints.

Over the past few years, analytics and business intelligence products have really changed the way we interact with data. Now, many big companies store data in a data warehouse or a data lake. They try to get insights from those data sets.

And yet, extracting and manipulating data can be costly and slow. It works great if you want to make a PowerPoint presentation for your quarterly results. But it doesn’t let you build modern web products and data products in general.

“What we do at Tinybird is we help developers build data products at any scale. And we’re really focused on the realtime aspect,” co-founder and CEO Jorge Gómez Sancha told me.

The team of co-founders originally met at Carto. They were already working on complex data issues. “Every year people would come with an order of magnitude more data,” Gómez Sancha said. That’s how they came up with the idea behind Tinybird.

Image Credits: Tinybird

The product can be divided into three parts. First, you connect your Tinybird account with your data sources. The company will then ingest data constantly from those data sources.

Second, you can transform that data through SQL queries. In addition to the command-line interface, you can also enter your SQL queries in a web interface, divide them into multiple steps and document everything. Every time you write a query, you can see your data filtered and sorted according to your query.

Third, you can create API endpoints based on those queries. After that, it works like a standard JSON-based API. You can use it to fetch data in your own application.
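The three steps can be sketched with nothing but Python's standard library. This is an illustration of the ingest, transform and expose pattern only: `sqlite3` stands in for the real backend, and the table, query and response shape are hypothetical, not Tinybird's actual API.

```python
import json
import sqlite3

# Step 1: ingest rows from a data source (here, a hard-coded sample).
events = [("home", 120), ("pricing", 45), ("home", 80), ("docs", 30)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)", events)

# Step 2: transform the data with a SQL query.
QUERY = """
    SELECT page, SUM(views) AS total_views
    FROM page_views
    GROUP BY page
    ORDER BY total_views DESC
"""


# Step 3: return the query result the way a JSON endpoint might.
def endpoint_response():
    rows = conn.execute(QUERY).fetchall()
    payload = {"data": [{"page": p, "total_views": v} for p, v in rows]}
    return json.dumps(payload)


response = endpoint_response()
print(response)
```

An application would then consume that JSON like any other API response, without needing to know how the underlying query is stored or executed.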

What makes Tinybird special is that it’s so fast that it feels like you’re querying your data in realtime. "Several of our customers are reading over 1.5 trillion rows on average per day via Tinybird and ingesting around 5 billion rows per day, others are making an average of 250 requests per second to our APIs querying several billion row datasets," Gómez Sancha wrote in an email.

Behind the scenes, the startup uses ClickHouse. But you don’t have to worry about that as Tinybird manages all the infrastructure for you.

Right now, Tinybird has identified three promising use cases. Customers can use it to provide in-product analytics. For instance, if you operate a web hosting service and want to give some analytics to your customers or if you manage online stores and want to surface purchasing data to your customers, Tinybird works well for that.

Some customers also use the product for operational intelligence, such as realtime dashboards that you can share internally within a company. Your teams can react more quickly and always know if everything is running fine.

You can also use Tinybird as the basis for some automation or complex event processing. For instance, you can leverage Tinybird to build a web application firewall that scans your traffic and reacts in realtime.

Tinybird has raised a $3 million seed round led by with several business angels also participating, such as Nat Friedman (GitHub CEO), Nicholas Dessaigne (Algolia co-founder), Guillermo Rauch (Vercel CEO), Jason Warner (GitHub CTO), Adam Gross (former Heroku CEO), Stijn Christiaens (co-founder and CTO of Collibra), Matias Woloski (co-founder and CTO of Auth0) and Carsten Thoma (Hybris co-founder).

#data, #developer, #europe, #fundings-exits, #startups, #tc, #tinybird

Wimbledon: The tech behind the world’s top tennis tournament

The Championships at Wimbledon, which consumes some 28 tons of strawberries, 10,000 litres of cream, and 320,000 glasses of Pimm’s per year, is notable for its seemingly unchecked luxuriant hedonism. But while most companies, organisations, and institutions are looking to cut costs, Wimbledon has stuck to its mantra. Don’t do things cheaper; do things better.

Case in point: Wimbledon’s use of technology is really quite impressive. I’ve been lucky enough to follow Wimbledon’s tech over the last three years, and it has been very encouraging to see a massive endeavour like The Championships dive deeper and deeper into technology. You might think that after 140 years Wimbledon could be stuck in its ways, but far from it. When technology is the only viable way of providing consistent, significant gains for players, visitors, and hundreds of millions of people spectating remotely, you don’t try to fight it; you embrace it.

Of course, Wimbledon is embracing tech in a distinctly Wimbledon way. This year, for example, The Championships are trialling free Wi-Fi—but because they don’t yet know how people will behave with free Wi-Fi, it’s only available in three specific locations (near the food court, in the ticket resale area, and on the west side of court 12). The tournament continues through this weekend, but its team has already started analysing the data. It turns out that, thankfully, the Wi-Fi users on court 12 only use their phones between points. When play begins, Wi-Fi usage stops. If the trial is a success, Wi-Fi could be rolled out across Wimbledon in 2018.

#biz-it, #data, #sports, #tech, #tennis, #wimbledon

Enterprise AI platform Dataiku launches managed service for smaller companies

Dataiku is going downstream with a new product today called Dataiku Online. As the name suggests, Dataiku Online is a fully managed version of Dataiku. It lets you take advantage of the data science platform without going through a complicated setup process that involves a system administrator and your own infrastructure.

If you’re not familiar with Dataiku, the platform lets you turn raw data into advanced analytics, run some data visualization tasks, create data-backed dashboards and train machine learning models. In particular, Dataiku can be used by data scientists, but also business analysts and less technical people.

The company has been mostly focused on big enterprise clients. Right now, Dataiku has more than 400 customers, such as Unilever, Schlumberger, GE, BNP Paribas, Cisco, Merck and NXP Semiconductors.

There are two ways to use Dataiku. You can install the software solution on your own on-premises servers. You can also run it on a cloud instance. With Dataiku Online, the startup offers a third option and takes care of setup and infrastructure for you.

“Customers using Dataiku Online get all the same features that our on-premises and cloud instances provide, so everything from data preparation and visualization to advanced data analytics and machine learning capabilities,” co-founder and CEO Florian Douetteau said. “We’re really focused on getting startups and SMBs on the platform — there’s a perception that small or early-stage companies don’t have the resources or technical expertise to get value from AI projects, but that’s simply not true. Even small teams that lack data scientists or specialty ML engineers can use our platform to do a lot of the technical heavy lifting, so they can focus on actually operationalizing AI in their business.”

Customers using Dataiku Online can take advantage of Dataiku’s pre-built connectors. For instance, you can connect your Dataiku instance with a cloud data warehouse, such as Snowflake Data Cloud, Amazon Redshift and Google BigQuery. You can also connect to a SQL database (MySQL, PostgreSQL…), or you can just run it on CSV files stored on Amazon S3.

And if you’re just getting started and you have to work on data ingestion, Dataiku works well with popular data ingestion services. “A typical stack for our Dataiku Online Customers involves leveraging data ingestion tools like FiveTran, Stitch or Alooma, that sync to a cloud data warehouse like Google BigQuery, Amazon Redshift or Snowflake. Dataiku fits nicely within their modern data stacks,” Douetteau said.

Dataiku Online is a nice offering to get started with Dataiku. High-growth startups might start with Dataiku Online as they tend to be short on staff and want to be up and running as quickly as possible. But as you become bigger, you could imagine switching to a cloud or on-premise installation of Dataiku. Employees can keep using the same platform as the company scales.

#ai, #analytics, #artificial-intelligence, #data, #dataiku, #developer, #enterprise-ai, #machine-learning, #startups, #tc

Peloton and Echelon profile photos exposed riders’ real-world locations

Security researchers say at-home exercise giant Peloton and its closest rival Echelon were not stripping user-uploaded profile photos of their metadata, in some cases exposing users’ real-world location data.

Almost every file, photo or document contains metadata, which is data about the file itself, such as how big it is, when it was created, and by whom. Photos and videos will often also include the location where they were taken. That location data helps online services tag your photos or videos to show that you were at this restaurant or that landmark.

But those online services — especially social platforms, where you see people’s profile photos — are supposed to remove location data from the file’s metadata so other users can’t snoop on where you’ve been, since location data can reveal where you live, work, where you go, and who you see.
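As a rough illustration of what stripping this metadata involves, the sketch below removes Exif blocks (where GPS coordinates are normally stored) from a JPEG byte stream using only the Python standard library. It is not the method used by Peloton, Echelon, or any other service mentioned here. The function name strip_exif is hypothetical, and the parser is deliberately simplified: it assumes every marker before the image data carries a length field, and it ignores other metadata containers such as XMP, which can also hold location data.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # identifier at the start of an Exif APP1 payload


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte stream with its Exif APP1 segments removed.

    Simplified sketch: assumes each header segment has a two-byte length field.
    """
    if jpeg[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: image data follows, copy it untouched
            out += jpeg[pos:]
            return bytes(out)
        # The length field counts itself (2 bytes) plus the payload.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:].startswith(EXIF_HEADER)
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)


# Tiny synthetic stream: SOI, a JFIF segment, an Exif segment, then scan data.
app0 = b"\xff\xe0" + struct.pack(">H", 7) + b"JFIF\x00"
app1 = b"\xff\xe1" + struct.pack(">H", 12) + EXIF_HEADER + b"GPS!"
scan = b"\xff\xda\x00\x02\x12\x34\xff\xd9"
sample = b"\xff\xd8" + app0 + app1 + scan
cleaned = strip_exif(sample)
```

A production service would more likely rely on a maintained image library and re-encode uploads, and would also need to clean photos that were stored before the fix.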

Jan Masters, a security researcher at Pen Test Partners, found the metadata exposure as part of a wider look at Peloton’s leaky API. TechCrunch verified the bug by uploading a profile photo with GPS coordinates of our New York office, and checking the metadata of the file while it was on the server.

The bugs were privately reported to both Peloton and Echelon.

Peloton fixed its API issues earlier this month but said it needed more time to fix the metadata bug and to strip existing profile photos of any location data. A Peloton spokesperson confirmed the bugs were fixed last week. Echelon fixed its version of the bug earlier this month. But TechCrunch held this report until we had confirmation that both companies had fixed the bug and that metadata had been stripped from old profile photos.

It’s not known how long the bug existed or if anyone maliciously exploited it to scrape users’ personal information. Any copies, whether cached or scraped, could represent a significant privacy risk to users whose location identifies their home address, workplace, or other private location.

Parler infamously didn’t scrub metadata from user-uploaded photos, which exposed the locations of millions of users when archivists exploited weaknesses in the platform’s API to download its entire contents. Others have been slow to adopt metadata stripping, like Slack, even if it got there in the end.

#api, #computing, #data, #data-management, #gps, #health, #information, #peloton, #pen-test-partners, #privacy, #security, #social-networks

Datacy raises $2.4M to help consumers monetize their own damn data

This morning Datacy, a startup with its headquarters in Wilmington, Delaware, announced that it has closed $2.4 million in new funding to continue building its consumer-friendly data collection and monetization service.

The company is effectively an argument that the preceding sentence is possible. Datacy is a tool that allows individuals to collect their browsing data, manage it, have it anonymized and aggregated with others and then sold. The end-user gets 85% of the resulting revenue, while Datacy takes 15%.

Its model has found financial backing, with its new capital coming from Trend Forward Capital, Truesight Ventures, Redhawk VC, the Female Founders Alliance and others. The startup raised the funds using a convertible note that was capped at $9.5 million, though TechCrunch is not certain whether or not there were other terms associated with the fundraising mechanism.

Regardless, Datacy’s model fits into the modestly more privacy-forward stance that the technology world has taken in recent years; Apple is not the only company looking to make hay off of what some consider to be rising consumer interest in keeping their activities, and data, to themselves. But what Datacy wants to do is merge the consumer privacy impulse with profit.

According to company co-founder Paroma Indilo, her startup is not a cookie blocker. She told TechCrunch that if someone wants to block data collection, there are good tools for the task in the market already. What Datacy wants to do, she said, is evolve from its current status as a control platform into the way that data is shared and exchanged, built atop user consent. With monetization, we’d add.

It’s a better vision for the future than the hellscape adtech and data-vendor market that we’ve become accustomed to.

Today the startup has live beta users, allowing it to learn and collect initial data. The company is waiting to make the business side of its operation open to all until it has 50,000 users; Indilo told TechCrunch that individual data is not worth much, but in aggregate it can be worth quite a lot. So to see the startup wait to scale up its sales operations until it has a larger user base is reasonable.

It may not be too long until Datacy reaches that 50,000 user mark. Starting from a current base of 10,000 users and growing at what Indilo described as 30% a month via word of mouth, it could hit that mark in a half-year or so.
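The half-year estimate can be checked with a quick compound-growth calculation. The sketch below assumes the 30% monthly rate holds steady, which is our simplification rather than anything the company has promised; the function name months_to_reach is illustrative.

```python
import math


def months_to_reach(current: float, target: float, monthly_growth: float) -> float:
    """Months of steady compound growth needed to go from current to target."""
    return math.log(target / current) / math.log(1.0 + monthly_growth)


months = months_to_reach(10_000, 50_000, 0.30)
whole_months = math.ceil(months)  # first full month in which the target is passed

print(f"{months:.1f} months of growth, crossed during month {whole_months}")
```

Under that assumption the user base would reach roughly 48,000 after six months and pass 50,000 during the seventh, which is consistent with "a half-year or so."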

Datacy is one of those early-stage bets that has a lot of potential, but also a notable helping of risk. If it can attract the masses it needs to prove out the economics of its model, its payments to its user base could make growth a self-fulfilling destiny. But if its ability to garner more users slows, it could fail to reach sufficient scale for its model to work whatsoever.

So it’s a good use of venture capital, in other words. We’ll check back in with Datacy in a few months to see how close it is to its 50,000 user goal. And how its bet that consumers want their data back is playing out.

#data, #datacy, #female-founders-alliance, #fundings-exits, #recent-funding, #startups, #truesight-ventures

Vizio TV buyers are becoming the product Vizio sells, not just its customers

Promotional image for widescreen television set.

Vizio’s 65-inch 4K OLED TV. (credit: Vizio)

Over the past several years, TV-maker Vizio has achieved a reputation among home theater enthusiasts as the company that makes TVs that provide superior picture quality relative to their cost. While the most expensive TVs from Samsung and LG beat Vizio’s in reviewers’ quality assessments, Vizio is widely regarded as one of the best bang-for-buck brands.

But for consumers, those competitive prices may come with a downside: becoming subject to targeted advertising and monetized personal data collection. As reported previously on Engadget, Vizio just posted its first public earnings report, wherein it revealed that profits from the part of its business that is built around collecting and selling user data as well as targeting advertising at users totaled $38.4 million in the quarter.

That’s less than the $48.2 million of profit generated by device sales in the same quarter, but data and advertising profits grew significantly year-over-year while actual device sales grew comparatively slowly. These digital products are still nowhere close to device sales in total revenue, however; the data and ad-related business unit (dubbed Platform+) added up to only 7.2 percent of global revenue.

Read 4 remaining paragraphs | Comments

#advertising, #data, #earnings, #tech, #tv, #vizio

Near acquires the location data company formerly known as UberMedia

Data intelligence company Near is announcing the acquisition of another company in the data business — UM.

In some ways, this echoes Near’s acquisition of Teemo last fall. Just as that deal helped Singapore-headquartered Near expand into Europe (with Teemo founder and CEO Benoit Grouchko becoming Near’s chief privacy officer), CEO Anil Mathews said that this new acquisition will help Near build a presence in the United States, turning the company into “a truly global organization,” while also tailoring its product to offer “local flavors” in each country.

The addition of UM’s 60-person team brings Near’s total headcount to around 200, with UM CEO Gladys Kong becoming CEO of Near North America.

At the same time, Mathews suggested that this deal isn’t simply about geography, because the data offered by Near and UM are “very complementary,” allowing both teams to upsell current customers on new offerings. He described Near’s mission as “merging two diverse worlds, the online world and the offline world,” essentially creating a unified profile of consumers for marketers and other businesses. Apparently, UM is particularly strong on the offline side, thanks to its focus on location data.

Near CEO Anil Mathews and UM CEO Gladys Kong

“UM has a very strong understanding of places, they’ve mastered their understanding of footfalls and dwell times,” Mathews added. “As a result, most of the use cases where UM is seeing growth — in tourism, retail, real estate — are in industries struggling due to the pandemic, where they’re using data to figure out, ‘How do we come out of the pandemic?’”

TechCrunch readers may be more familiar with UM under its old name UberMedia, which created social apps like Echofon and UberSocial before pivoting its business to ad attribution and location data. Kong said that contrary to her fears, the company had “an amazing 2020” as businesses realized they needed UM’s data (its customers include RAND Corporation, Hawaii Tourism Authority, Columbia University and Yale University).

And the year was capped by connecting with Near and realizing that the two companies have “a lot of synergies.” In fact, Kong recalled that UM’s rebranding last month was partly at Mathews’ suggestion: “He said, ‘Why do you have media in your name when you don’t do media?’ And we realized that’s probably how the world saw us, so we decided to change [our name] to make it clear what we do.”

Founded in 2010, UM raised a total of $34.6 million in funding, according to Crunchbase. The financial terms of the acquisition were not disclosed.


#advertising-tech, #data, #enterprise, #near, #startups, #tc, #ubermedia

Flawed data is putting people with disabilities at risk

Data isn’t abstract — it has a direct impact on people’s lives.

In 2019, an AI-powered delivery robot momentarily blocked a wheelchair user from safely accessing the curb when crossing a busy road. Speaking about the incident, the person noted how “it’s important that the development of technologies [doesn’t put] disabled people on the line as collateral”.

Alongside other minority groups, people with disabilities have long been harmed by flawed data and data tools. Disabilities are diverse, nuanced, and dynamic; they don’t fit within the formulaic structure of AI, which is programmed to find patterns and form groups. Because AI treats any outlier data as ‘noise’ and disregards it, too often people with disabilities are excluded from its conclusions.

Take for example the case of Elaine Herzberg, who was struck and killed by a self-driving Uber SUV in 2018. At the time of the collision, Herzberg was pushing a bicycle, which meant Uber’s system struggled to categorize her and flitted between labeling her as a ‘vehicle,’ ‘bicycle,’ and ‘other.’ The tragedy raised many questions for people with disabilities: would a person in a wheelchair or a scooter be at risk of the same fatal misclassification?

We need a new way of collecting and processing data. ‘Data’ ranges from personal information, user feedback, and resumes to multimedia, user metrics, and much more, and it’s constantly being used to optimize our software. However, that optimization is not done with an understanding of the spectrum of nefarious ways that data can be, and is, used in the wrong hands, or of what happens when principles are not applied to each touchpoint of building.

Our products are long overdue for a new, fairer data framework to ensure that data is managed with people with disabilities in mind. If it isn’t, people with disabilities will face more friction, and dangers, in a day-to-day life that is increasingly dependent on digital tools.

Misinformed data hampers the building of good tools

Products that lack accessibility might not stop people with disabilities from leaving their homes, but they can stop them from accessing pivot points of life like quality healthcare, education, and on-demand deliveries.

Our tools are a product of their environment. They reflect their creators’ world view and subjective lens. For too long, the same groups of people have been overseeing faulty data systems. It’s a closed loop, where underlying biases are perpetuated and groups that were already invisible remain unseen. But as data progresses, that loop becomes a snowball. We’re dealing with machine-learning models — if they’re taught long enough that ‘not being X’ (read: white, able-bodied, cisgendered) means not being ‘normal’, they will evolve by building on that foundation.

Data is interlinked in ways that are invisible to us. It’s not enough to say that your algorithm won’t exclude people with registered disabilities. Biases are present in other sets of data. For example, in the United States it’s illegal to refuse someone a mortgage loan because they’re Black. But by basing the process heavily on credit scores — which have inherent biases detrimental to people of color — banks indirectly exclude that segment of society.

For people with disabilities, indirectly biased data could include frequency of physical activity or number of hours commuted per week. Here’s a concrete example of how indirect bias translates to software: If a hiring algorithm studies candidates’ facial movements during a video interview, a person with a cognitive disability or mobility impairment will experience different barriers than a fully able-bodied applicant.

The problem also stems from people with disabilities not being viewed as part of businesses’ target market. When companies are in the early stage of brainstorming their ideal users, people’s disabilities often don’t figure, especially when they’re less noticeable — like mental health illness. That means the initial user data used to iterate products or services doesn’t come from these individuals. In fact, 56% of organizations still don’t routinely test their digital products among people with disabilities.

If tech companies proactively included individuals with disabilities on their teams, it’s far more likely that their target market would be more representative. In addition, all tech workers need to be aware of and factor in the visible and invisible exclusions in their data. It’s no simple task, and we need to collaborate on this. Ideally, we’ll have more frequent conversations, forums and knowledge-sharing on how to eliminate indirect bias from the data we use daily.

We need an ethical stress test for data

We test our products all the time — on usability, engagement, and even logo preferences. We know which colors perform better to convert paying customers, and the words that resonate most with people, so why aren’t we setting a bar for data ethics?

Ultimately, the responsibility of creating ethical tech does not just lie at the top. Those laying the brickwork for a product day after day are also liable. It was the Volkswagen engineer (not the company CEO) who was sent to jail for developing a device that enabled cars to evade US pollution rules.

Engineers, designers, product managers: we all have to acknowledge the data in front of us and think about why we collect it and how we collect it. That means dissecting the data we’re requesting and analyzing what our motivations are. Does it always make sense to ask about someone’s disabilities, sex or race? How does having this information benefit the end user?

At Stark, we’ve developed a five-point framework to run when designing and building any kind of software, service or tech. We have to address:

  1. What data we’re collecting
  2. Why we’re collecting it
  3. How it will be used (and how it can be misused)
  4. Simulate IFTTT: ‘if this, then that.’ Explain possible scenarios in which the data can be used nefariously, and alternate solutions. For instance, how can users be impacted by an at-scale data breach? What happens if this private information becomes public to their family and friends?
  5. Ship or trash the idea

If we can only explain our data using vague terminology and unclear expectations, or by stretching the truth, we shouldn’t be allowed to have that data. The framework forces us to break down data in the most simple manner; and if we can’t, it’s because we’re not yet equipped to handle it responsibly.

Innovation has to include people with disabilities

Complex data technology is entering new sectors all the time, from vaccine development to robotaxis. Any bias against individuals with disabilities in these sectors stops them from accessing the most cutting-edge products and services. As we become more dependent on tech in every niche of our lives, there’s greater room for exclusion in how we carry out everyday activities.

This is all about forward thinking and baking inclusion into your product at the start. Money and/or experience aren’t limiting factors here — changing your thought process and development journey is free, it’s just a conscious pivot in a better direction. And while the upfront cost may be a heavy lift, the profits you’d lose from not tapping into these markets, or because you end up retrofitting your product down the line, far outweigh that initial expense. This is especially true for enterprise-level companies that won’t be able to access academia or governmental contracts without being compliant.

So early-stage companies, integrate accessibility principles into your product development and gather user data to constantly reinforce those principles. Sharing data across your onboarding, sales, and design teams will give you a more complete picture of where your users are experiencing difficulties. Later-stage companies should carry out a self-assessment to determine where those principles are lacking in their product, and harness historical data and new user feedback to generate a fix.

An overhaul of AI and data isn’t just about adapting businesses’ framework. We still need the people at the helm to be more diverse. The fields remain overwhelmingly male and white, and in tech, there are numerous first-hand accounts of exclusion and bias towards people with disabilities. Until the teams curating data tools are themselves more diverse, nations’ growth will continue to be stifled, and people with disabilities will be some of the hardest-hit casualties.

#accessibility, #artificial-intelligence, #cat-noone, #column, #data, #diversity, #ethics, #opinion, #tc

Data scientists: Bring the narrative to the forefront

By 2025, 463 exabytes of data will be created each day, according to some estimates. (For perspective, one exabyte of storage could hold 50,000 years of DVD-quality video.) It’s now easier than ever to translate physical and digital actions into data, and businesses of all types have raced to amass as much data as possible in order to gain a competitive edge.

However, in our collective infatuation with data (and obtaining more of it), what’s often overlooked is the role that storytelling plays in extracting real value from data.

The reality is that data by itself is insufficient to really influence human behavior. Whether the goal is to improve a business’ bottom line or convince people to stay home amid a pandemic, it’s the narrative that compels action, rather than the numbers alone. As more data is collected and analyzed, communication and storytelling will become even more integral in the data science discipline because of their role in separating the signal from the noise.

Data alone doesn’t spur innovation — rather, it’s data-driven storytelling that helps uncover hidden trends, powers personalization, and streamlines processes.

Yet this can be an area where data scientists struggle. In Anaconda’s 2020 State of Data Science survey of more than 2,300 data scientists, nearly a quarter of respondents said that their data science or machine learning (ML) teams lacked communication skills. This may be one reason why roughly 40% of respondents said they were able to effectively demonstrate business impact “only sometimes” or “almost never.”

The best data practitioners must be as skilled in storytelling as they are in coding and deploying models — and yes, this extends beyond creating visualizations to accompany reports. Here are some recommendations for how data scientists can situate their results within larger contextual narratives.

Make the abstract more tangible

Ever-growing datasets help machine learning models better understand the scope of a problem space, but more data does not necessarily help with human comprehension. Even for the most left-brain of thinkers, it’s not in our nature to understand large abstract numbers or things like marginal improvements in accuracy. This is why it’s important to include points of reference in your storytelling that make data tangible.

For example, throughout the pandemic, we’ve been bombarded with countless statistics around case counts, death rates, positivity rates, and more. While all of this data is important, tools like interactive maps and conversations around reproduction numbers are more effective than massive data dumps in terms of providing context, conveying risk, and, consequently, helping change behaviors as needed. In working with numbers, data practitioners have a responsibility to provide the necessary structure so that the data can be understood by the intended audience.

#column, #computing, #data, #data-management, #data-visualization, #developer, #ec-column, #ec-consumer-applications, #ec-enterprise-applications, #enterprise, #machine-learning, #peter-wang, #startups, #storytelling

Soda monitors data and helps you fix issues before it’s too late

Meet Soda, a data monitoring platform that is going to help you discover issues with your data processing setup. This way, you can react as quickly as possible and make sure that you keep the full data picture.

If you’re building a digital-first company, you and your customers are likely generating a ton of data. And you may even be leveraging that data to adjust your product itself — think about hotel pricing, finding the right restaurant on a food delivery website, applying for a loan with a fintech company, etc. Those are data-heavy products.

“Companies build a data platform — as they call it — in one of the big three clouds [Amazon Web Services, Google Cloud, Microsoft Azure]. They land their data in there and they make it available for analytics and more,” Soda co-founder and CEO Maarten Masschelein told me.

You can then tap into those data lakes or data warehouses to display analytics, visualize your data, monitor your services, etc. But what happens if there’s an issue in your data workflows?

It might take you a while to realize that there’s some missing data, or that you’re miscounting some stuff. For instance, Facebook miscalculated average video view times for several years. When you spot that issue, an important part of your business might be affected.

Soda wants to catch data issues as quickly as possible by monitoring your data automatically and at scale. “We sit further upstream, closer to the source of data,” Masschelein said.

When you set up Soda with your data platform, you instantly get some alerts. Soda tells you if there’s something off. For example, if your application generated only 6,000 records today while you usually generate 24,000 records in 24 hours, chances are there’s something wrong. Or if you usually get a new entry every minute and there hasn’t been an entry in 15 minutes, your data might not be fresh.
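To make those two alerts concrete, here is a minimal sketch of a volume check and a freshness check. It is not Soda's implementation or API. The function names, the 50% volume threshold, and the five-minute gap are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def volume_alert(todays_rows: int, typical_rows: int, min_ratio: float = 0.5) -> bool:
    """True when today's row count falls below min_ratio of the usual volume."""
    return todays_rows < typical_rows * min_ratio


def freshness_alert(last_entry: datetime, now: datetime, max_gap: timedelta) -> bool:
    """True when the newest record is older than the allowed gap."""
    return now - last_entry > max_gap


now = datetime(2021, 1, 1, 12, 0)

# 6,000 records against a usual 24,000 is a quarter of normal volume.
low_volume = volume_alert(6_000, 24_000)

# A feed that normally updates every minute has been silent for 15 minutes.
stale = freshness_alert(now - timedelta(minutes=15), now, max_gap=timedelta(minutes=5))

print(low_volume, stale)
```

In practice the thresholds would be learned from historical patterns rather than hard-coded, since normal volume often varies by weekday or season.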

“But that only covers a small part of what is considered data issues. There’s more logic that you want to test and validate,” Masschelein said.

Soda lets you create rules to test and validate your data. Basically, think of a test suite in software development. When you build a new version of your app, your code needs to pass several tests to make sure that nothing critical is going to break with the new version.

With Soda, you can check data immediately and get the result. If the test doesn’t pass, you can programmatically react — for instance, you can stop a process and quarantine data.
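The test-and-quarantine pattern described here can be sketched in a few lines. This is a generic illustration, not Soda SQL syntax. The rule names, the sample rows, and the split_rows helper are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Callable[[Row], bool]


def split_rows(rows: List[Row], rules: Dict[str, Rule]) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Separate rows that pass every rule from rows that fail at least one.

    Failing rows are returned with the names of the rules they broke, so they
    can be set aside for inspection instead of flowing downstream.
    """
    passed: List[Row] = []
    quarantined: List[Tuple[Row, List[str]]] = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            quarantined.append((row, failed))
        else:
            passed.append(row)
    return passed, quarantined


# Hypothetical validation rules for an orders table.
rules: Dict[str, Rule] = {
    "has_customer_id": lambda r: bool(r.get("customer_id")),
    "amount_non_negative": lambda r: float(r.get("amount", -1)) >= 0,
}

rows: List[Row] = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "c3", "amount": -1.0},
]

passed, quarantined = split_rows(rows, rules)
```

A pipeline could then halt the downstream job whenever the quarantined list is non-empty, which mirrors how a failing test blocks a software release.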

Today, the startup is also launching Soda Cloud. It’s a collaboration web application that gives you visibility into your data flows across the organization. This way, non-technical people can easily browse metadata to see whether everything seems to be flowing correctly.

Basically, Soda customers use Soda SQL, a command-line tool that helps someone scan data, along with Soda Cloud, a web application to view Soda SQL results.

Beyond those products, Soda’s vision is that data is becoming an entire category in software products. Development teams now have a ton of dev tools available to automate testing, integration, deployment, versioning, etc. But there’s a lot of potential for tools specifically designed for data teams.

Soda has recently raised a $13.5 million Series A round (€11.5 million) led by Singular, a new Paris-based VC fund that I covered earlier this week. Soda’s seed investors Point Nine Capital, Hummingbird Ventures, DCF and various business angels also participated.

#data, #data-lake, #data-platform, #data-warehouse, #dev-tool, #developer, #europe, #soda, #startups

Android sends 20x more data to Google than iOS sends to Apple, study says

A woman under a thick blanket looks at her smartphone.

Insomnia people and mobile-addiction concepts. (credit: Getty Images)

Whether you have an iPhone or an Android device, it’s continuously sending data including your location, phone number, and local network details to Apple or Google. Now, a researcher has provided a side-by-side comparison that suggests that, while both iOS and Android collect handset data around the clock (even when devices are idle, just out of the box, or after users have opted out), the Google mobile OS collects about 20 times as much data as its Apple competitor.

Both iOS and Android, researcher Douglas Leith from Trinity College in Ireland said, transmit telemetry data to their motherships even when a user hasn’t logged in or has explicitly configured privacy settings to opt out of such collection. Both OSes also send data to Apple and Google when a user does simple things such as inserting a SIM card or browsing the handset settings screen. Even when idle, each device connects to its back-end server on average every 4.5 minutes.

Apps and more

It wasn’t just the OSes that sent data to Apple or Google. Preinstalled apps or services also made network connections, even when they hadn’t been opened or used. Whereas iOS automatically sent Apple data from Siri, Safari, and iCloud, Android collected data from Chrome, YouTube, Google Docs, Safetyhub, Google Messenger, the device clock, and the Google search bar.

Read 11 remaining paragraphs | Comments

#android, #apple, #biz-it, #data, #google, #ios, #privacy, #tech

No-code business intelligence service y42 raises $2.9M seed round

Berlin-based y42 (formerly known as Datos Intelligence), a data warehouse-centric business intelligence service that promises to give businesses access to an enterprise-level data stack that’s as simple to use as a spreadsheet, today announced that it has raised a $2.9 million seed funding round led by La Famiglia VC. Additional investors include the co-founders of Foodspring, Personio and Petlab.

The service, which was founded in 2020, integrates with over 100 data sources, covering all the standard B2B SaaS tools from Airtable to Shopify and Zendesk, as well as database services like Google’s BigQuery. Users can then transform and visualize this data, orchestrate their data pipelines and trigger automated workflows based on this data (think sending Slack notifications when revenue drops or emailing customers based on your own custom criteria).

Like similar startups, y42 extends the idea of the data warehouse, which was traditionally used for analytics, and helps businesses operationalize this data. At the core of the service is a lot of open source software, and the company, for example, contributes to GitLab’s Meltano platform for building data pipelines.

y42 founder and CEO Hung Dang

“We’re taking the best of breed open-source software. What we really want to accomplish is to create a tool that is so easy to understand and that enables everyone to work with their data effectively,” y42 founder and CEO Hung Dang told me. “We’re extremely UX obsessed and I would describe us as no-code/low-code BI tool, but with the power of an enterprise-level data stack and the simplicity of Google Sheets.”

Before y42, Vietnam-born Dang co-founded a major events company that operated in over 10 countries and made millions in revenue (but with very thin margins), all while finishing up his studies with a focus on business analytics. And that in turn led him to also found a second company that focused on B2B data analytics.

Image Credits: y42

Even while building his events company, he noted, he was always very product- and data-driven. “I was implementing data pipelines to collect customer feedback and merge it with operational data, and it was really a big pain at that time,” he said. “I was using tools like Tableau and Alteryx, and it was really hard to glue them together, and they were quite expensive. So out of that frustration, I decided to develop an internal tool that was actually quite usable and in 2016, I decided to turn it into an actual company.”

He then sold this company to a major publicly listed German company. An NDA prevents him from talking about the details of this transaction, but maybe you can draw some conclusions from the fact that he spent time at Eventim before founding y42.

Given his background, it’s perhaps no surprise that y42’s focus is on making life easier for data engineers and, at the same time, putting the power of these platforms in the hands of business analysts. Dang noted that y42 typically provides some consulting work when it onboards new clients, but that’s mostly to give them a head start. Given the no-code/low-code nature of the product, most analysts are able to get started pretty quickly — and for more complex queries, customers can opt to drop down from the graphical interface to y42’s low-code level and write queries in the service’s SQL dialect.

The service itself runs on Google Cloud and the 25-person team manages about 50,000 jobs per day for its clients. The company’s customers include the likes of LifeMD, Petlab and Everdrop.

Until raising this round, Dang self-funded the company and had also raised some money from angel investors. But La Famiglia felt like the right fit for y42, especially due to its focus on connecting startups with more traditional enterprise companies.

“When we first saw the product demo, it struck us how on top of analytical excellence, a lot of product development has gone into the y42 platform,” said Judith Dada, General Partner at La Famiglia VC. “More and more work with data today means that data silos within organizations multiply, resulting in chaos or incorrect data. y42 is a powerful single source of truth for data experts and non-data experts alike. As former data scientists and analysts, we wish that we had y42 capabilities back then.”

Dang tells me he could have raised more but decided that he didn’t want to dilute the team’s stake too much at this point. “It’s a small round, but this round forces us to set up the right structure. For the Series A, which we plan to be towards the end of this year, we’re talking about a dimension which is 10x,” he told me.

#alteryx, #analytics, #berlin, #big-data, #business-intelligence, #business-software, #ceo, #cloud, #data, #data-analysis, #data-management, #data-warehouse, #enterprise, #general-partner, #information-technology, #judith-dada, #recent-funding, #shopify, #sql, #startups, #vietnam

Microsoft Azure expands its NoSQL portfolio with Managed Instances for Apache Cassandra

At its Ignite conference today, Microsoft announced the launch of Azure Managed Instance for Apache Cassandra, its latest NoSQL database offering and a competitor to Cassandra-centric companies like Datastax. Microsoft describes the new service as a ‘semi-managed’ offering that will help companies bring more of their Cassandra-based workloads into its cloud.

“Customers can easily take on-prem Cassandra workloads and add limitless cloud scale while maintaining full compatibility with the latest version of Apache Cassandra,” Microsoft explains in its press materials. “Their deployments gain improved performance and availability, while benefiting from Azure’s security and compliance capabilities.”

Like its counterpart, Azure SQL Managed Instance, the idea here is to give users access to a scalable, cloud-based database service. To use Cassandra in Azure before, businesses had to either move to Cosmos DB, its highly scalable database service which supports the Cassandra, MongoDB, SQL and Gremlin APIs, or manage their own fleet of virtual machines or on-premises infrastructure.

Cassandra was originally developed at Facebook and then open-sourced in 2008. A year later, it joined the Apache Foundation and today it’s used widely across the industry, with companies like Apple and Netflix betting on it for some of their core services. While AWS launched a managed Cassandra-compatible service at its re:Invent conference in 2019 (it’s called Amazon Keyspaces today), Microsoft only launched the Cassandra API for Cosmos DB last November. With today’s announcement, though, the company can now offer a full range of Cassandra-based services for enterprises that want to move these workloads to its cloud.

#amazon, #apache-cassandra, #api, #apple, #aws, #cloud, #computing, #data, #data-management, #datastax, #developer, #enterprise, #facebook, #microsoft, #microsoft-ignite-2021, #microsoft-azure, #mongodb, #netflix, #nosql, #sql, #tc

DataJoy raises $6M seed to help SaaS companies track key business metrics

Every business needs to track fundamental financial information, but the data typically lives in a variety of silos, making it a constant challenge to understand a company’s overall financial health. DataJoy, an early stage startup, wants to solve that issue. The company announced a $6 million seed round today led by Foundation Capital with help from Quarry VC, Partech Partners, IGSB, Bow Capital and SVB.

Like many startup founders, CEO Jon Lee has experienced firsthand the frustration of trying to gather this financial data, and he decided to start a company to deal with it once and for all. “The reason why I started this company was that I was really frustrated at Copper, my last company, because it was really hard just to find the answers to simple business questions in my data,” he told me.

These include basic questions like how the business is doing this quarter, if there are any surprises that could throw the company off track and where are the best places to invest in the business to accelerate more quickly.

The company has decided to concentrate its efforts for starters on SaaS companies and their requirements. “We basically focus on taking the work out of revenue intelligence, and just give you the insights that successful companies in the SaaS vertical depend on to be the largest and fastest growing in the market,” Lee explained.

The idea is to build a product with a way to connect to key business systems, pull the data and answer a very specific set of business questions, while using machine learning to provide more proactive advice.

While the company is still in the process of building the product and is pre-revenue, it has begun developing the pieces to ultimately help companies answer these questions. Eventually it will have a set of connectors to various key systems like Salesforce for CRM, HubSpot and Marketo for marketing, Netsuite for ERP, Gainsight for customer experience and Amplitude for product intelligence.

Lee says the set of connectors will be as specific as the questions themselves and based on the company’s research with potential customers and what they are using to track this information. Ashu Garg, general partner at lead investor Foundation Capital, says that he was attracted to the founding team’s experience, but also to the fact they were solving a problem he sees all the time sitting on the boards of various SaaS startups.

“I spend my life in the board meetings. It’s what I do, and every CEO, every board is looking for straight answers for what should be obvious questions, but they require this intersection of data,” Garg said. He says to an extent, it’s only possible now due to the evolution of technology to pull this all together in a way that simplifies this process.

The company currently has 11 employees with plans to double that by the middle of this year. As a long-time entrepreneur, Lee says that he has found that building a diverse workforce is essential to building a successful company. “People have found diversity usually [results in a company that is] more productive, more creative and works faster,” Lee said. He said that’s why it’s important to focus on diversity from the earliest days of the company and to be proactive about making that happen, for example by ensuring you have a diverse set of candidates to choose from when you are reviewing resumes.

For now, the company is 100% remote. In fact, Lee and his co-founder Chief Product Officer Ken Lee, who was previously at Tableau, have yet to meet in person, but they are hoping that changes soon. The company will eventually have a presence in Vancouver and San Mateo whenever offices start to open.

#data, #datajoy, #enterprise, #funding, #machine-learning, #recent-funding, #saas, #startups, #tc

TigerGraph raises $105M Series C for its enterprise graph database

TigerGraph, a well-funded enterprise startup that provides a graph database and analytics platform, today announced that it has raised a $105 million Series C funding round. The round was led by Tiger Global and brings the company’s total funding to over $170 million.

“TigerGraph is leading the paradigm shift in connecting and analyzing data via scalable and native graph technology with pre-connected entities versus the traditional way of joining large tables with rows and columns,” said TigerGraph founder and CEO, Yu Xu. “This funding will allow us to expand our offering and bring it to many more markets, enabling more customers to realize the benefits of graph analytics and AI.”

Current TigerGraph customers include the likes of Amgen, Citrix, Intuit, Jaguar Land Rover and UnitedHealth Group. Using a SQL-like query language (GSQL), these customers can use the company’s services to store and quickly query their graph databases. At the core of its offerings is the TigerGraphDB database and analytics platform, but the company also offers a hosted service, TigerGraph Cloud, with pay-as-you-go pricing, hosted either on AWS or Azure. With GraphStudio, the company also offers a graphical UI for creating data models and visually analyzing them.

The promise for the company’s database services is that they can scale to tens of terabytes of data with billions of edges. Its customers use the technology for a wide variety of use cases, including fraud detection, customer 360, IoT, AI, and machine learning.

Like so many other companies in this space, TigerGraph is benefiting from a tailwind thanks to the fact that many enterprises have accelerated their digital transformation projects during the pandemic.

“Over the last 12 months with the COVID-19 pandemic, companies have embraced digital transformation at a faster pace driving an urgent need to find new insights about their customers, products, services, and suppliers,” the company explains in today’s announcement. “Graph technology connects these domains from the relational databases, offering the opportunity to shrink development cycles for data preparation, improve data quality, identify new insights such as similarity patterns to deliver the next best action recommendation.”

#amgen, #analytics, #articles, #artificial-intelligence, #aws, #business-intelligence, #ceo, #citrix, #citrix-systems, #computing, #data, #database, #enterprise, #graph-database, #intuit, #jaguar-land-rover, #machine-learning, #tiger-global

Iteratively raises $5.4M to help companies build data pipelines they can trust

As companies gather more data, ensuring that they can trust the quality of that data is becoming increasingly important. An analytics pipeline is only as good as the data it collects, after all, and messy data — or outright bugs — can easily lead to issues further down the line.

Seattle-based Iteratively wants to help businesses build data pipelines they can trust. The company today announced a $5.4 million seed funding round led by Google’s AI-centric Gradient Ventures fund. Fika Ventures and early Iteratively investor PSL Ventures also participated, with Gradient Ventures partner Zach Bratun-Glennon joining the company’s board.

Patrick Thompson, Iteratively’s Co-founder and CEO, started working on Iteratively about two years ago. Before that, he worked at Atlassian and at Simplicity, where he met his co-founder Ondrej Hrebicek. After getting started, the team spent six months doing customer discovery and the theme they picked up on was that companies weren’t trusting the data they captured.

“We interviewed a ton of companies who built internal solutions, trying to solve this particular problem. We actually built one at Atlassian, as well, so I was very much intimately familiar with this pain. And so we decided to bring a product to market that really helps alleviate the pain,” he told me.

Image Credits: Iteratively

In a lot of companies, the data producers and data consumers don’t really talk to each other — and if they do, it’s often only through a spreadsheet or wiki. Iteratively aims to provide a collaborative environment to bring these different groups together and create a single source of truth for all stakeholders. “Typically, there’s a handoff process, either on a JIRA ticket or a Confluence page or spreadsheet, where they try to hand over these requirements — and generally, it’s never really implemented correctly, which then causes a lot of pain points down the line,” Thompson explained.

Currently, Iteratively focuses on event streaming data for product and marketing analytics — the kind of data that typically flows into a Mixpanel, Amplitude or Segment. Iteratively itself sits at the origin of the data, say an app, and then validates the data and routes it to whatever third-party solution a company may use. That means the tool sits right where the data is generated, but this setup also means that none of the data ever flows through Iteratively’s own servers.
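As a rough illustration of what validating events at the source can look like, the sketch below checks a payload against a small tracking plan before forwarding it to any destinations. It is not Iteratively's SDK: `TRACKING_PLAN`, `validate_event` and `track` are hypothetical names, and a real tool would likely use a richer schema format.

```python
from typing import Any, Callable, Dict, List

# Hypothetical tracking plan: event name -> required properties and their types.
TRACKING_PLAN: Dict[str, Dict[str, type]] = {
    "signup_completed": {"user_id": str, "plan": str},
    "item_purchased": {"user_id": str, "price": float},
}

Destination = Callable[[str, Dict[str, Any]], None]


def validate_event(name: str, properties: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    schema = TRACKING_PLAN.get(name)
    if schema is None:
        return [f"unknown event: {name}"]
    errors = []
    for key, expected in schema.items():
        if key not in properties:
            errors.append(f"missing property: {key}")
        elif not isinstance(properties[key], expected):
            errors.append(f"wrong type for {key}: expected {expected.__name__}")
    return errors


def track(name: str, properties: Dict[str, Any],
          destinations: List[Destination]) -> List[str]:
    """Validate on the client, then forward only valid events."""
    errors = validate_event(name, properties)
    if not errors:
        for send in destinations:
            # The payload goes straight to each destination (for example an
            # analytics vendor's client), never to an intermediate server.
            send(name, properties)
    return errors
```

Because the check runs inside the app, a malformed event can be caught before it reaches an analytics tool, and the validating layer never needs to store the payload itself.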

Image Credits: Iteratively

“We don’t actually see the data,” Thompson stressed. “We’re not a data set processor. We’re a wrapper over the top of your own analytics pipeline or your own third party SaaS tools, but we verify the payloads as they flow through our SDK on the client.”

Over time, though, that may change, he acknowledged and Iteratively may do some data processing as well, but likely with a focus on metadata and observability.

Since the company doesn’t actually process any of the data itself, it’s charging customers by seat and not based on how many events move through their pipelines, for example. That may obviously change over time as the company looks into doing some data processing on its side as well.

Currently, Iteratively has about 10 employees and plans to grow that to 20 by the end of the year. The company plans to hire across R&D, sales and marketing.

“Iteratively’s software has a unique approach to enabling company-wide collaboration and enforcing data quality,” said Gradient’s Bratun-Glennon. “Going forward, we believe that intelligent analytics and data-driven business decision making will differentiate successful companies and best-in-class products. Iteratively’s mission, product and team are poised to give each of their customers these capabilities.”

#analytics, #articles, #artificial-intelligence, #atlassian, #business-intelligence, #co-founder, #data, #data-processing, #developer, #fika-ventures, #gradient-ventures, #information-technology, #mixpanel, #seattle

Datastax acquires Kesque as it gets into data streaming

Datastax, the company best known for commercializing the open-source Apache Cassandra database, is moving beyond databases. As the company announced today, it has acquired Kesque, a cloud messaging service.

The Kesque team built its service on top of the Apache Pulsar messaging and streaming project. Datastax has now taken that team’s knowledge in this area and, combined with its own expertise, is launching its own Pulsar-based streaming platform by the name of Datastax Luna Streaming, which is now generally available.

This move comes right as Datastax is also now, for the first time, announcing that it is cash-flow positive and profitable, as the company’s chief product officer, Ed Anuff, told me. “We are at over $150 million in [annual recurring revenue]. We are cash-flow positive and we are profitable,” he told me. This marks the first time the company is publicly announcing this data. In addition, the company also today revealed that about 20 percent of its annual contract value is now for DataStax Astra, its managed multi-cloud Cassandra service and that the number of self-service Astra subscribers has more than doubled from Q3 to Q4.

The launch of Luna Streaming now gives the 10-year-old company a new area to expand into — and one that has some obvious adjacencies with its existing product portfolio.

“We looked at how a lot of developers are building on top of Cassandra,” Anuff, who joined Datastax after leaving Google Cloud last year, said. “What they’re doing is, they’re addressing what people call ‘data-in-motion’ use cases. They have huge amounts of data that are coming in, huge amounts of data that are going out — and they’re typically looking at doing something with streaming in conjunction with that. As we’ve gone in and asked, ‘What’s next for Datastax?,’ streaming is going to be a big part of that.”

Given Datastax’s open-source roots, it’s no surprise the team decided to build its service on another open-source project and acquire an open-source company to help it do so. Anuff noted that while there has been a lot of hype around streaming and Apache Kafka, a cloud-native solution like Pulsar seemed like the better solution for the company. Pulsar was originally developed at Yahoo! (which, full disclosure, belongs to the same Verizon Media Group family as TechCrunch) and even before acquiring Kesque, Datastax already used Pulsar to build its Astra platform. Other Pulsar users include Yahoo, Tencent, Nutanix and Splunk.

“What we saw was that when you go and look at doing streaming in a scale-out way, that Kafka isn’t the only approach. We looked at it, and we liked the Pulsar architecture, we like what’s going on, we like the community — and remember, we’re a company that grew up in the Apache open-source community — we said, ‘okay, we think that it’s got all the right underpinnings, let’s go and get involved in that,’” Anuff said. And in the process of doing so, the team came across Kesque founder Chris Bartholomew and eventually decided to acquire his company.

The new Luna Streaming offering will be what Datastax calls a “subscription to success with Apache Pulsar.” It will include a free, production-ready distribution of Pulsar and an optional, SLA-backed subscription tier with enterprise support.

Unsurprisingly, Datastax also plans to remain active in the Pulsar community. The team is already making code contributions, but Anuff also stressed that Datastax is helping out with scalability testing. “This is one of the things that we learned in our participation in the Apache Cassandra project,” Anuff said. “A lot of what these projects need is folks coming in doing testing, helping with deployments, supporting users. Our goal is to be a great participant in the community.”

#apache-cassandra, #cassandra, #cloud, #computing, #data, #data-management, #databases, #datastax, #developer, #enterprise, #graph-databases, #nosql, #tc

Electricity and CRISPR used to write data to bacterial DNA

False color image of bacteria

(credit: Rizlan Bencheikh and Bruce Arey, PNNL)

In recent years, researchers have used DNA to encode everything from an operating system to malware. Rather than being a technological curiosity, these efforts were serious attempts to take advantage of DNA’s properties for long-term storage of data. DNA can remain chemically stable for hundreds of thousands of years, and we’re unlikely to lose the technology to read it, something you can’t say about things like ZIP drives and MO disks.

So far, though, writing data to DNA has involved converting the data to a sequence of bases on a computer, and then ordering that sequence from someplace that operates a chemical synthesizer—living things don’t actually enter into the picture. Separately, a group of researchers had been figuring out how to record biological events by modifying a cell’s DNA, allowing them to read out the cell’s history. A group at Columbia University has now figured out how to merge the two efforts and write data to DNA using voltage differences applied to living bacteria.
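To give a sense of what "converting the data to a sequence of bases on a computer" can mean, here is a deliberately simple sketch that packs two bits into each base. The particular mapping (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for illustration. It is not the Columbia group's method, and practical DNA storage schemes typically add error correction and avoid long runs of the same base.

```python
# Assumed mapping for illustration: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(sequence: str) -> bytes:
    """Reverse encode(); raises ValueError on malformed input."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G or T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

Under this mapping, every byte becomes exactly four bases, so `decode(encode(data))` returns the original bytes.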

CRISPR and data storage

The CRISPR system has been developed as a way of editing genes or cutting them out of DNA entirely. But the system first came to the attention of biologists because it inserted new sequences into DNA. For all the details, see our Nobel coverage, but for now, just know that part of the CRISPR system involves identifying DNA from viruses and inserting copies of it into the bacterial genome in order to recognize it should the virus ever appear again.


#biology, #crispr, #data, #molecular-biology, #science, #storage

Microsoft launches Azure Purview, its new data governance service

As businesses gather, store and analyze an ever-increasing amount of data, tools for helping them discover, catalog, track and manage how that data is shared are also becoming increasingly important. With Azure Purview, Microsoft is launching a new data governance service into public preview today that brings together all of these capabilities in a new data catalog with discovery and data governance features.

As Rohan Kumar, Microsoft’s corporate VP for Azure Data, told me, this has become a major pain point for enterprises. While they may be very excited about getting started with data-heavy technologies like predictive analytics, those companies’ data- and privacy-focused executives are very concerned to make sure that the way the data is used is compliant or that the company has received the right permissions to use its customers’ data, for example.

In addition, companies also want to make sure that they can trust their data and know who has access to it and who made changes to it.

“[Purview] is a unified data governance platform which automates the discovery of data, cataloging of data, mapping of data, lineage tracking — with the intention of giving our customers a very good understanding of the breadth of the data estate that exists to begin with, and also to ensure that all these regulations that are there for compliance, like GDPR, CCPA, etc, are managed across an entire data estate in ways which enable you to make sure that they don’t violate any regulation,” Kumar explained.

At the core of Purview is its catalog that can pull in data from the usual suspects like Azure’s various data and storage services but also third-party data stores including Amazon’s S3 storage service and on-premises SQL Server. Over time, the company will add support for more data sources.

Kumar described this process as a ‘multi-semester investment,’ so the capabilities the company is rolling out today are only a small part of what’s on the overall roadmap already. With this first release today, the focus is on mapping a company’s data estate.

Image Credits: Microsoft

“Next [on the roadmap] is more of the governance policies,” Kumar said. “Imagine if you want to set things like ‘if there’s any PII data across any of my data stores, only this group of users has access to it.’ Today, setting up something like that is extremely complex and most likely you’ll get it wrong. That’ll be as simple as setting a policy inside of Purview.”
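The rule Kumar describes can be sketched generically as a check on classification tags and group membership. The code below is only an illustration of that idea and does not use or reflect Purview's API: `DataAsset`, `User`, `can_access` and the `pii-readers` group name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set

PII_READERS = "pii-readers"  # hypothetical group allowed to read PII


@dataclass(frozen=True)
class DataAsset:
    """A cataloged data store plus the classifications found in it."""
    name: str
    classifications: FrozenSet[str] = frozenset()


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)


def can_access(user: User, asset: DataAsset) -> bool:
    """Allow access to PII-tagged assets only for members of PII_READERS."""
    if "PII" in asset.classifications:
        return PII_READERS in user.groups
    return True  # assets without PII are not restricted by this rule
```

The hard part in practice is what the catalog is meant to automate: discovering which stores actually contain PII so that a single rule like this can apply across all of them.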

In addition to launching Purview, the Azure team also today launched Azure Synapse, Microsoft’s next-generation data warehousing and analytics service, into general availability. The idea behind Synapse is to give enterprises — and their engineers and data scientists — a single platform that brings together data integration, warehousing and big data analytics.

“With Synapse, we have this one product that gives a completely no code experience for data engineers, as an example, to build out these [data] pipelines and collaborate very seamlessly with the data scientists who are building out machine learning models, or the business analysts who build out reports for things like Power BI.”

Among Microsoft’s marquee customers for the service, which Kumar described as one of the fastest-growing Azure services right now, are FedEx, Walgreens, Myntra and P&G.

“The insights we gain from continuous analysis help us optimize our network,” said Sriram Krishnasamy, senior vice president, strategic programs at FedEx Services. “So as FedEx moves critical high value shipments across the globe, we can often predict whether that delivery will be disrupted by weather or traffic and remediate that disruption by routing the delivery from another location.”

Image Credits: Microsoft

#analytics, #big-data, #business-intelligence, #cloud, #computing, #data, #data-management, #data-protection, #developer, #enterprise, #general-data-protection-regulation, #information, #microsoft, #rohan-kumar, #tc

Europe’s data strategy aims to tip the scales away from big tech

Google wants to organize the world’s information but European lawmakers are in a rush to organize the local digital sphere and make Europe “the most data-empowered continent in the world”, internal market commissioner Thierry Breton said today, setting out the thinking behind the bloc’s data strategy during a livestreamed discussion organized by the Brussels-based economic think tank, Bruegel.

Rebalancing big data power dynamics to tip the scales away from big tech is another stated aim.

Breton likened the EU’s ambitious push to encourage industrial data sharing and rebalance platform power to work done in the past to organize the region’s air space and other physical infrastructure — albeit, with a lot less time to get the job done given the blistering pace of digital innovation.

“This will require of course political vision — that we have — and willingness, that I believe we have too, and smart regulation, hopefully you will judge, to set the right rules and investment in key infrastructure,” said Breton.

During the talk, he gave a detailed overview of how the flotilla of legislative proposals which are being worked on by EU lawmakers will set rules intended to support European businesses and governments to safely unlock the value of industrial and public data and drive the next decades of economic growth.

“We have been brave enough to set our rules in the personal data sphere and this is what we need to do now for government and public and industrial data. Set the rules. The European rules. Everyone will be welcome in Europe, that’s extremely important — provided they respect our rules,” said Breton.

“We don’t have one minute to lose,” he added. “The battle for industrial data is starting now and the battlefield may be Europe so we need to get ready — and this is my objective.”

EU lawmakers are drafting rules for how (non-personal) data can be used and shared; who will get access to them; and how rights can be guaranteed under the framework, per Breton. And he argued that concerns raised by European privacy challenges to international data transfers — reflected in the recent Schrems II ruling — are not limited to privacy and personal data. 

“These worries are in fact at the heart of the Single Market for data that I am building,” he said. “These worries are clear in the world we are entering when individuals or companies want to keep control over its data. The key question is, therefore, how to organize this control while allowing data flow — which is extremely important in the data economy.”

An open single European market for data must recognize that not all data are the same — “in terms of their sensitivity” — Breton emphasized, pointing to the EU’s General Data Protection Regulation (GDPR) data protection framework as “the proof of that”.

“Going forward, there are also sensitive industrial data that should benefit from specific conditions when they are accessed, used or shared,” he went on. “This is a case for instance for some sensitive public data [such as] from public hospitals, but also anonymized data that remains sensitive, mixed data which are difficult to handle.”

At one point during the talk he gave the example of European hospitals during the pandemic not being able to share data across borders to help in the fight against the virus because of the lack of a purpose-built framework to securely enable such data flows.

“I want our SMEs and startups, our public hospitals, our cities and many other actors to use more data — to make them available, to value them, to share them — but for this we need to generate the trust,” he added.

The first legislative plank of the transformation to a single European data economy is a Data Governance Act (DGA) — which Breton said EU lawmakers will present tomorrow, after a vote on the proposal this afternoon.

“With this act we are defining a European approach to data sharing,” he noted on the DGA. “This new regulation will facilitate data sharing across sectors and Member States. And it will put those who generate the data in the driving seat — moving away from the current practices of the big tech platforms.

“Concretely, with this legislation, we create the conditions to allow access to a reuse of sensitive public data, creating a body of harmonized rules for the single market.”

A key component of building the necessary trust for the data economy will mean creating rules that state “European highly sensitive data should be able to be stored and processed in the EU”, Breton also said, signalling that data localization will be a core component of the strategy — in line with a number of recent public remarks in which he’s argued it’s not protectionist for European data to be stored in Europe. 

“Without such a possibility Member States will never agree to open their data hold,” Breton went on, saying that while Europe will be “open” with data, it will not be offering a “naive” data free-for-all.

The Commission also wants the data framework to support an ecosystem of data brokers whose role Breton said will be to connect data owners and data users “in a neutral manner” — suggesting this will empower companies to have stronger control over the data they generate (the implication being that this would replace the current situation, in which data-mining platform giants can use their market power to asset-strip weaker third parties).

“We are shifting here the product,” he said. “And we promote also data altruism — the role of sharing data, industrial or personal, for common good.”

Breton also noted that the forthcoming data governance proposal will include a shielding provision — meaning data actors will be required to take steps to avoid having to comply with what he called “abusive and unlawful” data access requests for data held in Europe from third countries.

“This is a major point. It is not a question of calling into question our international judicial or policy cooperation. We cannot tolerate abuses,” he said, specifying three off-limits examples (“unauthorized access; access that do not offer sufficient legal guarantees; or fishing expeditions”), adding: “By doing so we are ensuring that European law and the guarantees it carries is respected. This is about enforcing our own rules.”

Breton also touched on other interlocking elements of the policy strategy which regional lawmakers see as crucial to delivering a functional data framework: Namely the Digital Services Act (DSA) and Digital Markets Act (DMA) — which are both due to be set out in detail early next month.

The DSA will put “a clear responsibility and obligation on platforms and the content that is spread”, said Breton.

Meanwhile, the companion ex ante regulation, the DMA, will “frame the behaviours of gatekeepers — of systemic actors in the Single Market — and target their behaviors against their competitors or customers”, further helping to pin and clip the wings of big tech.

“With this set of regulation I just want to set up the rules and that the rules are clear — based on our values,” he added.

He also confirmed that interoperability and portability will be a key feature of the EU’s hoped-for data transformation.

“We are working on this on several strands,” he said on this. “The first is standards for interoperability. That’s absolutely key for sectoral data spaces that we will create and very important for the data flows. You will see that we will create a European innovation data board — set in the DGA today — which will help the Commission in setting and working the right standards.”

Combating “blocking efforts and abusive behaviors” by platform gatekeepers, which could otherwise put an artificial limit on the value of the data economy, will meanwhile be “the job of the DMA”, he noted.

A fourth pillar of the data strategy — which Breton referred to as a “data act” — will be introduced in 2021, with the aim of “increasing fairness in the data economy by clarifying data usage rights in business to business and business to government settings”.

“We will also consider enhanced data portability rights to give individuals more control — which is extremely important — over the data they produce,” he added. “And we will have a look at the intellectual property rights framework.”

He also noted that key infrastructure investments will be vital, pointing to the Commission’s plan to build a European industrial cloud and to related strategic tech investment priorities such as compute power capacity, next-gen connectivity and support for cutting-edge technologies like quantum encryption.

Privacy campaigner Max Schrems, who had been invited as the other guest speaker, raised the issue of enforceability, pointing out that Ireland’s data protection authority, which is responsible for overseeing a large number of major tech companies in the region, still hasn’t issued any decisions on cross-border complaints filed under the 2.5-year-old GDPR framework.

Breton agreed that enforcement will be a vital piece of the puzzle — claiming EU lawmakers are alive to the problem of enforcement “bottlenecks” in the GDPR.

“We need definitely clear, predictable, implementable rules — and this is what is driving me when I am regulating against the data market. But also what you will find behind the DSA and the DMA with an ex ante regulation to be able to apply it immediately and everywhere in Europe, not only in one country, everywhere at the same time,” he said. “Just to be able to make sure that things are happening quick. In this digital space we have to be fast.”

“So we will again make sure in DSA that Member State authorities can ask platforms to remove immediately content cross-border — like, for example, if you want an immediate comparison, the European Arrest Warrant.”

The Commission will also have the power to step in via cooperation at the European level, Breton further noted.

“So you see we are putting in rules, we are not naive, we understand pretty well where we have the bottleneck — and again we try to regulate. And also, in parallel, that’s very important because like everywhere where you have regulation you need to have sanctions — you will have appropriate sanctions,” he said, adding: “We learn the lessons from the GDPR.”

#data, #data-governance-act, #digital-markets-act, #digital-services-act, #eu, #eu-data-strategy, #europe, #gdpr, #policy, #thierry-breton

Gretel announces $12M Series A to make it easier to anonymize data

As companies work with data, one of the big obstacles they face is making sure they are not exposing personally identifiable information (PII) or other sensitive data. It usually requires a painstaking manual effort to strip out that data. Gretel, an early stage startup, wants to change that by making it faster and easier to anonymize data sets. Today the company announced a $12 million Series A led by Greylock. The company has now raised $15.5 million.

Gretel founder and CEO Alex Watson says that his company was founded to make it simpler to anonymize data and unlock data sets that were previously out of reach because of privacy concerns.

“As a developer, you want to test an idea or build a new feature, and it can take weeks to get access to the data you need. Then essentially it boils down to getting approvals to get started, then snapshotting a database, and manually removing what looks like personal data and hoping that you got everything,” Watson said.

Watson, who previously worked as a GM at AWS, believed that there needed to be a faster and more reliable way to anonymize the data, and that’s why he started Gretel. The first product is an open source, synthetic machine learning library for developers that strips out personally identifiable information.

“Developers use our open source library, which trains machine learning models on their sensitive data, then as that training is happening we are enforcing something called differential privacy, which basically ensures that the model doesn’t memorize details about secrets for individual people inside of the data,” he said. The result is a new artificial data set that is anonymized and safe to share across a business.
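To give a rough sense of what differential privacy means in practice, here is a minimal Python sketch. It is not Gretel's library, and it does not train a model as Watson describes. Instead it swaps in the much simpler Laplace mechanism applied to a single counting query, which rests on the same idea: add calibrated randomness so that no one person's record noticeably changes the output. The function names, the `epsilon` value, and the sample records are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of records matching predicate (Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records: (user_id, has_condition)
    records = [(i, i % 4 == 0) for i in range(1000)]
    rng = random.Random(7)
    noisy = private_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
    print(f"noisy count: {noisy:.1f} (true count is 250)")
```

The released number stays useful in aggregate while hiding whether any single individual is in the data. Applying the guarantee during model training, as the quote describes, takes more machinery than this sketch shows.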

The company was founded last year and has used this year to develop the open source product and build an open source community around it. “So our approach and our go-to-market here is we’ve open sourced our underlying libraries, and we will also build a SaaS service that makes it really easy to generate synthetic data and anonymized data at scale,” he said.

As the founders build the company, they are looking at how to build a diverse and inclusive organization, something that they discuss at their regular founders’ meetings, especially as they look to take these investment dollars and begin to hire additional senior people.

“We make a conscious effort to have diverse candidates apply, and to really make sure we reach out to them and have a conversation, and that’s paid off, or is in the process of paying off I would say, with the candidates in our pipeline right now. So we’re excited. It’s tremendously important that we avoid group think that happens so often,” he said.

The company doesn’t have paying customers yet, but the plan is to build off the relationships it has with design partners and begin taking in revenue next year. Sridhar Ramaswamy, the Greylock partner leading the investment, says that his firm is placing a bet on a pre-revenue company because he sees great potential for a service like this.

“We think Gretel will democratize safe and controlled access to data for the whole world the way GitHub democratized source code access and control,” Ramaswamy said.

#data, #developer, #enterprise, #funding, #greylock, #open-source, #recent-funding, #startups