Xata is a database service for serverless apps

Meet Xata, a startup with a new take on managed databases. The company runs your database for you and turns it into an API so that you can query and update it from your serverless app. Xata has raised a $5 million funding round. Its product is not yet ready for prime time but the company is sharing details.

Xata seems particularly well suited for Jamstack websites. Jamstack has been a popular way of developing and deploying websites at scale. Popular Jamstack hosting platforms include Netlify, Vercel and Cloudflare Pages.

Applications are deployed on a global edge network and most of the logic is handled by API calls. The result is a website or an application that loads quickly and can handle a lot of traffic.

Deploying a Jamstack website is quite easy, as it often integrates tightly with your Git repository. When you commit code changes, serverless platforms take care of deploying your application. Integrating with API-based developer tools is relatively effortless as well, since you don’t manage the logic yourself.

For instance, deploying a website with static content and a Stripe checkout module doesn’t require a ton of effort — Stripe manages the payment servers for you. It gets a bit more complicated if you want to use a live database and interact with it. Traditional database software doesn’t rely on API calls across the internet to add a row, search through multiple rows and find data.

Xata is focusing on databases and wants to make it easier to integrate a database with your serverless app. You don’t have to take care of the underlying infrastructure, as Xata can scale the database for you. You don’t have to update software, move data to a new server, etc.

Your database is distributed across multiple data centers to improve response times and redundancy. It supports many data types, including images. From there, interacting with the database works like any RESTful API out there.
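
For a sense of what that looks like in practice, here is a minimal sketch of querying such an API-based database from a serverless function. The endpoint, auth header and payload shape are illustrative assumptions, not Xata’s documented API:

```python
import requests

# Hypothetical sketch of querying an API-based database like Xata's.
# The URL, auth scheme and payload shape are assumptions for illustration.
API_URL = "https://db.example.com/tables/users/query"
API_KEY = "example-api-key"

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"filter": {"country": "FR"}, "page": {"size": 20}},
)
resp.raise_for_status()
for record in resp.json().get("records", []):
    print(record)
```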

The startup is also drawing some inspiration from popular no-code startups, such as Airtable. You can open your database in a web browser and interact with your data directly from there. For instance, you can filter the current view, sort data by specific criteria and get the API query that you can use in your code.

If you store a lot of data in your database, you can search through your data using a free-text search feature. You can also leverage Xata for analytics by creating charts and visualizations.

The ability to interact with your data from a web browser is Xata’s competitive advantage. Many companies rely on Airtable as their first backend to prototype a new project. Xata could become a production-ready version of this Airtable-as-a-backend data management model.

The $5 million round was led by Index Ventures. Operator Collective, SV Angel, X-Factor and firstminute capital also participated. Some business angels, such as Shay Banon and Uri Boness from Elastic, Neha Narkhede from Confluent, Guillermo Rauch from Vercel, Elad Gil from Color Genomics and Christian Bach and Mathias Biilmann from Netlify also invested.

The startup was founded by Monica Sarbu, who used to be the Director of Engineering at Elastic. So she probably knows a thing or two about scaling databases.

Image Credits: Xata

#database, #developer, #fundings-exits, #serverless, #startups

Real-time database platform SingleStore raises $80M more, now at a $940M valuation

Organizations are swimming in data these days, and so solutions to help manage and use that data in more efficient ways will continue to see a lot of attention and business. In the latest development, SingleStore — which provides a platform to enterprises to help them integrate, monitor and query their data as a single entity, regardless of whether that data is stored in multiple repositories — is announcing another $80 million in funding, money that it will be using to continue investing in its platform, hiring more talent and overall business expansion. Sources close to the company tell us that the company’s valuation has grown to $940 million.

The round, a Series F, is being led by Insight Partners, with new investor Hewlett Packard Enterprise, and previous backers Khosla Ventures, Dell Capital, Rev IV, Glynn Capital, and GV (formerly Google Ventures) also participating. The startup has raised $264 million to date, including an $80 million Series E just last December, on the heels of rebranding from MemSQL.

The fact that there are three major strategic investors in this Series F — HPE, Dell and Google — may say something about the traction that SingleStore is seeing, but so too do its numbers: a 300%+ increase in new customer acquisition for its cloud service and 150%+ year-over-year growth in cloud revenue.

Raj Verma, SingleStore’s CEO, said in an interview that its cloud revenues have grown by 150% year over year and now account for some 40% of all revenues (up from 10% a year ago). New customer numbers, meanwhile, have grown by over 300%.

“The flywheel is now turning around,” Verma said. “We didn’t need this money. We’ve barely touched our Series E. But I think there has been a general sentiment among our board and management that we are now ready for the prime time. We think SingleStore is one of the best kept secrets in the database market. Now we want to aggressively be an option for people looking for a platform for intensive data applications or if they want to consolidate databases to 1 from 3, 5 or 7 repositories. We are where the world is going: real-time insights.”

Database management, and the need for more efficient and cost-effective tools to handle it, is an ever-growing priority, one that definitely got a fillip in the last 18 months with Covid-19 pushing people into more remote working environments. That means SingleStore is not without competitors, with others in the same space including Amazon, Microsoft, Snowflake, PostgreSQL, MySQL, Redis and more. Others like Firebolt are tackling the challenges of handling large, disparate data repositories from another angle. (Some of these, I should point out, are also partners: SingleStore works with data stored on AWS, Microsoft Azure, Google Cloud Platform, and Red Hat, and Verma describes those who do compute work as “not database companies; they are using their database capabilities for consumption for cloud compute.”)

But the company has carved a place for itself with enterprises and has thousands now on its books, including GE, IEX Cloud, Go Guardian, Palo Alto Networks, EOG Resources, and SiriusXM + Pandora.

“SingleStore’s first-of-a-kind cloud database is unmatched in speed, scale, and simplicity by anything in the market,” said Lonne Jaffe, managing director at Insight Partners, in a statement. “SingleStore’s differentiated technology allows customers to unify real-time transactions and analytics in a single database.” Vinod Khosla from Khosla Ventures added that “SingleStore is able to reduce data sprawl, run anywhere, and run faster with a single database, replacing legacy databases with the modern cloud.”

#amazon, #aws, #ceo, #cloud-computing, #cloud-infrastructure, #computing, #database, #database-management, #enterprise, #funding, #glynn-capital, #google-cloud-platform, #google-ventures, #hewlett-packard-enterprise, #khosla-ventures, #lonne-jaffe, #memsql, #microsoft, #mysql, #palo-alto-networks, #postgresql, #red-hat, #redis, #series-e, #singlestore, #snowflake, #vinod-khosla

Databricks raises $1.6B at $38B valuation as it blasts past $600M ARR

Databricks this morning confirmed earlier reports that it was raising new capital at a higher valuation. The data- and AI-focused company has secured a $1.6 billion round at a $38 billion valuation, it said. Bloomberg first reported last week that Databricks was pursuing new capital at that price.

The Series H was led by Counterpoint Global, a Morgan Stanley fund. Other new investors included Baillie Gifford, UC Investments and ClearBridge. A grip of prior investors also kicked in cash to the round.

The new funding brings Databricks’ total private funding raised to $3.5 billion. Notably, its latest raise comes just seven months after the late-stage startup raised $1 billion on a $28 billion valuation. Its new valuation represents paper value creation in excess of $1 billion per month.

The company, which makes open source and commercial products for processing structured and unstructured data in one location, views its market as a new technology category. Databricks calls the technology a data “lakehouse,” a mashup of data lake and data warehouse.

Databricks CEO and co-founder Ali Ghodsi believes that its new capital will help his company secure market leadership.

For context, since the 1980s, large companies have stored massive amounts of structured data in data warehouses. More recently, companies like Snowflake and Databricks have provided a similar solution for unstructured data called a data lake.

In Ghodsi’s view, combining structured and unstructured data in a single place with the ability for customers to execute data science and business-intelligence work without moving the underlying data is a critical change in the larger data market.

“[Data lakehouses are] a new category, and we think there’s going to be lots of vendors in this data category. So it’s a land grab. We want to quickly race to build it and complete the picture,” he said in an interview with TechCrunch.

Ghodsi also pointed out that he is going up against well-capitalized competitors and that he wants the funds to compete hard with them.

“And you know, it’s not like we’re up against some tiny startups that are getting seed funding to build this. It’s all kinds of [large, established] vendors,” he said. That includes Snowflake, Amazon, Google and others who want to secure a piece of the new market category that Databricks sees emerging.

The company’s performance indicates that it’s onto something.

Growth

Databricks has reached the $600 million annual recurring revenue (ARR) milestone, it disclosed as part of its funding announcement. For a sense of how quickly it is growing at scale: it closed 2020 at $425 million ARR.

Per the company, its new ARR figure represents 75% growth, measured on a year-over-year basis.

That’s quick for a company of its size; per the Bessemer Cloud Index, top-quartile public software companies are growing at around 44% year over year. Those companies are worth around 22x their forward revenues.

At its new valuation, Databricks is worth 63x its current ARR. So Databricks isn’t cheap, but at its current pace should be able to grow to a size that makes its most recent private valuation easily tenable when it does go public, provided that it doesn’t set a new, higher bar for its future performance by raising again before going public.
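
As a back-of-the-envelope check on that multiple, using the figures above:

\[
\frac{\$38\ \text{billion valuation}}{\$0.6\ \text{billion ARR}} \approx 63\times \text{ARR}
\]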

Ghodsi declined to share timing around a possible IPO, and it isn’t clear whether the company will pursue a traditional IPO or if it will continue to raise private funds so that it can direct list when it chooses to float. Regardless, Databricks is now sufficiently valuable that it can only exit to one of a handful of mega-cap technology giants or go public.

Why hasn’t the company gone public? Ghodsi is enjoying a rare position in the startup market: He has access to effectively unlimited capital. Databricks had to expand its latest round by another $100 million; it was originally set to close at just $1.5 billion. It doesn’t lack for investor interest, allowing its CEO to bring aboard the sort of shareholder he wants for his company’s post-IPO life — while enjoying limited dilution.

This also enables him to hire aggressively, possibly buy some smaller companies to fill in holes in Databricks’ product roadmap, and grow outside of the glare of Wall Street expectations from a position of capital advantage. It’s the startup equivalent of having one’s cake and eating it too.

But staying private longer isn’t without risks. If the larger market for software companies was rapidly devalued, Databricks could find itself too expensive to go public at its final private valuation. However, given the long bull market that we’ve seen in recent years for software shares, and the confidence Ghodsi has in his potential market, that doesn’t seem likely.

There’s still much about Databricks’ financial position that we don’t yet know — its gross margin profile, for example. TechCrunch is also incredibly curious what all its fundraising and ensuing spending have done to near-term Databricks operating cash flow results, as well as how its gross-margin-adjusted CAC payback has evolved since the onset of COVID-19. If we ever get an S-1, we might find out.

For now, winsome private markets are giving Ghodsi and crew space to operate an effectively public company without the annoyances that come with actually being public. Want the same thing for your company? Easy: Just reach $600 million ARR while growing 75% year over year.

#ali-ghodsi, #artificial-intelligence, #cloud, #data-lake, #data-warehouse, #database, #databricks, #enterprise, #fundings-exits, #ml, #startups

The CockroachDB EC-1

Every application is a palimpsest of technologies, each layer forming a base that enables the next layer to function. Web front ends rely on JavaScript and browser DOM, which rely on back-end APIs, which themselves rely on databases.

As one goes deeper down the stack, engineering decisions become ever more conservative — changing the location of a button in a web app is an inconvenience; changing a database engine can radically upend an entire project.

It’s little surprise then that database technologies are among the longest-lasting engineering projects in the modern software developer toolkit. MySQL, which remains one of the most popular database engines in the world, was first released in the mid-1990s, and Oracle Database, launched more than four decades ago, is still widely used in high-performance corporate environments.

Database technology can change the world, but the world in these parts changes very, very slowly. That’s made building a startup in the sector a tough equation: Sales cycles can be painfully slow, even when new features can dramatically expand a developer’s capabilities. Competition is stiff and comes from some of the largest and most entrenched tech companies in the world. Exits have also been few and far between.

That challenge — and opportunity — is what makes studying Cockroach Labs so interesting. The company behind CockroachDB attempts to solve a long-standing problem in large-scale, distributed database architecture: how to make data created in one place on the planet always available for consumption by applications thousands of miles away, immediately and accurately. That might sound like a simple use case, but in reality it’s quite the herculean task. Cockroach Labs’ story is one of an uphill struggle, but one that saw it turn into a next-generation, $2-billion-valued database contender.

The lead writer of this EC-1 is Bob Reselman. Reselman has been writing about the enterprise software market for more than two decades, with a particular emphasis on teaching and educating engineers on technology. The lead editor for this package was Danny Crichton, the assistant editor was Ram Iyer, the copy editor was Richard Dal Porto, figures were designed by Bob Reselman and stylized by Bryce Durbin, and illustrations were drawn by Nigel Sussman.

CockroachDB had no say in the content of this analysis and did not get advance access to it. Reselman has no financial ties to CockroachDB or other conflicts of interest to disclose.

The CockroachDB EC-1 comprises four main articles, numbering 9,100 words with a reading time of 37 minutes. Here’s what we’ll be crawling over:

We’re always iterating on the EC-1 format. If you have questions, comments or ideas, please send an email to TechCrunch Managing Editor Danny Crichton at danny@techcrunch.com.

#cockroach-labs, #cockroachdb, #cockroachdb-ec-1, #database, #database-management, #databases, #ec-cloud-and-enterprise-infrastructure, #ec-enterprise-applications, #ec-1, #enterprise, #mysql, #saas, #startups

How engineers fought the CAP theorem in the global war on latency

CockroachDB was intended to be a global database from the beginning. The founders of Cockroach Labs wanted to ensure that data written in one location would be viewable immediately in another location 10,000 miles away. The use case was simple, but the work needed to make it happen was herculean.

The company is betting the farm that it can solve one of the largest challenges for web-scale applications. The approach it’s taking is clever, but it’s a bit complicated, particularly for the non-technical reader. Given its history and engineering talent, the company is in the process of pulling it off and making a big impact on the database market, which makes its technology well worth understanding. In short, there’s value in digging into the details.

Using CockroachDB’s multiregion feature to segment data according to geographic proximity fulfills Cockroach Labs’ primary directive: To get data as close to the user as possible.
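
Here is a rough sketch of what that geographic segmentation looks like in practice. CockroachDB speaks the PostgreSQL wire protocol, so a standard Postgres driver works; the connection string and region names below are placeholders, and the exact DDL varies by version:

```python
import psycopg2  # CockroachDB speaks the PostgreSQL wire protocol

# Placeholder connection string and region names, for illustration only.
conn = psycopg2.connect("postgresql://user:pass@localhost:26257/myapp")
conn.autocommit = True
with conn.cursor() as cur:
    # Declare the regions the database lives in...
    cur.execute('ALTER DATABASE myapp SET PRIMARY REGION "us-east1"')
    cur.execute('ALTER DATABASE myapp ADD REGION "europe-west1"')
    # ...then home each row of a table in the region closest to its user.
    cur.execute("ALTER TABLE users SET LOCALITY REGIONAL BY ROW")
conn.close()
```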

In part 1 of this EC-1, I provided a general overview and a look at the origins of Cockroach Labs. In this installment, I’m going to cover the technical details of the technology with an eye to the non-technical reader. I’m going to describe the CockroachDB technology through three questions:

  1. What makes reading and writing data over a global geography so hard?
  2. How does CockroachDB address the problem?
  3. What does it all mean for those using CockroachDB?

What makes reading and writing data over a global geography so hard?

Spencer Kimball, CEO and co-founder of Cockroach Labs, describes the situation this way:

There’s lots of other stuff you need to consider when building global applications, particularly around data management. Take, for example, the question and answer website Quora. Let’s say you live in Australia. You have an account and you store the particulars of your Quora user identity on a database partition in Australia.

But when you post a question, you actually don’t want that data to just be posted in Australia. You want that data to be posted everywhere so that all the answers to all the questions are the same for everybody, anywhere. You don’t want to have a situation where you answer a question in Sydney and then you can see it in Hong Kong, but you can’t see it in the EU. When that’s the case, you end up getting different answers depending where you are. That’s a huge problem.

Reading and writing data over a global geography is challenging for pretty much the same reason that it’s faster to get a pizza delivered from across the street than from across the city. The essential constraints of time and space apply. Whether it’s digital data or a pepperoni pizza, the further away you are from the source, the longer stuff takes to get to you.
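
That constraint is easy to quantify. Here is a toy calculation of the latency floor that distance alone imposes; the speed of light in fiber is an approximation:

```python
# Even at the speed of light in fiber (~200,000 km/s, an approximation),
# geography imposes a latency floor no database engine can optimize away.
def min_round_trip_ms(distance_km: float, fiber_km_per_s: float = 200_000) -> float:
    return 2 * distance_km / fiber_km_per_s * 1000

# Sydney to Western Europe is roughly 16,000 km as the fiber runs:
print(min_round_trip_ms(16_000))  # ~160 ms round trip before any processing
```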

#cockroach-labs, #cockroachdb, #cockroachdb-ec-1, #data-management, #database, #databases, #ec-cloud-and-enterprise-infrastructure, #ec-enterprise-applications, #ec-1, #enterprise, #relational-database, #saas, #startups

Scaling CockroachDB in the red ocean of relational databases

Most database startups avoid building relational databases, since that market is dominated by a few goliaths. Oracle, MySQL and Microsoft SQL Server have embedded themselves into the technical fabric of large- and medium-size companies going back decades. These established companies have a lot of market share and a lot of money to quash the competition.

So rather than trying to compete in the relational database market, over the past decade, many database startups focused on alternative architectures such as document-centric databases (like MongoDB), key-value stores (like Redis) and graph databases (like Neo4J). But Cockroach Labs went against conventional wisdom with CockroachDB: It intentionally competed in the relational database market with its relational database product.

While it did face an uphill battle to penetrate the market, Cockroach Labs saw a surprising benefit: It didn’t have to invent a market. All it needed to do was grab a share of a market that also happened to be growing rapidly.

Cockroach Labs has a bright future, compelling technology, a lot of money in the bank and an experienced, technically astute executive team.

In previous parts of this EC-1, I looked at the origins of CockroachDB, presented an in-depth technical description of its product as well as an analysis of the company’s developer relations and cloud service, CockroachCloud. In this final installment, we’ll look at the future of the company, the competitive landscape within the relational database market, its ability to retain talent as it looks toward a potential IPO or acquisition, and the risks it faces.

CockroachDB’s success is not guaranteed. It has to overcome significant hurdles to secure a profitable place for itself among a set of well-established database technologies that are owned by companies with very deep pockets.

It’s not impossible, though. We’ll first look at MongoDB as an example of how a company can break through the barriers for database startups competing with incumbents.

When life gives you Mongos, make MongoDB

Dev Ittycheria, MongoDB CEO, rings the Nasdaq Stock Market Opening Bell. Image Credits: Nasdaq, Inc

MongoDB is a good example of the risks that come with trying to invent a new database market. The company started out as a purely document-centric database at a time when that approach was the exception rather than the rule.

Web developers like document-centric databases because they address a number of common use cases in their work. For example, a document-centric database works well for storing comments to a blog post or a customer’s entire order history and profile.
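
A minimal sketch shows why: the post and its comments live in a single document, so one read fetches everything. The schema and field names here are illustrative:

```python
from pymongo import MongoClient

# A blog post and its comments stored as one document: a single read
# fetches everything, with no joins across tables. Schema is illustrative.
client = MongoClient("mongodb://localhost:27017")
posts = client.blog.posts

posts.insert_one({
    "title": "Hello, world",
    "author": "jane",
    "comments": [
        {"user": "sam", "text": "Nice post!"},
        {"user": "ada", "text": "Agreed."},
    ],
})
doc = posts.find_one({"title": "Hello, world"})
print(len(doc["comments"]))  # 2
```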

#aws, #baidu, #cloud, #cloud-computing, #cloud-services, #cockroach-labs, #cockroachdb, #cockroachdb-ec-1, #data-management, #database, #database-management, #ec-cloud-and-enterprise-infrastructure, #ec-enterprise-applications, #ec-1, #enterprise, #google, #mongodb, #mysql, #new-york-city, #nosql, #oracle, #relational-database, #saas, #startups

PlanetScale raises $30M Series B for its database service

PlanetScale, the company behind the open-source Vitess database clustering system for MySQL that was first developed at YouTube, today announced that it has raised a $30 million Series B funding round led by Insight Partners, with participation from a16z and SignalFire. With this, the company has now raised a total of $55 million, according to Crunchbase.

Today’s announcement comes only a few weeks after PlanetScale launched its new hosted database platform, also dubbed PlanetScale. The company had previously offered a hosted version of Vitess, but with this new service, it is going a step further and offering what it calls a “developer-first database” that abstracts away all of the infrastructure to ensure that developers won’t have to think about cloud zones, cluster sizes and other details.
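
Since the platform is built on Vitess, which clusters MySQL, connecting looks like talking to any MySQL database. A minimal sketch, with the host and credentials as placeholders:

```python
import pymysql

# PlanetScale's platform is built on Vitess, which clusters MySQL, so a
# standard MySQL driver works. Host and credentials are placeholders.
conn = pymysql.connect(
    host="db.example.com",
    user="app_user",
    password="********",
    database="myapp",
    ssl={"ca": "/etc/ssl/certs/ca-certificates.crt"},  # CA path varies by OS
)
with conn.cursor() as cur:
    cur.execute("SELECT id, email FROM users LIMIT 5")
    for row in cur.fetchall():
        print(row)
conn.close()
```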

Indeed, PlanetScale CEO and co-founder Jiten Vaidya was quite open about the limitations of this earlier product. “What we had built last year was pretty much hosted Vitess, which was no different than how a lot of cloud providers today give you databases,” he said. “So none of this ease of use, none of this elegance, none of these state-of-the-art experiences that the developers want and expect today, we had built into our product.”

But a few months ago, the company brought on former GitHub VP of Engineering Sam Lambert as its Chief Product Officer. Vaidya noted that Lambert brought a lot of developer empathy to PlanetScale and helped it launch this new product.

“People come to you because they’re not database experts, but they have data, they have problems,” Lambert said. “And too many companies, especially in the database world, do not think about the daily lives of their users like we do. They don’t think about the complete journey of what the user is actually trying to do, which is to provide value to their customers. They’re just very impressed with themselves for storing and retrieving data. And it’s like, yep, we’ve been doing that. We’ve been doing that since the 60s. Can we do something else now?”

The company’s users today include the likes of Slack, Figma, GitHub and Square, so it’s clearly delivering value to a lot of users. As Lambert noted, PlanetScale aims to offer them a product that is simple and easy to use. “Just because it is simple and easy to use, and beautiful, honestly — like just beautiful, well-designed tooling — it doesn’t mean it’s inferior. It doesn’t mean it’s missing anything. It means the others are missing the poetry and the additional elements of beauty that you can add to infrastructure products,” he said.

PlanetScale plans to use the new funding to scale its team globally and accelerate the adoption of its platform. Insight Partners Managing Director Nikhil Sachdev will join the company’s board, with the firm’s Managing Director Praveen Akkiraju also joining as a board observer.

“PlanetScale is setting a new bar for simplicity, performance and scalability for cloud-based databases in the serverless era,” said Sachdev. “The developer experience for databases has been painful for too long. PlanetScale is breaking that chain, solving longstanding problems related to scalability and reliability in an extremely elegant, tasteful, and useful way.”

#andreessen-horowitz, #cloud-computing, #computing, #database, #github, #insight-partners, #mysql, #planetscale, #serverless-computing, #software, #tc, #vitess, #youtube

Hacking my way into analytics: A creative’s journey to design with data

Growing up, did you ever wonder how many chairs you’d have to stack to reach the sky?

No? I guess that’s just me then.

As a child, I always asked a lot of “how many/much” questions. Some were legitimate (“How much is 1 USD in VND?”); some were absurd (“How tall is the sky and can it be measured in chairs?”). So far, I’ve managed to maintain my obnoxious statistical probing habit without making any mortal enemies in my 20s. As it turns out, that habit comes with its perks when working in product.

My first job as a product designer was at a small but energetic fintech startup whose engineers also dabbled in pulling data. I constantly bothered them with questions like, “How many exports did we have from that last feature launched?” and “How many admins created at least one rule on this page?” I was curious about quantitative analysis but did not know where to start.

I knew I wasn’t the only one. Even then, there was a growing need for basic data literacy in the tech industry, and it’s only getting more taxing by the year. Words like “data-driven,” “data-informed” and “data-powered” increasingly litter every tech organization’s product briefs. But where does this data come from? Who has access to it? How might I start digging into it myself? How might I leverage this data in my day-to-day design once I get my hands on it?

Data discovery for all: What’s in the way?

“Curiosity is our compass” is one of Kickstarter’s guiding principles. Powered by a desire for knowledge and information, curiosity is the enemy of many larger, older and more structured organizations — whether they admit it or not — because it hinders the production flow. Curiosity makes you pause and take time to explore and validate the “ask.” Asking as many what’s, how’s, why’s, who’s and how many’s as possible is important to help you learn if the work is worth your time.

#analytics, #business-intelligence, #column, #data-analysis, #data-management, #data-tools, #database, #developer, #ec-column, #startups

Meroxa raises $15M Series A for its real-time data platform

Meroxa, a startup that makes it easier for businesses to build the data pipelines to power both their analytics and operational workflows, today announced that it has raised a $15 million Series A funding round led by Drive Capital. Existing investors Root, Amplify and Hustle Fund also participated in this round, which together with the company’s previously undisclosed $4.2 million seed round now brings total funding in the company to $19.2 million.

The promise of Meroxa is that businesses can use a single platform for their various data needs and won’t need a team of experts to build their infrastructure and then manage it. At its core, Meroxa provides a single Software-as-a-Service solution that connects relational databases to data warehouses and then helps businesses operationalize that data.

Image Credits: Meroxa

“The interesting thing is that we are focusing squarely on relational and NoSQL databases into data warehouse,” Meroxa co-founder and CEO DeVaris Brown told me. “Honestly, people come to us as a real-time FiveTran or real-time data warehouse sink. Because, you know, the industry has moved to this [extract, load, transform] format. But the beautiful part about us is, because we do change data capture, we get that granular data as it happens.” And businesses want this very granular data to be reflected inside of their data warehouses, Brown noted, but he also stressed that Meroxa can expose this stream of data as an API endpoint or point it to a webhook.
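
To make that concrete, here is a minimal sketch of a webhook receiver for such change-data-capture events. The event shape is an assumption for illustration, not Meroxa’s actual payload format:

```python
from flask import Flask, request

app = Flask(__name__)

# Minimal webhook receiving change-data-capture events from a pipeline
# like Meroxa's. The event shape below is an assumption for illustration.
@app.route("/cdc", methods=["POST"])
def cdc():
    event = request.get_json(force=True)
    # e.g. {"table": "orders", "op": "update", "before": {...}, "after": {...}}
    print(event.get("table"), event.get("op"))
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```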

The company is able to do this because its core architecture is somewhat different from other data pipeline and integration services that, at first glance, seem to offer a similar solution. Because of this, users can not only connect different tools to their data warehouse but also build real-time tools on top of these data streams.

Image Credits: Meroxa

“We aren’t a point-to-point solution,” Meroxa co-founder and CTO Ali Hamidi explained. “When you set up the connection, you aren’t taking data from Postgres and only putting it into Snowflake. What’s really happening is that it’s going into our intermediate stream. Once it’s in that stream, you can then start hanging off connectors and say, ‘Okay, well, I also want to peek into the stream, I want to transfer my data, I want to filter out some things, I want to put it into S3.’”

With this flexibility, Hamidi noted, a lot of the company’s customers start with a pretty standard use case and then quickly expand into other areas as well.

Brown and Hamidi met during their time at Heroku, where Brown was a director of product management and Hamidi a lead software engineer. But while Heroku made it very easy for developers to publish their web apps, there wasn’t anything comparable in the highly fragmented database space. The team acknowledges that there are a lot of tools that aim to solve these data problems, but few of them focus on the user experience.

Image Credits: Meroxa

“When we talk to customers now, it’s still very much an unsolved problem,” Hamidi said. “It seems kind of insane to me that this is such a common thing and there is no ‘oh, of course you use this tool because it addresses all my problems.’ And so the angle that we’re taking is that we see user experience not as a nice-to-have, it’s really an enabler, it is something that enables a software engineer or someone who isn’t a data engineer with 10 years of experience in wrangling Kafka and Postgres and all these things. […] That’s a transformative kind of change.”

It’s worth noting that Meroxa uses a lot of open-source tools but the company has also committed to open-sourcing everything in its data plane as well. “This has multiple wins for us, but one of the biggest incentives is in terms of the customer, we’re really committed to having our agenda aligned. Because if we don’t do well, we don’t serve the customer. If we do a crappy job, they can just keep all of those components and run it themselves,” Hamidi explained.

Today, Meroxa, which the team founded in early 2020, has over 24 employees (and is 100% remote). “I really think we’re building one of the most talented and most inclusive teams possible,” Brown told me. “Inclusion and diversity are very, very high on our radar. Our team is 50% black and brown. Over 40% are women. Our management team is 90% underrepresented. So not only are we building a great product, we’re building a great company, we’re building a great business.”  

#api, #business-intelligence, #cloud, #computing, #data-management, #data-warehouse, #database, #developer, #drive-capital, #enterprise, #heroku, #hustle-fund, #information-technology, #nosql, #product-management, #recent-funding, #software-engineer, #startups, #web-apps

TigerGraph raises $105M Series C for its enterprise graph database

TigerGraph, a well-funded enterprise startup that provides a graph database and analytics platform, today announced that it has raised a $105 million Series C funding round. The round was led by Tiger Global and brings the company’s total funding to over $170 million.

“TigerGraph is leading the paradigm shift in connecting and analyzing data via scalable and native graph technology with pre-connected entities versus the traditional way of joining large tables with rows and columns,” said TigerGraph founder and CEO Yu Xu. “This funding will allow us to expand our offering and bring it to many more markets, enabling more customers to realize the benefits of graph analytics and AI.”

Current TigerGraph customers include the likes of Amgen, Citrix, Intuit, Jaguar Land Rover and UnitedHealth Group. Using a SQL-like query language (GSQL), these customers can use the company’s services to store and quickly query their graph databases. At the core of its offerings is the TigerGraphDB database and analytics platform, but the company also offers a hosted service, TigerGraph Cloud, with pay-as-you-go pricing, hosted either on AWS or Azure. With GraphStudio, the company also offers a graphical UI for creating data models and visually analyzing them.

The promise for the company’s database services is that they can scale to tens of terabytes of data with billions of edges. Its customers use the technology for a wide variety of use cases, including fraud detection, customer 360, IoT, AI, and machine learning.
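
The “pre-connected entities” idea is easiest to see in code: in a graph, following relationships is a traversal rather than a series of table joins. Here is a toy fraud-detection-style check; the data and hop limit are invented for illustration:

```python
from collections import deque

# Toy graph: accounts linked by shared devices and cards. In a graph
# database these edges are stored pre-connected, so a k-hop fraud check
# is a traversal rather than k self-joins on relational tables.
edges = {
    "acct:A": ["device:1"],
    "device:1": ["acct:A", "acct:B"],
    "acct:B": ["device:1", "card:9"],
    "card:9": ["acct:B", "acct:C"],
    "acct:C": ["card:9"],
}

def within_hops(start: str, target: str, max_hops: int) -> bool:
    # Breadth-first search out to max_hops edges from the start node.
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return True
        if depth < max_hops:
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return False

# Is acct:C within four hops of a known-fraudulent acct:A?
print(within_hops("acct:A", "acct:C", max_hops=4))  # True
```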

Like so many other companies in this space, TigerGraph is enjoying a tailwind thanks to the fact that many enterprises have accelerated their digital transformation projects during the pandemic.

“Over the last 12 months with the COVID-19 pandemic, companies have embraced digital transformation at a faster pace driving an urgent need to find new insights about their customers, products, services, and suppliers,” the company explains in today’s announcement. “Graph technology connects these domains from the relational databases, offering the opportunity to shrink development cycles for data preparation, improve data quality, identify new insights such as similarity patterns to deliver the next best action recommendation.”

#amgen, #analytics, #articles, #artificial-intelligence, #aws, #business-intelligence, #ceo, #citrix, #citrix-systems, #computing, #data, #database, #enterprise, #graph-database, #intuit, #jaguar-land-rover, #machine-learning, #tiger-global

NSO used real people’s location data to pitch its contact-tracing tech, researchers say

Spyware maker NSO Group used real phone location data on thousands of unsuspecting people when it demonstrated its new COVID-19 contact-tracing system to governments and journalists, researchers have concluded.

NSO, a private intelligence company best known for developing and selling governments access to its Pegasus spyware, went on the charm offensive earlier this year to pitch its contact-tracing system, dubbed Fleming, aimed at helping governments track the spread of COVID-19. Fleming is designed to allow governments to feed location data from cell phone companies to visualize and track the spread of the virus. NSO gave several news outlets each a demo of Fleming, which NSO says helps governments make public health decisions “without compromising individual privacy.”

But in May, a security researcher told TechCrunch that he found an exposed database storing thousands of location data points used by NSO to demonstrate how Fleming works — the same demo seen by reporters weeks earlier.

TechCrunch reported the apparent security lapse to NSO, which quickly secured the database, but said that the location data was “not based on real and genuine data.”

NSO’s claim that the location data wasn’t real differed from reports in Israeli media, which said NSO had used phone location data obtained from advertising platforms, known as data brokers, to “train” the system. Academic and privacy expert Tehilla Shwartz Altshuler, who was also given a demo of Fleming, said NSO told her that the data was obtained from data brokers, which sell access to vast troves of aggregate location data collected from the apps installed on millions of phones.

TechCrunch asked researchers at Forensic Architecture, an academic unit at Goldsmiths, University of London that studies and examines human rights abuses, to investigate. The researchers published their findings on Wednesday, concluding that the exposed data was likely based on real phone location data.

The researchers said if the data is real, then NSO “violated the privacy” of 32,000 individuals across Rwanda, Israel, Bahrain, Saudi Arabia and the United Arab Emirates — countries that are reportedly customers of NSO’s spyware.

The researchers analyzed a sample of the exposed phone location data by looking for patterns they expected to see with real people’s location data, such as a concentration of people in major cities and by measuring the time it took for individuals to travel from one place to another. The researchers also found spatial irregularities that would be associated with real data, such as star-like patterns that are caused by a phone trying to accurately pinpoint its location when the line of sight to the satellite is obstructed by tall buildings.
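
One of those checks is straightforward to sketch: given consecutive timestamped pings from a single device, compute the implied travel speed and flag physically implausible jumps. A simplified version, with invented sample data:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two coordinates, in kilometers.
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# (timestamp_seconds, lat, lon) pings for one device; invented sample data.
pings = [(0, 26.2285, 50.5860), (3600, 26.2172, 50.5832), (7200, 48.8566, 2.3522)]

for (t1, la1, lo1), (t2, la2, lo2) in zip(pings, pings[1:]):
    speed = haversine_km(la1, lo1, la2, lo2) / ((t2 - t1) / 3600)
    # Anything far beyond airliner speed suggests fake or corrupted data.
    print(f"{speed:8.1f} km/h plausible={speed < 1000}")
```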

“The spatial ‘irregularities’ in our sample — a common signature of real mobile location tracks — further support our assessment that this is real data. Therefore, the dataset is most likely not ‘dummy’ nor computer generated data, but rather reflects the movement of actual individuals, possibly acquired from telecommunications carriers or a third-party source,” the researchers said.

The researchers built maps, graphs, and visualizations to explain their findings, while preserving the anonymity of the individuals whose location data was fed into NSO’s Fleming demo.

Gary Miller, a mobile network security expert and founder of cyber intelligence firm Exigent Media, reviewed some of the datasets and graphs, and concluded it was real phone location data.

Miller said the number of data points increased around population hubs. “If you take a scatter plot of cell phone locations at a given point in time, there will be consistency in the number of points in suburban versus urban locations,” he said. Miller also found evidence of people traveling together, which he said “looked consistent with real phone data.”

He also said that even “anonymized” location data sets can be used to tell a lot about a person, such as where they live and work, and who they visit. “One can learn a lot of details about individuals simply by looking at location movement patterns,” he said.

“If you add up all of the similarities it would be very difficult to conclude that this was not actual mobile network data,” he said.

A timeline of one person’s location data in Bahrain over a three-week period. Researchers say these red lines represent travel that seems plausible within the indicated time. (Image: Forensic Architecture/supplied)

John Scott-Railton, a senior researcher at Citizen Lab, said the data likely originated from phone apps that use a blend of direct GPS data, nearby Wi-Fi networks, and the phone’s in-built sensors to try to improve the quality of the location data. “But it’s never really perfect,” he said. “If you’re looking at advertising data — like the kind that you buy from a data broker — it would look a lot like this.”

Scott-Railton also said that using simulated data for a contact-tracing system would be “counterproductive,” as NSO would “want to train [Fleming] on data that is as real and representative as possible.”

“Based on what I saw, the analysis provided by Forensic Architecture is consistent with the previous statements by Tehilla Shwartz Altshuler,” said Scott-Railton, referring to the academic who said NSO told her the system was based on real data.

“The whole situation paints a picture of a spyware company once more being cavalier with sensitive and potentially personal information,” he said.

NSO rejected the researchers’ findings.

“We have not seen the supposed examination and have to question how these conclusions were reached. Nevertheless, we stand by our previous response of May 6, 2020. The demo material was not based on real and genuine data related to infected COVID-19 individuals,” said an unnamed spokesperson. (NSO’s earlier statement made no reference to individuals with COVID-19.)

“As our last statement details, the data used for the demonstrations did not contain any personally identifiable information (PII). And, also as previously stated, this demo was a simulation based on obfuscated data. The Fleming system is a tool that analyzes data provided by end users to help healthcare decision-makers during this global pandemic. NSO does not collect any data for the system, nor does NSO have any access to collected data.”

NSO did not answer our specific questions, including where the data came from and how it was obtained. The company claims on its website that Fleming is “already being operated by countries around the world,” but declined to confirm or deny its government customers when asked.

The Israeli spyware maker’s push into contact tracing has been seen as a way to repair its image, as the company battles a lawsuit in the United States that could see it reveal more about the governments that buy access to its Pegasus spyware.

NSO is currently embroiled in a lawsuit with Facebook-owned WhatsApp, which last year blamed NSO for exploiting an undisclosed vulnerability in WhatsApp to infect some 1,400 phones with Pegasus, including those of journalists and human rights defenders. NSO says it should be afforded legal immunity because it acts on behalf of governments. But Microsoft, Google, Cisco and VMware filed an amicus brief this week in support of WhatsApp, calling on the court to reject NSO’s claim to immunity.

The amicus brief came shortly after Citizen Lab found evidence that dozens of journalists were also targeted with Pegasus spyware by NSO customers, including Saudi Arabia and the United Arab Emirates. NSO disputed the findings.

#covid-19, #database, #espionage, #forensic-architecture, #government, #health, #mobile-phone, #nso-group, #privacy, #security, #simulation, #spyware, #telecommunications

Supabase raises $6M for its open-source Firebase alternative

Supabase, a YC-incubated startup that offers developers an open-source alternative to Google’s Firebase and similar platforms, today announced that it has raised a $6 million funding round led by Coatue, with participation from YC, Mozilla and a group of about 20 angel investors.

Currently, Supabase includes support for PostgreSQL databases and authentication tools, with a storage and serverless solution coming soon. It provides all the usual tools for working with databases — and listening to database changes — as well as a web-based UI for managing them. The team is quick to note that while the comparison with Google’s Firebase is inevitable, Supabase is not meant to be a 1-to-1 replacement for it. And unlike Firebase, which uses a NoSQL database, Supabase is built on PostgreSQL.
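
In practice, that looks like an ordinary client library over Postgres. A minimal sketch using the supabase-py client; the project URL, key and table are placeholders, and exact method signatures vary across client versions:

```python
from supabase import create_client

# Placeholders: use your own project URL and anon key from the dashboard.
supabase = create_client("https://example-project.supabase.co", "public-anon-key")

# Query a Postgres table through the auto-generated REST layer (PostgREST).
rows = supabase.table("todos").select("*").eq("done", False).execute()
print(rows.data)

# The same client talks to the built-in auth system (signature varies by version).
user = supabase.auth.sign_up({"email": "a@example.com", "password": "secret123"})
```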

Indeed, the team relies heavily on existing open-source projects and contributes to them where it can. One of Supabase’s full-time employees maintains the PostgREST tool for building APIs on top of the database, for example.

“We’re not trying to build another system,” Supabase co-founder and CEO Paul Copplestone told me. “We just believe that already there are well-trusted, scalable enterprise open-source products out there and they just don’t have this usability component. So actually right now, Supabase is an amalgamation of six tools, soon to be seven. Some of them we built ourselves. If we go to market and can’t find anything that we think is going to be scalable — or really solve the problems — then we’ll build it and we’ll open-source it. But otherwise, we’ll use existing tools.”

Image Credits: Supabase

The traditional route to market for open-source tools is to create a tool and then launch a hosted version — maybe with some additional features — to monetize the work. Supabase took a slightly different route and launched a hosted version right away.

If somebody wants to host the service themselves, the code is available, but running your own PaaS is obviously a major challenge; that’s also why the team went with this approach. What you get with Firebase, he noted, is that it takes just a few clicks to set everything up. Supabase wanted to be able to offer the same kind of experience. “That’s one thing that self-hosting just cannot offer,” he said. “You can’t really get the same wow factor that you can if we offered a hosted platform where you literally [have] one click and then a couple of minutes later, you’ve got everything set up.”

In addition, he noted that he wanted to make sure the company could support the growing stable of tools it was building, and that commercializing tools based on its database services was the easiest way to do so.

Like other Y Combinator startups, Supabase closed its funding round after the accelerator’s demo day in August. The team had considered doing a SAFE round, but it found the right group of institutional investors that offered founder-friendly terms to go ahead with this institutional round instead.

“It’s going to cost us a lot to compete with the generous free tier that Firebase offers,” Copplestone said. “And it’s databases, right? So it’s not like you can just keep them stateless and shut them down if you’re not really using them. [This funding round] gives us a long, generous runway and more importantly, for the developers who come in and build on top of us, [they can] take as long as they want and then start monetizing later on themselves.”

The company plans to use the new funding to continue to invest in its various tools and hire to support its growth.

“Supabase’s value proposition of building in a weekend and scaling so quickly hit home immediately,” said Caryn Marooney, general partner at Coatue and Facebook’s former VP of Global Communications. “We are proud to work with this team, and we are excited by their laser focus on developers and their commitment to speed and reliability.”

#caryn-marooney, #cloud-computing, #coatue, #computing, #database, #developer, #firebase, #google-cloud, #nosql, #platform-as-a-service, #postgresql, #recent-funding, #serverless-computing, #startups, #supabase, #tc

Data breach at New York Sports Clubs owner exposed customer data

Town Sports International, the parent company of New York Sports Clubs and Christi’s Fitness gyms, is mopping up after a security lapse exposed customer data.

Security researcher Bob Diachenko received a tip from a contact, Sami Toivonen, about an unprotected server containing almost a terabyte of spreadsheets representing years of internal company data, including financial records and personal customer records. But because there was no password on the server, anyone could access the files inside.

The server was exposed for almost a year, Diachenko told TechCrunch.

Town Sports pulled the server offline a short time after Diachenko contacted the company. He shared his findings exclusively with TechCrunch, which independently verified the authenticity of the data by confirming details found in the spreadsheets with customers.

Spreadsheets found on the server contained customer names, postal addresses, email addresses, and phone numbers. The data also recorded when customers checked in and at which gym location. Some spreadsheets also had notes on customer accounts, such as complaints and when customers were past due on a missed membership payment.

Chief executive Patrick Walsh did not respond to several requests for comment, which also asked if the company planned to inform customers of the security lapse.

Town Sports was forced to shutter its 185 gyms on the U.S. east coast after COVID-19 was declared a pandemic in mid-March. By the end of March, the company told financial regulators it had about 588,000 members.

One of the spreadsheets found on the exposed server showed that Town Sports had just 7,100 paying customers by mid-May, while 566,000 customers had their gym memberships frozen.

Town Sports began freezing accounts and refunding membership fees after the company continued to charge customers even after the lockdown began, a move that drew a threat of legal action from New York attorney general Letitia James, who accused the gym chain of “ripping off” its members.

The same spreadsheet still had customer data on some 665,000 cancelled accounts.

Earlier this month the gym chain filed for bankruptcy, just as states began allowing gyms to reopen, albeit with reduced capacity and safety measures in place.

#attorney-general, #data-breach, #database, #letitia-james, #new-york, #security, #spreadsheet, #united-states

Microsoft brings data services to its Arc multi-cloud management service

Microsoft today launched a major update to its Arc multi-cloud service that allows Azure customers to run and manage workloads across clouds — including those of Microsoft’s competitors — and their on-premises data centers. First announced at Microsoft Ignite in 2019, Arc was always meant to not just help users manage their servers but also to allow them to run data services like Azure SQL and Azure Database for PostgreSQL close to where their data sits.

Today, the company is making good on this promise with the preview launch of Azure Arc enabled data services with support for, as expected, Azure SQL and Azure Database for PostgreSQL.

In addition, Microsoft is making the core feature of Arc, Arc enabled servers, generally available. These are the tools at the core of the service that allow enterprises to use the standard Azure Portal to manage and monitor their Windows and Linux servers across their multi-cloud and edge environments.

Image Credits: Microsoft

“We’ve always known that enterprises are looking to unlock the agility of the cloud — they love the app model, they love the business model — while balancing a need to maintain certain applications and workloads on premises,” Rohan Kumar, Microsoft’s corporate VP for Azure Data said. “A lot of customers actually have a multi-cloud strategy. In some cases, they need to keep the data specifically for regulatory compliance. And in many cases, they want to maximize their existing investments. They’ve spent a lot of CapEx.”

As Kumar stressed, Microsoft wants to meet customers where they are, without forcing them to adopt a container architecture, for example, or replace their specialized engineered appliances to use Arc.

“Hybrid is really [about] providing that flexible choice to our customers, meeting them where they are, and not prescribing a solution,” he said.

He admitted that this approach makes engineering the solution more difficult, but the team decided that the baseline should be a container endpoint and nothing more. And for the most part, Microsoft packaged up the tools its own engineers were already using to run Azure services on the company’s own infrastructure to manage these services in a multi-cloud environment.

“In hindsight, it was a little challenging at the beginning, because, you can imagine, when we initially built them, we didn’t imagine that we’ll be packaging them like this. But it’s a very modern design point,” Kumar said. But the result is that supporting customers is now relatively easy because it’s so similar to what the team does in Azure, too.

Kumar noted that one of the selling points for the Azure Data Services is also that the version of Azure SQL is essentially evergreen, allowing them to stop worrying about SQL Server licensing and end-of-life support questions.

#arc, #azure-arc, #cloud, #cloud-computing, #cloud-infrastructure, #computing, #database, #enterprise, #microsoft, #microsoft-ignite-2020, #microsoft-azure, #serverless-computing, #sql, #tc

Cloudera pulls sensitive files from its ‘open by design’ cloud servers

Enterprise cloud giant Cloudera has pulled several of its cloud storage servers offline, despite initially claiming the servers were “open by design,” after a security researcher found sensitive internal files inside.

Chris Vickery, director of risk research at security firm UpGuard, found the cloud storage servers — known as buckets — hosted on Amazon Web Services in late July. The data largely contained legacy Hortonworks data from prior to its $5.2 billion all-stock merger with Cloudera in January 2019.

When reached, Cloudera spokesperson Madge Miller told TechCrunch that the buckets were supposed to be open and contained files and code that were open to its customers, users, and the wider community. The company said, however, that it identified three files that contained confidential information and were removed from the buckets.

But soon after, the company reversed its position and pulled the buckets offline altogether.

Vickery, who shared his findings exclusively with TechCrunch, said that although the vast majority of files in the cloud buckets were for public and community consumption, he also found files containing credentials, account access tokens, passwords and other secrets for Cloudera’s internal Jenkins system, which the company uses for building and testing its software projects. The buckets also contained entire SQL databases for its internal build databases, Vickery said.

A “secrets” file containing passwords and credentials for Cloudera’s internal systems. (Image: UpGuard/supplied)

Cloudera confirmed the security lapse in a later email to TechCrunch.

“Thanks to the questions from the security researcher, we did a deep dive and found some credentials and SQL dumps in the public buckets which should not have been placed there. The credentials were for our internal Jenkins build process and the SQL dumps were of our build database,” the spokesperson said.

“We have since removed this information from the public buckets and taken further remediation steps by changing credentials and rotating keys. We also concluded we could close access to a few unused publicly accessible buckets.”

The company said that the sensitive data, since removed, did not contain any customer data or any other personally identifiable information.

In all, the security lapse could have been worse — even if the incident could have been avoided altogether.

But Vickery said the incident was important to disclose as it reveals the inherent risk in using overwhelmingly large cloud storage containers. In other words, the buckets were so big and had so many files that it becomes nearly impossible to notice when something sensitive is added to the bucket by mistake.

“When that many directories and files of varying format are all stashed away together, it becomes all too easy for something to be mistakenly put among them and remain unnoticed, as is what appears to have happened here,” wrote Vickery.

#cloud, #cloud-computing, #cloud-infrastructure, #cloudera, #computing, #database, #hortonworks, #information, #information-technology, #jenkins, #security, #spokesperson, #sql, #upguard, #web-services

COVID-19 hospital data is a hot mess after feds take control

Members of the medical staff treat a patient in the COVID-19 intensive care unit at the United Memorial Medical Center on July 28, 2020 in Houston, Texas. COVID-19 cases and hospitalizations have spiked since Texas reopened, pushing intensive-care units to full capacity and sparking concerns about a surge in fatalities as the virus spreads. (Image credit: Getty | Go Nakamura)

As COVID-19 hospitalizations in the US approach the highest levels seen in the pandemic so far, national efforts to track patients and hospital resources remain in shambles after the federal government abruptly seized control of data collection earlier this month.

The Trump administration issued a directive to hospitals and states July 10, instructing them to stop submitting their daily COVID-19 hospital data to the US Centers for Disease Control and Prevention—which has historically handled such public health data—and instead submit it to a new database in the hands of the Department of Health and Human Services. The change was ostensibly made to streamline federal data collection, which is critical for assessing the state of the pandemic and distributing needed resources, such as personal protective equipment and remdesivir, an antiviral drug shown to shorten COVID-19 recovery times.

Watchdogs and public health experts were immediately aghast at the switch to the HHS database, fearing the data would be manipulated for political reasons or hidden from public view altogether. So far, however, the real threat has been administrative chaos. The switch took effect July 15, giving hospitals and states just days to adjust to the new data collection and submission process.


#cdc, #covid-19, #data, #database, #healthcare-it, #hhs, #hospitalization, #infectious-disease, #it, #pandemic, #public-health, #sars-cov-2, #science

Oracle’s BlueKai tracks you across the web. That data spilled online

Have you ever wondered why online ads appear for things that you were just thinking about?

There’s no big conspiracy. Ad tech can be creepily accurate.

Tech giant Oracle is one of a few companies in Silicon Valley that has near-perfected the art of tracking people across the internet. The company has spent a decade and billions of dollars buying startups to build its very own panopticon of users’ web browsing data.

One of those startups, BlueKai, which Oracle bought for a little over $400 million in 2014, is barely known outside marketing circles, but it amassed one of the largest banks of web tracking data outside of the federal government.

BlueKai uses website cookies and other tracking tech to follow you around the web. By knowing which websites you visit and which emails you open, marketers can use this vast amount of tracking data to infer as much about you as possible — your income, education, political views, and interests to name a few — in order to target you with ads that should match your apparent tastes. If you click, the advertisers make money.

But for a time, that web tracking data was spilling out onto the open internet because a server was left unsecured and without a password, exposing billions of records for anyone to find.

Security researcher Anurag Sen found the database and reported his finding to Oracle through an intermediary — Roi Carthy, chief executive at cybersecurity firm Hudson Rock and former TechCrunch reporter.

TechCrunch reviewed the data and found names, home addresses, email addresses and other identifiable data in the database. The data also revealed users’ sensitive web browsing activity — from purchases to newsletter unsubscribes.

“There’s really no telling how revealing some of this data can be,” Bennett Cyphers, a staff technologist at the Electronic Frontier Foundation, told TechCrunch.

“Oracle is aware of the report made by Roi Carthy of Hudson Rock related to certain BlueKai records potentially exposed on the Internet,” said Oracle spokesperson Deborah Hellinger. “While the initial information provided by the researcher did not contain enough information to identify an affected system, Oracle’s investigation has subsequently determined that two companies did not properly configure their services. Oracle has taken additional measures to avoid a reoccurrence of this issue.”

Oracle did not name the companies or say what those additional measures were, and declined to answer our questions or comment further.

But the sheer size of the exposed database makes this one of the largest security lapses this year.

The more it knows

BlueKai relies on vacuuming up a never-ending supply of data from a variety of sources to understand trends and deliver ads precisely targeted to a person’s interests.

Marketers can either tap into Oracle’s enormous bank of data, which it pulls in from credit agencies, analytics firms, and other sources of consumer data including billions of daily location data points, in order to target their ads. Or marketers can upload their own data obtained directly from consumers, such as the information you hand over when you register an account on a website or when you sign up for a company’s newsletter.

But BlueKai also uses more covert tactics like allowing websites to embed invisible pixel-sized images to collect information about you as soon as you open the page — hardware, operating system, browser and any information about the network connection.

Much of this data comes from the web browser’s “user agent” string. Individually it may not seem sensitive, but fused together it can create a unique “fingerprint” of a person’s device, which can be used to track that person as they browse the internet.
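As a rough illustration of how that fusion works (the choice of fields here is ours, not a documented BlueKai schema), a tracker can hash a handful of individually bland request headers into a stable device identifier:

```python
# Minimal sketch of browser fingerprinting from request metadata.
import hashlib

def fingerprint(headers: dict) -> str:
    # Combine signals that individually look harmless...
    signals = "|".join([
        headers.get("User-Agent", ""),        # browser + OS version
        headers.get("Accept-Language", ""),   # locale
        headers.get("Accept-Encoding", ""),   # supported compression
    ])
    # ...into a stable identifier that follows the device around the web.
    return hashlib.sha256(signals.encode()).hexdigest()[:16]

print(fingerprint({
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X)",
    "Accept-Language": "en-US,en;q=0.9",
}))
```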

BlueKai can also tie your mobile web browsing habits to your desktop activity, allowing it to follow you across the internet no matter which device you use.

Say a marketer wants to run a campaign to sell a new car model. BlueKai already has a category of “car enthusiasts” — and many other, more specific categories — that the marketer can target with ads. Anyone who’s visited a car maker’s website or a blog that includes a BlueKai tracking pixel might be categorized as a “car enthusiast.” Over time that person is siloed into ever more specific categories under a profile that learns as much about them as possible in order to target them with those ads.

(Sources: DaVooda, Filborg/Getty Images; Oracle BlueKai)

The technology is far from perfect. Harvard Business Review found earlier this year that the information collected by data brokers, such as Oracle, can vary wildly in quality.

But some of these platforms have proven alarmingly accurate.

In 2012, Target mailed maternity coupons to a high school student after an in-house analytics system figured out she was pregnant — before she had even told her parents — because of the data it collected from her web browsing.

Some might argue that’s precisely what these systems are designed to do.

Jonathan Mayer, a computer science professor at Princeton University, told TechCrunch that BlueKai is one of the leading systems for linking data.

“If you have the browser send an email address and a tracking cookie at the same time, that’s what you need to build that link,” he said.
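A toy sketch of that linking step might look like the following; the names and structure are hypothetical, chosen only to show how a single signup event can de-anonymize an entire cookie history.

```python
from collections import defaultdict

# cookie -> set of emails seen alongside it; all names are made up
identity_graph = defaultdict(set)

def observe(cookie_id, email=None):
    """Record a page view; an email arriving with a cookie links the two."""
    if email:
        # The cookie's whole anonymous browsing history is now tied to a person.
        identity_graph[cookie_id].add(email)

observe("cookie-abc123")                             # anonymous visit
observe("cookie-abc123", email="jane@example.com")   # newsletter signup
print(dict(identity_graph))  # {'cookie-abc123': {'jane@example.com'}}
```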

The end goal: the more BlueKai collects, the more it can infer about you, making it easier to target you with ads that might entice you to that magic money-making click.

But marketers can’t just log in to BlueKai and download reams of personal information from its servers, one marketing professional told TechCrunch. The data is sanitized and masked so that marketers never see names, addresses or any other personal data.

As Mayer explained: BlueKai collects personal data; it doesn’t share it with marketers.

‘No telling how revealing’

Behind the scenes, BlueKai continuously ingests and matches as much raw personal data as it can against each person’s profile, constantly enriching that profile data to make sure it’s up to date and relevant.

But it was that raw data that was spilling out of the exposed database.

TechCrunch found records containing details of private purchases. One record detailed how a German man, whose name we’re withholding, used a prepaid debit card to place a €10 bet on an esports betting site on April 19. The record also contained the man’s address, phone number and email address.

Another record revealed how one of the largest investment holding companies in Turkey used BlueKai to track users on its website. The record detailed how one person, who lives in Istanbul, ordered $899 worth of furniture online from a homeware store. We know because the record contained all of these details, including the buyer’s name, email address and the direct web address for the buyer’s order, no login needed.

We also reviewed a record detailing how one person unsubscribed from an email newsletter run by a consumer electronics company, sent to his iCloud address. The record showed that the person may have been interested in a specific model of car dash-cam. We could even tell from his user agent that his iPhone was out of date and needed a software update.


The data went back for months, according to Sen, who discovered the database. Some logs dated back to August 2019, he said.

“Fine-grained records of people’s web-browsing habits can reveal hobbies, political affiliation, income bracket, health conditions, sexual preferences, and — as evident here — gambling habits,” said the EFF’s Cyphers. “As we live more of our lives online, this kind of data accounts for a larger and larger portion of how we spend our time.”

Oracle declined to say if it informed those whose data was exposed about the security lapse. The company also declined to say if it had warned U.S. or international regulators of the incident.

Under California state law, companies like Oracle are required to publicly disclose data security incidents, but Oracle has not to date declared the lapse. When reached, a spokesperson for California’s attorney general’s office declined to say if Oracle had informed the office of the incident.

Under Europe’s General Data Protection Regulation, companies can face fines of up to 4% of their global annual turnover for flouting data protection and disclosure rules.

Trackers, trackers everywhere

BlueKai is everywhere — even when you can’t see it.

One estimate says BlueKai tracks over 1% of all web traffic — an unfathomable amount of daily data collection — and tracks some of the world’s biggest websites: Amazon, ESPN, Forbes, Glassdoor, Healthline, Levi’s, MSN.com, Rotten Tomatoes, and The New York Times. Even this very article has a BlueKai tracker because our parent company, Verizon Media, is a BlueKai partner.

But BlueKai is not alone. Nearly every website you visit contains some form of invisible tracking code that watches you as you traverse the internet.

As invasive as it is that invisible trackers are feeding your web browsing data to a gigantic database in the cloud, it’s that very same data that has kept the internet largely free for so long.

To stay free, websites use advertising to generate revenue. The more targeted the advertising, the better the revenue is supposed to be.

Most web users know that internet tracking exists, but few outside marketing circles understand how much data is collected and what is done with it.

Take Equifax, which drew scathing criticism from lawmakers after its 2017 data breach exposed data on millions of consumers that it had collected without their explicit consent. Equifax, like BlueKai, relies on consumers skipping over the lengthy privacy policies that govern how websites track them.

In any case, consumers have little choice but to accept the terms. Be tracked or leave the site. That’s the trade-off with a free internet.

But there are dangers with collecting web-tracking data on millions of people.

“Whenever databases like this exist, there’s always a risk the data will end up in the wrong hands and in a position to hurt someone,” said Cyphers.

Cyphers said the data, if in the hands of someone malicious, could contribute to identity theft, phishing or stalking.

“It also makes a valuable target for law enforcement and government agencies who want to piggyback on the data gathering that Oracle already does,” he said.

Even when the data stays where it’s intended, Cyphers said, these vast databases enable “manipulative advertising for things like political issues or exploitative services” and allow marketers to tailor their messages to specific vulnerable populations.

“Everyone has different things they want to keep private, and different people they want to keep them private from,” said Cyphers. “When companies collect raw web browsing or purchase data, thousands of little details about real people’s lives get scooped up along the way.”

“Each one of those little details has the potential to put somebody at risk,” he said.



#amazon, #attorney-general, #bluekai, #data-collection, #database, #equifax, #general-data-protection-regulation, #iphone, #online-ads, #open-internet, #privacy, #security, #united-states, #web-traffic

VESoft raises $8M to meet China’s growing need for graph databases

Sherman Ye founded VESoft in 2018 when he saw growing demand for graph databases in China. Earlier players like Neo4j and TigerGraph had already been growing aggressively in the West for a few years, while China was just getting to know the technology, which leverages graph structures to store data sets and depict their relationships — such as those used for social media analysis, e-commerce recommendations, and financial risk management.

VESoft is ready for further growth after closing an $8 million funding round led by Redpoint China Ventures, an investment firm launched by Silicon Valley-based Redpoint Ventures in 2005. Existing investor Matrix Partners China also participated in the pre-Series A round. The new capital will allow the startup to develop products and expand into markets in North America, Europe, and other parts of Asia.

The 30-person team comprises former employees of Alibaba, Facebook, Huawei, and IBM. It’s based in Hangzhou, a scenic city known for its rich history and for housing Alibaba and its financial affiliate Ant Financial, where Ye worked as a senior engineer after a four-year stint with Facebook in California. From 2017 to 2018, the entrepreneur noticed that Ant Financial’s customers were increasingly interested in adopting graph databases as an alternative to relational databases, a model that has been popular since the 80s and normally organizes data into tables.

“While relational databases are capable of achieving many functions carried out by graph databases… they deteriorate in performance as the quantity of data grows,” Ye told TechCrunch during an interview. “We didn’t use to have so much data.”

Information explosion is one reason why Chinese companies are turning to graph databases, which can handle millions of transactions to discover patterns within scattered data. The technology’s rise is also a response to new forms of online businesses that depend more on relationships.

“Take recommendations for example. The old model recommends content based purely on user profiles, but the problem of relying on personal browsing history is it fails to recommend new things. That was fine for a long time as the Chinese [internet] market was big enough to accommodate many players. But as the industry becomes saturated and crowded… companies need to ponder how to retain existing users, lengthen their time spent, and win users from rivals.”

The key lies in serving people content and products they find appealing. Graph databases come in handy, suggested Ye, when services try to predict users’ interests or behavior, as the model uncovers what their friends or people within their social circles like. “That’s a lot more effective than feeding them what’s trending.”
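In code, the idea reduces to a one-hop traversal of the social graph. The sketch below is a deliberately naive in-memory version with made-up data; a real deployment would express the same walk as a query against a graph database such as Nebula Graph.

```python
# Toy social graph and "likes" data; entirely made up for illustration.
friends = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
likes = {
    "alice": {"sneakers"},
    "bob": {"sneakers", "headphones"},
    "carol": {"espresso machine"},
}

def recommend(user):
    """Suggest items liked by a user's friends that the user hasn't liked yet."""
    seen = likes.get(user, set())
    suggestions = set()
    for friend in friends.get(user, []):   # one-hop graph traversal
        suggestions |= likes.get(friend, set())
    return suggestions - seen

print(recommend("alice"))  # {'headphones', 'espresso machine'}
```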

Neo4j compares relational and graph databases.

The company has made its software open source, which the founder believes can help cultivate a community of graph-database users and educate the market in China. It will also allow VESoft to reach more engineers in the English-speaking world who are well acquainted with open-source culture.

“There is no such thing as being ‘international’ or ‘domestic’ for a technology-driven company. There are no boundaries between countries in the open-source world,” reckoned Ye.

When it comes to generating income, the startup plans to launch a paid version for enterprises, which will come with customized plug-ins and hosted services.

Nebula Graph, the brand of VESoft’s database product, now serves 20 enterprise clients across social media, e-commerce, and finance, including big names like food delivery giant Meituan, popular social commerce app Xiaohongshu, and e-commerce leader JD.com. A number of overseas companies are also trialing Nebula.

The time is ripe for enterprise-facing startups with a technological moat in China, as the consumer market has been divided up by incumbents like Tencent and Alibaba. That makes fundraising relatively easy for VESoft. The founder is confident that Chinese companies are rapidly catching up with their Western counterparts in the space, because the gargantuan amount of data and the myriad ways it is used in the country “will propel the technology forward.”

#ant-financial, #asia, #china, #data-management, #database, #databases, #enterprise, #graph-database, #graph-databases, #hangzhou, #matrix-partners-china, #neo4j, #nosql, #open-source-software, #redpoint-ventures

Database security startup jSonar raises $50M from Goldman Sachs

Database security startup jSonar has secured $50 million in new funding from Goldman Sachs.

The company, with offices in Waltham, Mass. and Vancouver, B.C., will use the new funding to grow its database security offering. As part of the deal, Goldman Sachs executive David Campbell will join jSonar’s board.

jSonar’s platform lets customers monitor every one of their databases at a glance, whether stored on-premises or in the cloud, and supports every popular database platform. It’s designed to let staff monitor any database across their infrastructure and draw meaningful insights from it, without juggling multiple tools across different networks and databases.

The company is clearly doing something right: the platform is popular with some of the biggest banks, financial institutions, insurance firms and healthcare organizations, a huge driver of revenue for the startup.

Ron Bennatan, jSonar’s co-founder and chief technology officer, said he was “thrilled” by the investment.

“The rapidly shifting enterprise landscape, including cloud adoption, an explosion of database platforms, the pressing need for data security beyond only compliance, and years of frustration over runaway costs, has created a huge opportunity for us to rapidly expand,” he said.

#banks, #database, #executive, #financial-services, #goldman-sachs, #massachusetts, #security, #vancouver

MemSQL raises $50M in debt facility for its real-time database platform

As a number of startups get back into fundraising in earnest, one that is on a growth tear has closed a substantial debt round to hold on to more equity as it inches toward becoming cash-flow positive. MemSQL — the relational, real-time database used by organisations to query and analyse large pools of fast-moving data across cloud, hybrid and on-premise environments (customers include major banks, telecoms carriers, ride-sharing giants, and even those building COVID-19 tracing apps) — has secured $50 million in debt, money that CEO Raj Verma says should keep the company “well capitalised for the next several years” and that puts it on the road to an IPO or potential private equity exit.

The funding is coming from Hercules Capital, which has some $4.3 billion under management and has an interesting history. On the one hand, it’s invested in companies that include Facebook (this was back in 2012, when Facebook was still a startup), but it’s also been in the news because its CEO was one of the high fliers accused in the college cheating scandal of 2019.

MemSQL does not disclose its valuation but Verma confirmed it is now significantly higher than it was at its last equity raise of $30 million in 2018 when it was valued at about $270 million, per data from PitchBook.

Why raise debt rather than equity? The company is already backed by a long list of impressive investors starting with Y Combinator, and including Accel, Data Collective, DST, GV (one of Google-owner Alphabet’s venture capital vehicles), Khosla, IA Ventures, In-Q-Tel (the CIA-linked VC) and many more. Verma said in an interview with TechCrunch that the startup had started to look at this fundraise before the pandemic hit.

It had “multiple options to raise an equity round” from existing and new investors, which quickly produced some eight term sheets. Ultimately, it took the debt route mainly because it didn’t need the capital badly enough to give up equity, and terms “are favourable right now,” making a debt facility the best option. “Our cash burn is in the single digits,” he said, and “we still have independence.”

The company has been on a roll in recent times. It grew 75% last year (down from 200% in 2018) with cash burn of $8-9 million in that period, and now has annual recurring revenue of $40 million. Customers include three of the world’s biggest banks, which use MemSQL to power their algorithmic trading; major telecoms carriers; mapping providers (Verma declined to comment on whether investor Google is a customer); and more. While Verma today declines to name specific customers, previously named ones include Uber, Akamai, Pinterest, Dell EMC and Comcast.

And if the current health pandemic has put a lot of pressure on some companies in the tech world, MemSQL is among the group seeing a strong upswing in business.

Verma noted that this is down to multiple reasons. First, its customer base does not overlap much with sectors like travel that have been hit hard by the economic slowdown and the push to keep people indoors. Second, its platform has proven useful precisely in the present moment, with companies now forced to reckon with legacy architecture and move to hybrid or all-cloud environments just to do business. And third, others like True Digital are building contact-tracing applications on MemSQL specifically to help address the spread of the novel coronavirus.

The company plays in a well-crowded area that includes big players like Oracle and SAP. Verma said that its tech stands apart because of its hybrid architecture and because it can provide speed improvements of some 30x, with technology that — as we have noted before — allows users to push millions of events per day into the service while querying the records in real time.
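Because MemSQL speaks the MySQL wire protocol, that ingest-and-query pattern can be exercised from any standard client. The sketch below assumes placeholder credentials and a hypothetical clicks table; it is not MemSQL’s own example code.

```python
import pymysql

# Placeholder connection details; MemSQL is MySQL wire-compatible.
conn = pymysql.connect(host="memsql.example.com", user="app",
                       password="...", database="analytics")

with conn.cursor() as cur:
    # Events stream in continuously from the application side...
    cur.execute(
        "INSERT INTO clicks (user_id, url, ts) VALUES (%s, %s, NOW())",
        (42, "/checkout"),
    )
    conn.commit()

    # ...while the same table is queried in real time for fresh aggregates.
    cur.execute(
        "SELECT url, COUNT(*) AS hits FROM clicks "
        "WHERE ts > NOW() - INTERVAL 1 MINUTE GROUP BY url"
    )
    for url, hits in cur.fetchall():
        print(url, hits)
```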

It also helps to have competitive pricing. “We are a favourable alternative,” Verma said.

“This structured investment represents a significant commitment from Hercules and provides an example of the breadth of our platform and our ability to finance growth-orientated, institutionally-backed technology companies at various stages. We are impressed with the work that the MemSQL management team has accomplished operationally and excited to begin our partnership with one of the promising companies in the database market,” said Steve Kuo, senior managing director technology group head for Hercules, in a statement.

#database, #developer, #enterprise, #hercules, #memsql, #recent-funding, #startups, #tc

Security lapse at India’s Jio exposed coronavirus symptom checker results

Since the start of the outbreak, governments and companies have scrambled to develop apps and websites that can help users identify COVID-19 symptoms.

India’s largest cell network Jio, a subsidiary of Reliance, launched its coronavirus self-test symptom checker in late March, just before the Indian government imposed a strict nationwide lockdown to prevent the further spread of the coronavirus. The symptom checker allows anyone to check their symptoms from their phone or Jio’s website to see if they may have become infected with COVID-19.

But a security lapse exposed one of the symptom checker’s core databases to the internet without a password, TechCrunch has found.

Jio’s coronavirus symptom checker. One of its databases exposed users’ responses. (Image: TechCrunch)

Security researcher Anurag Sen found the database on May 1, just after it was first exposed, and contacted TechCrunch so the company could be notified. Jio quickly pulled the system offline after TechCrunch made contact. It’s not known if anyone else accessed the database.

“We have taken immediate action,” said Jio spokesperson Tushar Pania. “The logging server was for monitoring performance of our website, intended for the limited purpose of people doing a self-check to see if they have any COVID-19 symptoms.”

The database contains millions of logs and records dating from April 17 until the time the database was pulled offline. Although the server held a running log of website errors and other system messages, it also ingested vast numbers of user-generated self-test results. Each self-test was logged in the database and included a record of who took the test — such as “self” or a relative — along with the person’s age and gender.

The data also included the person’s user agent, a small snippet of information about the user’s browser version and operating system, which is often used to render the website properly but can also be used to track a user’s online activity.

The database also contains individual records of those who signed up to create a profile, allowing users to update their symptoms over time. These records contained the answers to each question asked by the symptom checker, including what symptoms they are experiencing, who they have been in contact with, and what health conditions they may have.

Some of the records also contained the user’s precise location, but only if the user allowed the symptom checker access to their browser or phone’s location data.

We’ve posted a redacted portion of one of the records below.

A redacted portion of the exposed database. (Image: TechCrunch)

From one sample of data we obtained, we found thousands of users’ precise geolocation from across India. TechCrunch was able to identify people’s homes using the latitude and longitude records found in the database.

Most of the location data is clustered around major cities, like Mumbai and Pune. TechCrunch also found users in the United Kingdom and North America.

The exposure could not have come at a more critical time for the Indian telecoms giant. Last week Facebook invested $5.7 billion for a near-10% stake in Jio Platforms, valuing the Reliance subsidiary at about $66 billion.

Jio did not answer our follow-up questions, and the company did not say if it will inform those who used the symptom tracker of the security lapse.

#coronavirus, #database, #digital-media, #government, #india, #north-america, #operating-system, #privacy, #security, #spokesperson, #websites