Scraped Parler data is a metadata goldmine

Embattled social media platform Parler is offline after Apple, Google and Amazon pulled the plug on the site following the violent riot at the U.S. Capitol last week that left five people dead.

But while the site is gone (for now), millions of posts published to the site since the riot are not.

A lone hacker scraped millions of posts, videos and photos published to the site after the riot but before the site went offline on Monday, preserving a huge trove of potential evidence for law enforcement investigating the attempted insurrection. Many of the rioters allegedly used the platform to plan and coordinate the breach of the Capitol.

The hacker and internet archivist, who goes by the online handle @donk_enby, scraped the social network and uploaded copies to the Internet Archive, which hosts old and historical versions of web pages.

In a tweet, @donk_enby said she scraped data from Parler that included deleted and private posts, and that the videos contained “all associated metadata.”

Metadata is information about a file, such as when it was made and on what device. This information is usually embedded in the file itself. The scraped videos from Parler also appear to include the precise location data of where the videos were taken. That metadata could be a goldmine of evidence for authorities investigating the Capitol riot: it may tie some rioters to their Parler accounts or help police unmask rioters based on their location data.

Most web services remove metadata when you upload your photos and videos, but Parler apparently didn’t.

Parler quickly became the social network of choice after President Trump was deplatformed from Twitter and Facebook for inciting the riot on January 6. But the tech giants said Parler violated their rules by not having a content moderation policy – which is what drew many users to the site.

Many of the posts made calls to “burn down [Washington] D.C.,” while others called for violence and the execution of Vice President Mike Pence.

Already several rioters have been arrested and charged with breaking into the Capitol building. Many of the rioters weren’t wearing masks (the pandemic notwithstanding), making it easier for them to be identified. But thanks to Parler’s own security blunder, many more could soon face an unwelcome knock at the door.

#amazon, #computing, #internet-archive, #law-enforcement, #microblogging, #operating-systems, #parler, #president, #real-time-web, #security, #social-network, #software, #trump, #vice-president, #washington, #web-services


How I Blew My Bitcoin on Sushi

In 2013, this reporter spent 10 Bitcoin, worth $1,000 at the time, on a dinner for dozens of strangers in San Francisco. The owner of the restaurant wisely held onto it.

#bitcoin-currency, #internet-archive, #personal-finances, #virtual-currency


New Spirits Rise in Old, Repurposed Churches

For years, attendance rates have dropped and congregations have closed nationwide. But many reused religious spaces are still sanctuaries.

#accidents-and-safety, #alexandria-va, #appalachian-region, #archives-and-records, #arkansas, #art, #audible-com, #bars-and-nightclubs, #bentonville-ark, #caesar-julius, #christian-science, #christians-and-christianity, #clapton-eric, #computers-and-the-internet, #coronavirus-2019-ncov, #duke-university, #historic-buildings-and-sites, #internet-archive, #jesus-christ, #kahle-brewster, #louisiana, #methodist-church, #monae-janelle, #nelson-willie, #new-orleans-la, #newark-nj, #ohio, #quarantine-life-and-culture, #rensselaer-polytechnic-institute, #restaurants, #restoration-and-renovation, #roman-catholic-church, #san-francisco-calif, #stained-glass, #states-us, #troy-ny, #united-methodist-church, #wayback-machine, #west-virginia, #wines


Wayback Machine and Cloudflare team up to archive more of the Web

Screenshot of the Internet Archive’s home page, including the Wayback Machine’s search box. (credit: Internet Archive)

The Internet Archive and Cloudflare have teamed up to archive the content of websites that use Cloudflare’s Always Online service, increasing the odds that users will be able to view a recent version of a website during outages. The partnership will increase the number of webpages scanned by the Internet Archive, making the organization’s Wayback Machine more useful to Internet users in general.

“Websites that enable Cloudflare’s Always Online service will now have their content automatically archived, and if by chance the original host is not available to Cloudflare, then the Internet Archive will step in to make sure the pages get through to users,” said an announcement by Mark Graham, director of the Internet Archive’s Wayback Machine.

Cloudflare says its Always Online feature saves “a limited copy of your cached website to keep it online for your visitors” when the origin server is unavailable, ensuring that a website’s “most popular pages are represented.” Using the Wayback Machine will improve the Always Online service, Cloudflare CEO Matthew Prince said.

#biz-it, #cloudflare, #internet-archive, #wayback-machine


Internet Archive ends free e-book program, following publisher suit

The National Emergency Library is one of those well-intentioned ideas that was destined to get pushback. The brainchild of Internet Archive, the platform made north of 1.3 million books available for checkout, free of charge. The system was designed to supplement the book needs of educators, as libraries across the country remained closed during the COVID-19 pandemic.

Book publishers, on the other hand, weren’t having it. A consortium of four publishers filed suit against Internet Archive earlier this month. “Despite the ‘Open Library’ moniker, IA’s actions grossly exceed legitimate library services, do violence to the Copyright Act, and constitute willful digital piracy on an industrial scale,” Hachette, HarperCollins, Wiley and Penguin Random House wrote in the New York federal court suit.

The threat alone was enough. Internet Archive announced today that the library will close on June 15 — two weeks ahead of the original June 30 date.

“The complaint attacks the concept of any library owning and lending digital books, challenging the very idea of what a library is in the digital world,” the organization writes. “This lawsuit stands in contrast to some academic publishers who initially expressed concerns about the NEL, but ultimately decided to work with us to provide access to people cut off from their physical schools and libraries. We hope that similar cooperation is possible here, and the publishers call off their costly assault.”

Internet Archive goes on to cite all of the educators who have taken advantage of the offering, along with a librarian who utilized it to provide front-line workers with life-support manuals. In spite of those feel-good stories, however, publishers didn’t budge. Among other things, the companies took issue with IA’s failure to pay licensing fees and to abide by the other agreed-upon restrictions that traditional libraries observe.

It’s a longstanding complaint publishers have had against Internet Archive, going so far as to accuse the organization of “willful digital piracy on an industrial scale.” Ultimately, it seems that packing up shop a couple of weeks early was the path of least resistance, though IA closes its post with hope for some collaboration going forward, writing, “Let’s build a digital system that works.”

#coronavirus, #covid-19, #e-books, #internet-archive, #lawsuit, #library, #tc


Internet Archive ends “emergency library” early to appease publishers

Image credit: Johner Images / Getty

The Internet Archive has ended its National Emergency Library program two weeks earlier than originally scheduled, the organization announced in a Wednesday blog post.

“We moved up our schedule because, last Monday, four commercial publishers chose to sue Internet Archive during a global pandemic,” the group wrote. The online library called on publishers to “call off their costly assault.”

But that doesn’t seem very likely. The Internet Archive isn’t ending its online book lending program altogether. Instead, the group is returning to a “controlled digital lending” (CDL) model that it had followed for almost a decade prior to March. Under that model, the group allows only one patron to digitally “check out” a book for each physical copy the library has in stock. If more people want to read a book than there are physical copies available, patrons are added to a waiting list until someone checks the book back in.

#controlled-digital-lending, #internet-archive, #policy


Publishers Sue Internet Archive Over Free E-Books

Penguin Random House, HarperCollins, Hachette and Wiley accused the nonprofit of piracy for making over 1 million books free online.

#archives-and-records, #assn-of-american-publishers, #authors-guild, #book-trade-and-publishing, #books-and-literature, #computers-and-the-internet, #copyrights-and-copyright-violations, #coronavirus-2019-ncov, #hachette-book-group, #harpercollins-publishers, #internet-archive, #john-wileysons, #kahle-brewster, #libraries-and-librarians, #penguin-random-house, #shutdowns-institutional, #suits-and-litigation-civil, #writing-and-writers


Lawsuit over online book lending could bankrupt Internet Archive

The book drop outside the Spring Township library in Pennsylvania was closed on April 6, 2020. (credit: Ben Hasty/MediaNews Group/Reading Eagle via Getty Images)

Four of the nation’s leading book publishers have sued the Internet Archive, the online library best known for maintaining the Internet Wayback Machine. The Internet Archive makes scanned copies of books—both public domain and under copyright—available to the public on a site called the Open Library.

“Despite the Open Library moniker, IA’s actions grossly exceed legitimate library services, do violence to the Copyright Act, and constitute willful digital piracy on an industrial scale,” write publishers Hachette, HarperCollins, Wiley, and Penguin Random House in their complaint. The lawsuit was filed in New York federal court on Monday.

For almost a decade, the Open Library has offered users the ability to “borrow” scans of in-copyright books via the Internet. Until recently, the service was based on a concept called “controlled digital lending” that mimicked the constraints of a conventional library. The library would only “lend” as many digital copies of a book as it had physical copies in its warehouse. If all copies of a book were “checked out” by other patrons, you’d have to join a waiting list.
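
The lending rule described above can be sketched in a few lines of Python. This is only an illustration of the one-loan-per-physical-copy idea with a waiting list; the class name `ControlledLendingTitle`, its methods, and the patron names are hypothetical and are not taken from the Open Library's actual software.

```python
from collections import deque


class ControlledLendingTitle:
    """One title under a hypothetical controlled digital lending policy."""

    def __init__(self, physical_copies):
        self.physical_copies = physical_copies  # copies held in the warehouse
        self.borrowers = set()                  # patrons with an active loan
        self.waitlist = deque()                 # patrons waiting, in arrival order

    def check_out(self, patron):
        """Return True if the patron holds a loan, False if they were waitlisted."""
        if patron in self.borrowers:
            return True
        if len(self.borrowers) < self.physical_copies:
            self.borrowers.add(patron)
            return True
        if patron not in self.waitlist:
            self.waitlist.append(patron)
        return False

    def check_in(self, patron):
        """End a loan and pass any freed copy to the next waiting patron."""
        self.borrowers.discard(patron)
        while self.waitlist and len(self.borrowers) < self.physical_copies:
            self.borrowers.add(self.waitlist.popleft())


title = ControlledLendingTitle(physical_copies=1)
title.check_out("alice")   # alice gets the only copy
title.check_out("bob")     # bob joins the waiting list
title.check_in("alice")    # the copy passes to bob
```

Under this sketch, removing the cap on simultaneous loans (as the National Emergency Library did) would amount to skipping the comparison against `physical_copies`.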

#copyright-law, #coronavirus, #fair-use, #internet-archive, #policy, #publishers


University libraries offer online “lending” of scanned in-copyright books

Image credit: Sethanan Saengsawang / EyeEm / Getty

The coronavirus crisis has forced the closure of libraries around the world, depriving the public of access to millions of printed books. Books old enough to be in the public domain may be available for free download online. Many recent books are available to borrow in e-book form. But there are many other books—especially those published in the mid-to-late 20th century—that are hard to access without going to a physical library.

A consortium of university libraries called HathiTrust recently announced a solution to this problem, called the Emergency Temporary Access Service. It allows participating HathiTrust member libraries to offer their patrons digital scans of books that they can “check out” and read online.

HathiTrust has a history of pushing the boundaries of copyright. It was the defendant in a landmark 2014 ruling that established the legality of library book scanning. At the time, HathiTrust was only allowing people with print disabilities to access the full text of scanned books. Now HathiTrust is expanding access to more people—though still with significant limits.

#copyright, #hathitrust, #internet-archive, #policy


Authors fume as online library “lends” unlimited free books

Sign in front of Middle Country Public Library in Centereach, New York, on March 26, 2020. (credit: Thomas A. Ferrara/Newsday RM via Getty Images)

For almost a decade, the Internet Archive, an online library best known for its Internet Wayback Machine, has let users “borrow” scanned digital copies of books held in its warehouse. Until recently, users could only check out as many copies as the organization had physical copies. But last week, the Internet Archive announced it was eliminating that restriction, allowing an unlimited number of users to check out a book simultaneously. The Internet Archive calls this the National Emergency Library.

Initial media coverage of the service was strongly positive. The New Yorker declared it a “gift to readers everywhere.” But as word of the new service spread, it triggered a backlash from authors and publishers.

“As a reminder, there is no author bailout, booksellers bailout, or publisher bailout,” author Alexander Chee tweeted on Friday. “The Internet Archive’s ‘emergency’ copyrights grab endangers many already in terrible danger.”

#aap, #authors-guild, #controlled-digital-lending, #copyright, #internet-archive, #policy


Internet Archive offers 1.4 million copyrighted books for free online

Hazard tape blocks the book returns slot at a London library. (credit: Richard Baker / Getty Images)

One of the casualties of coronavirus-related social distancing measures has been public libraries, which are shut down in many communities around the world. This week, the Internet Archive, an online library best known for running the Internet’s Wayback Machine, announced a new initiative to expand access to digital books during the pandemic.

For almost a decade, an Internet Archive program called the Open Library has offered people the ability to “check out” digital scans of physical books held in storage by the Internet Archive. Readers can view a scanned book in a browser or download it to an e-reader. Users can only check out a limited number of books at once and are required to “return” them after a limited period of time.

Until this week, the Open Library only allowed people to “check out” as many copies as the library owned. If you wanted to read a book but all copies were already checked out by other patrons, you had to join a waiting list for that book—just like you would at a physical library.

#aap, #authors-guild, #brewster-kahle, #internet-archive, #open-library, #policy