Tuesday, September 19, 2023

In 2023, digital privacy is, in many ways, a fiction; Data Commons is using AI to make the world’s public data more accessible and helpful

 

Parsons & Viljoen: Valuing Social Data


Welcome, John Scalzi, to the 25 years of blogging club! His wife Krissy threw him a surprise blog birthday party. 🎉❤️


Woman Goes To LAPD About Stolen Credit Card. One Of Them Stole It. “She’s fortunate that the bad cop was stupid enough to carry out his shopping spree in stores with security cameras.”


The Atlantic’s Guide to Privacy

The Atlantic’s Guide to Privacy [read free]: “In 2023, digital privacy is, in many ways, a fiction: Knowingly or not, we are all constantly streaming, beaming, being surveilled, scattering data wherever we go. Companies, governments, and our fellow citizens know more than we could ever imagine about our body, our shopping habits, even our kids. 

The question now isn’t how to protect your privacy altogether—it’s how to make choices that help you draw boundaries around what you most care about. Read on for our simple rules for managing your privacy, and get a list of personalized recommendations.”


Data Commons is using AI to make the world’s public data more accessible and helpful

Google Paper on Data Commons, September 12, 2023: “Publicly available data from open sources (e.g., United States Census Bureau (Census) [1], World Health Organization (WHO) [2], Intergovernmental Panel on Climate Change (IPCC) [3]) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile the differences in schemas, formats, assumptions, and more. This data wrangling is time consuming, tedious and needs to be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand this data and use it to solve societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be ‘joined’ easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph. This Knowledge Graph can then be searched over using Natural Language questions utilizing advances in Large Language Models. This paper describes the architecture of Data Commons, some of the major deployments and highlights directions for future work.”
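
To make the "data wrangling" point concrete, here is a minimal Python sketch of what reconciling two sources into a common schema and joining them might look like. It is not the Data Commons API or the paper's implementation. The two sources, their field names, the name-to-ID lookup, and every value are hypothetical placeholders, and the place and variable identifiers are used only for illustration.

```python
from collections import defaultdict

# Two hypothetical sources describing the same places with different schemas.
# All values are placeholders, not real statistics.
source_a = [
    {"fips": "06", "pop": 100},
    {"fips": "48", "pop": 200},
]
source_b = [
    {"state_name": "California", "median_age": 30.0},
    {"state_name": "Texas", "median_age": 40.0},
]

# Hypothetical lookup from source B's place names to a shared place ID.
NAME_TO_PLACE_ID = {"California": "geoId/06", "Texas": "geoId/48"}


def normalize_a(rows):
    """Map source A rows to (place_id, variable, value) triples."""
    return [("geoId/" + r["fips"], "Count_Person", r["pop"]) for r in rows]


def normalize_b(rows):
    """Map source B rows to the same (place_id, variable, value) shape."""
    return [
        (NAME_TO_PLACE_ID[r["state_name"]], "Median_Age_Person", r["median_age"])
        for r in rows
    ]


def join_by_place(*observation_lists):
    """Group observations from any number of sources by their shared place ID."""
    joined = defaultdict(dict)
    for observations in observation_lists:
        for place_id, variable, value in observations:
            joined[place_id][variable] = value
    return dict(joined)


joined = join_by_place(normalize_a(source_a), normalize_b(source_b))
print(joined["geoId/06"])
```

Once both sources share place IDs and variable names, the join itself is trivial. The per-source `normalize_*` functions are the tedious part that, according to the paper, every user otherwise has to repeat.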

Data Sources

Data in the Data Commons Graph comes from a variety of sources, each of which often includes multiple surveys. Some sources/surveys include a very large number of variables, some of which might not yet have been imported into Data Commons. The sources have been grouped by category and are listed alphabetically within each category.

  1. Agriculture
  2. Biomedical
  3. Crime
  4. Demographics
  5. Economy
  6. Education
  7. Energy
  8. Environment
  9. Health
  10. Housing

We also maintain a list of upcoming data imports.