1. When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise…
    If you have many, many columns—and we do in modern databases—you’ll get up into millions and millions of attributes for each person.

    Now, if I start allowing myself to look at all of the combinations of these features—if you live in Beijing, and you ride bike to work, and you work in a certain job, and are a certain age—what’s the probability you will have a certain disease or you will like my advertisement? … for any particular database, I will find some combination of columns that will predict perfectly any outcome, just by chance alone. … So it’s like having billions of monkeys typing. One of them will write Shakespeare.
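The “billions of monkeys” argument can be checked with a small simulation. This is an illustrative sketch under stated assumptions, not the speaker’s own analysis: it assumes binary attributes and a binary outcome that are all independent coin flips, so any column that “predicts” the outcome does so purely by chance. The function names are hypothetical.

```python
import math
import random


def count_perfect_predictors(n_people: int, n_columns: int, seed: int = 0) -> int:
    """Count random 0/1 columns that match a random 0/1 outcome for every person.

    Assumption: every column and the outcome are pure noise, so each hit
    is a false discovery by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.randint(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_columns):
        column = [rng.randint(0, 1) for _ in range(n_people)]
        if column == outcome:  # "predicts perfectly" on this database
            hits += 1
    return hits


def expected_perfect_predictors(n_people: int, n_columns: int) -> float:
    """Each noise column matches all n_people outcomes with probability 2**-n_people."""
    return n_columns / 2 ** n_people


def hypotheses_up_to_order(n_columns: int, max_order: int) -> int:
    """Number of feature combinations built from 1..max_order distinct columns."""
    return sum(math.comb(n_columns, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    print(count_perfect_predictors(n_people=10, n_columns=5000))
    print(expected_perfect_predictors(10, 5000))
    print(hypotheses_up_to_order(1000, 2))
```

With 10 people and 5,000 noise columns, the expected number of perfect predictors is 5,000 / 2^10, about 4.9, and 1,000 columns already yield 500,500 one- and two-column combinations to test. Adding people shrinks the chance of any single fluke exponentially, while adding combinations multiplies the number of chances: the race between appetite for hypotheses and statistical strength that the excerpt describes.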

  2. Where is that data, who is holding it, and what are the terms under which it is being shared with my employer?… Your biosensor data might also turn out to be very interesting to law enforcement if you’re accused of a crime…

    What happens to privacy when “wellness” becomes a condition of your employment? After the drug test, here’s your tracker. Don’t take it off.

    “Do I really have a choice?” asks Peppet. “If I am the only person in my office who is not wearing my fitness tracker — doesn’t that start to make me look bad? You end up with stigma if you are not participating.”

  3. “If you went to bed last night as an industrial company, you’re going to wake up this morning as a software and analytics company,” GE Chairman and CEO Jeff Immelt told hundreds of customers and analysts attending the third “Minds + Machines” summit. … Using its Predix technology, GE already captures 50 million data points collected and communicated by 10 million sensors installed on $1 trillion worth of equipment ranging from medical imaging systems to locomotives to jet engines.
  4. Interaction between objects, between objects and individuals’ devices, between individuals and other objects, and between objects and back-end systems will result in the generation of data flows that can hardly be managed with the classical tools used to ensure the adequate protection of the data subjects’ interests and rights. For instance, unlike other types of content, IoT-pushed data may not be adequately reviewable by the data subject prior to publication, which undeniably generates a risk of lack of control and excessive self-exposure for the user. Also, communication between objects can be triggered automatically as well as by default, without the individual being aware of it. In the absence of the possibility to effectively control how objects interact or to be able to define virtual boundaries by defining active or non-active zones for specific things, it will become extraordinarily difficult to control the generated flow of data. It will be even more difficult to control its subsequent use, and thereby prevent potential function creep.
  5. At times, the Industrial Internet has been lumped alongside the so-called Internet of Things, which usually describes the effort to bestow networked connectivity on, say, your home lighting or thermostat. Yet GE’s industrial effort is more ambitious than that. Immelt’s point in his speech was that GE could no longer just build big machines like locomotives and jet engines and gas turbine power plants—“big iron,” as it’s known within the company. It now had to create a kind of intelligence within the machines, which would collect and parse their data. As he saw it, the marriage of big-data analysis and industrial engineering promised a nearly unimaginable range of improvements. A new GEnx jet engine with a multitude of sensors could spin off an awesome amount of information. GE would in turn help predict, say, when a crucial engine part required repairs. GE would use data from machines like the Evolution to optimize performance to undreamed-of levels.
  6. Researchers and clinicians at Brigham and Women’s Hospital, Massachusetts General Hospital, and other Partners institutions are studying how genes, lifestyle, and other factors affect people’s health and contribute to disease. To do this research, we are asking patients at Partners HealthCare hospitals to participate in the Partners HealthCare Biobank (Partners Biobank).

    To join the Partners Biobank you will be asked to provide a small blood sample linked to health information and family history data to be stored in a research sample and data repository. This state-of-the-art biobank will help researchers uncover the links between an individual’s genetics, family history, and environment in the development of disease and in people’s response to medications.

  7. Nicholas Tatonetti, an assistant professor of biomedical informatics — an interdisciplinary field that combines computer science and medicine — develops algorithms to trawl medical databases and turn up correlations. For his doctoral thesis, he mined the F.D.A.’s records of adverse drug reactions to identify pairs of medications that seemed to cause problems when taken together. He found an interaction between two very commonly prescribed drugs: The antidepressant paroxetine (marketed as Paxil) and the cholesterol-lowering medication pravastatin were connected to higher blood-sugar levels. Taken individually, the drugs didn’t affect glucose levels. But taken together, the side effect was impossible to ignore. “Nobody had ever thought to look for it,” Tatonetti says, “and so nobody had ever found it.”
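A pairwise screen of the kind described can be sketched in a few lines. This is a hypothetical toy, not Tatonetti’s published method: it assumes each record is a (set of drug names, change in blood glucose) tuple, and it scores a pair by how far the mean for patients taking both drugs exceeds the larger of the two single-drug means. The drug names in the example are placeholders and the numbers are invented.

```python
from itertools import combinations
from statistics import mean


def interaction_scores(records, min_group=2):
    """Score drug pairs by the excess glucose change seen only when both are taken.

    records:   iterable of (set_of_drug_names, glucose_change) tuples
               (a hypothetical schema chosen for this sketch).
    min_group: skip a pair unless every comparison group has this many records.
    """
    records = list(records)
    drugs = sorted({d for drug_set, _ in records for d in drug_set})
    scores = {}
    for a, b in combinations(drugs, 2):
        both = [g for ds, g in records if a in ds and b in ds]
        only_a = [g for ds, g in records if a in ds and b not in ds]
        only_b = [g for ds, g in records if b in ds and a not in ds]
        if min(len(both), len(only_a), len(only_b)) < min_group:
            continue  # too few patients in some group to compare
        # Positive score: the pair looks worse than either drug alone.
        scores[(a, b)] = mean(both) - max(mean(only_a), mean(only_b))
    # Most suspicious pairs first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    toy = [  # invented numbers, for illustration only
        ({"drug_x"}, 0.0), ({"drug_x"}, 1.0),
        ({"drug_y"}, 0.0), ({"drug_y"}, -1.0),
        ({"drug_x", "drug_y"}, 20.0), ({"drug_x", "drug_y"}, 22.0),
    ]
    print(interaction_scores(toy))  # {('drug_x', 'drug_y'): 20.5}
```

Every pair is a separate hypothesis, so a screen over n drugs tests n(n-1)/2 pairs; that is the multiple-comparisons risk raised in item 1, and a high score is a lead to check rather than a finding.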
  8. She found that the scientific literature had no studies on patients like this to guide her. So she did something unusual: She searched a database of all the lupus patients the hospital had seen over the previous five years, singling out those whose symptoms matched her patient’s, and ran an analysis to see whether they had developed blood clots. “I did some very simple statistics and brought the data to everybody that I had met with that morning,” she says. The change in attitude was striking. “It was very clear, based on the database, that she could be at an increased risk for a clot.”…

    A large, costly and time-consuming clinical trial with proper controls might someday prove Frankovich’s hypothesis correct. But large, costly and time-consuming clinical trials are rarely carried out for uncommon complications of this sort. In the absence of such focused research, doctors and scientists are increasingly dipping into enormous troves of data that already exist — namely the aggregated medical records of thousands or even millions of patients to uncover patterns that might help steer care.


    After Frankovich wrote about her experience in The New England Journal of Medicine in 2011, her hospital warned her not to conduct such analyses again until a proper framework for using patient information was in place.
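The “very simple statistics” Frankovich describes amount to a cohort lookup: select past patients who share the current patient’s features and compare how often they developed clots with everyone else. The sketch below assumes a hypothetical record format (a set of feature labels and a boolean clot outcome per patient) and placeholder labels such as `feature_a`; the source does not describe her actual query. As the passage notes, this is an observational comparison, not a controlled trial.

```python
from typing import Optional


def clot_rate_comparison(patients, profile):
    """Compare clot rates for patients whose features include `profile` vs. the rest.

    patients: list of dicts with hypothetical keys "features" (set of str)
              and "clot" (bool).
    profile:  set of feature labels describing the current patient.
    """
    matched = [p for p in patients if profile <= p["features"]]  # subset test
    others = [p for p in patients if not profile <= p["features"]]

    def rate(group) -> Optional[float]:
        # Fraction of the group with a clot; None for an empty group.
        return sum(p["clot"] for p in group) / len(group) if group else None

    r_matched, r_others = rate(matched), rate(others)
    # Risk ratio is undefined when the comparison rate is missing or zero.
    ratio = r_matched / r_others if r_matched is not None and r_others else None
    return {"n_matched": len(matched), "rate_matched": r_matched,
            "n_others": len(others), "rate_others": r_others,
            "risk_ratio": ratio}


if __name__ == "__main__":
    cohort = (  # invented records, for illustration only
        [{"features": {"lupus", "feature_a"}, "clot": c} for c in (True, True, False, False)]
        + [{"features": {"lupus"}, "clot": c} for c in (True, False, False, False, False, False)]
    )
    print(clot_rate_comparison(cohort, {"lupus", "feature_a"}))
```

With a handful of matched patients, a raw rate like this is noisy; it can justify caution for one patient, as in the anecdote, without establishing the hypothesis.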

  9. Christian Rudder knows that the match-making experiment crossed a crucial ethical line. When the experiment was over, OkCupid sent emails to all the unwitting participants telling them of their correct match percentage. That’s not something you’d do if you really thought the initial lie was harmless or that users wouldn’t care. Notice after the fact is no substitute for informed consent up front — but it concedes the point that the experiment was something ethically different from the day-to-day operations of the site. You don’t write to users to tell them you tested a new font.
  10. General Electric plans to announce Monday that it has created a “data lake” method of analyzing sensor information from industrial machinery in places like railroads, airlines, hospitals and utilities. G.E. has been putting sensors on everything it can for a couple of years, and now it is out to read all that information quickly.

    The company, working with an outfit called Pivotal, said that in the last three months it has looked at information from 3.4 million miles of flights by 24 airlines using G.E. jet engines. G.E. said it figured out things like possible defects 2,000 times as fast as it could before.

  11. “We are just asking the question: If we really wanted to be proactive, what would we need to know? You need to know what the fixed, well-running thing should look like.”

    The project won’t be restricted to specific diseases, and it will collect hundreds of different samples using a wide variety of new diagnostic tools. Then Google will use its massive computing power to find patterns, or “biomarkers,” buried in the information. The hope is that these biomarkers can be used by medical researchers to detect any disease a lot earlier.

  12. The larger debate is about what companies can do to their users without asking them first or telling them about it after. I asked Facebook yesterday what the review process was for conducting the study in January 2012, and its response reads a bit tone deaf. The focus is on whether the data use was appropriate rather than on the ethics of emotionally manipulating users to have a crappy day for science.
  13. 23:05 22nd Jan 2014

    Reblogged from journo-geekery

    Via colleague James:

    From John W. Foreman:

    You see, the problem wasn’t “defeating an electronic keypad” at all. The problem was getting inside the room. Dan Aykroyd understood this.

    Thumbs up for the “Sneakers” metaphor. Love that movie.

  14. What if the company kept the chocolates hidden in opaque containers but prominently displayed dried figs, pistachios and other healthful snacks in glass jars? The results: In the New York office alone, employees consumed 3.1 million fewer calories from M&Ms over seven weeks. That’s a decrease of nine vending machine-size packages of M&Ms for each of the office’s 2,000 employees.
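The two figures in the excerpt can be cross-checked. Dividing the total by head count gives the saving per employee, and dividing that by nine packages gives the calorie content per package that the comparison implies; the per-package number is derived here, not stated in the source.

```latex
\[
\frac{3{,}100{,}000\ \text{calories}}{2{,}000\ \text{employees}} = 1{,}550\ \text{calories per employee},
\qquad
\frac{1{,}550\ \text{calories}}{9\ \text{packages}} \approx 172\ \text{calories per package}.
\]
```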
  15. The possibility of being present in the moment, it’s not having to think about, “How should I document this or remember this for later?” You sort of had the same thing as when you first got a mobile phone and started trusting it with the phone numbers of your friends — it’s not that you don’t have the phone numbers accessible anymore, it’s that you don’t have to remember them. It’s the same thing when you start using Memoto — you no longer have to think about taking photos of stuff you want to remember.