Friday, April 26, 2013

The Road to Big Data Passes Through Informatics

I have written a number of postings over the last year about various aspects of electronic health record (EHR) data, from the transition of the work of informatics from implementation to analytics to the problems that still prevent us from making optimal use of data, such as the difficulties of data entry. One of my themes has been that knowledge will not just fall out of the data; we will need to improve the quality and completeness of data to learn from it. The requirements for getting better data include widespread adherence to data standards, engaging and motivating those who enter data to improve it, making it easier for those individuals to enter quality data, and evolving our healthcare system to value this data. If we are not able to meet these challenges with our current data, it is unlikely we will be able to do so when we have "big data," i.e., that which is orders of magnitude larger and more complex than what we have now. No field has devoted more thought, research, or evaluation to the challenges of clinical and health data than informatics. Thus, whether it is tackling issues of how to implement systems in complex clinical settings; meeting the needs of clinicians, patients, and others; or maximizing the quality of data, the road to making the best use of (big or non-big) data must pass through informatics.

An example of the fact that knowledge will not just fall out of the data comes from some research activity I have been involved in over the last couple of years, which is the Text REtrieval Conference (TREC) Medical Records Track [1]. As those familiar with the field of information retrieval (IR) know, TREC is an annual "challenge evaluation," sponsored by the National Institute of Standards and Technology (NIST) [2]. Challenge evaluations bring research groups with common interests and use cases together to apply their systems to a common task or set of tasks, using a common data set, and comparing results using agreed-upon metrics (ideally in a scholarly and not an overly competitive forum). TREC operates on a yearly cycle, consisting of 5-7 "tracks" that each represent a specific focus of IR research. TREC began with the straightforward tasks of "ad hoc" retrieval (user entering queries into a search engine seeking relevant documents) and "routing" (user seeking relevant documents from a new stream of documents based on knowledge of previous relevant documents). In subsequent years, TREC evolved to its current state of diverse tracks representing newer problems in IR, such as Web search, video searching, question-answering, cross-language retrieval, and user studies. (Some of these tracks have spawned their own challenge evaluations, especially in the area of cross-language evaluation, an important issue in Europe and Asia.) Virtually all tracks have focused on generic content, typically newswire or Web content, with very few being "domain specific," although I have been involved in two domain-specific tracks in the areas of genomics literature [3] and medical records [1].

In TREC and IR jargon, test collections consist of an adequately large and realistic collection of content, such as documents, medical records, Web pages, etc. [2]. Test collections also include a set of topics, usually at least 25-50 for statistical reliability [4], that are instances of the task being studied. A final component is human relevance judgments or assessments over the content items, indicating which are relevant and should be retrieved for each topic. Success is usually measured by some sort of aggregate statistic that combines the base measures of recall (proportion of relevant content items in the test collection that are retrieved) and precision (proportion of content items retrieved by the search that are relevant). (For those familiar with medical diagnostic test characteristics, these correspond to sensitivity and positive predictive value. The reciprocal of precision is also sometimes called number needed to retrieve, since it measures how many overall documents must be read or viewed for each relevant one retrieved.)
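As a minimal sketch of these base measures, the following Python computes recall, precision, and number needed to retrieve for a single topic. The function name and the visit identifiers are made up for illustration; they do not come from the track data or from any TREC tool.

```python
def retrieval_measures(retrieved, relevant):
    """Return (recall, precision, number needed to retrieve) for one topic.

    retrieved: items returned by the search
    relevant:  items judged relevant for the topic
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved

    # Recall: share of all relevant items that the search found.
    recall = hits / len(relevant) if relevant else 0.0
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Number needed to retrieve: reciprocal of precision.
    nnr = 1 / precision if precision > 0 else float("inf")
    return recall, precision, nnr


# Hypothetical judgments and search output for one topic.
relevant_visits = {"v1", "v2", "v3", "v4"}
retrieved_visits = ["v1", "v2", "v7", "v8", "v9"]

recall, precision, nnr = retrieval_measures(retrieved_visits, relevant_visits)
print(f"recall={recall:.2f} precision={precision:.2f} NNR={nnr:.1f}")
```

In this made-up case two of the four relevant visits are found among five retrieved, so on average two and a half visits must be reviewed for each relevant one.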

The use case for the TREC Medical Records Track was identifying patients from a collection of medical records who might be candidates for clinical studies. This is a real-world task for which automated retrieval systems could greatly aid the ability to carry out clinical research, quality measurement and improvement, or other "secondary uses" of clinical data [4]. The metric used to measure systems was inferred normalized discounted cumulative gain (infNDCG), which takes into account some other factors, such as incomplete judgment of all documents retrieved by all research groups.

The data for the track was a corpus of de-identified medical records developed by the University of Pittsburgh Medical Center. Records containing data, text, and ICD-9 codes are grouped by "visits" or patient encounters with the health system. (Due to the de-identification process, it is impossible to know whether one or more visits might emanate from the same patient.) There were 93,551 documents mapped into 17,264 visits.

I was involved in a number of aspects of organizing this track. I contributed both by guiding the task (or use case) and by leading some of the track infrastructure activities, namely development of search topics and relevance assessments. This work has been aided greatly by students with medical and other expertise in the OHSU Biomedical Informatics Graduate Program.

The results of the TREC Medical Records Track provide a good example of why the road to big data passes through informatics, or in other words, why there is still considerable work to be done from an informatics standpoint before knowledge simply falls out of data. While the performance of systems in the track has been good from an IR standpoint, the results also show these systems and approaches have a considerable way to go before we can just turn the data analytics crank and have medical knowledge emanate. The magnitude of how far we need to go comes from the precision at various levels of retrieval (e.g., precision at 10 retrieved, 50 retrieved, 100 retrieved, etc.), demonstrating how many nonrelevant visits are retrieved. In the case of typical ad hoc IR, we can probably quickly dispense with documents that are relatively easy to identify as not relevant. But this may be a more difficult task for complex patients and complex records.
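To make "precision at various levels of retrieval" concrete, here is a small Python sketch that computes precision at a cutoff k over a ranked list. The ranking and relevance judgments are invented for illustration, and the convention of dividing by k even when fewer than k items are returned is an assumption, not something specified by the track.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are judged relevant.

    Assumes the common convention of dividing by k, even if fewer
    than k items were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


# Hypothetical ranked output of a system for one topic (best first).
ranked_visits = ["v3", "v9", "v1", "v12", "v4",
                 "v15", "v2", "v20", "v21", "v22"]
relevant_visits = {"v1", "v2", "v3", "v4"}

for k in (5, 10):
    p = precision_at_k(ranked_visits, relevant_visits, k)
    nonrelevant = k - round(p * k)
    print(f"P@{k} = {p:.2f} ({nonrelevant} nonrelevant visits to review)")
```

The count of nonrelevant visits in the top k is the practical cost: each one is a record someone has to open and rule out.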

A failure analysis over the data from the 2011 track carried out at OHSU demonstrated why there are still many challenges that need to be overcome [5]. This analysis found a number of reasons why visits frequently retrieved were not relevant:
  • Notes contain a very similar term that is confused with the topic
  • Topic symptom/condition/procedure done in the past
  • Most, but not all, criteria present
  • All criteria present but not in the time/sequence specified by the topic description
  • Topic terms mentioned as future possibility
  • Topic terms not present--can't determine why record was captured
  • Irrelevant reference in record to topic terms
  • Topic terms denied or ruled out
The analysis also found reasons why visits rarely retrieved were actually relevant:
  • Topic terms present in record but overlooked in search
  • Visit notes used a synonym for topic terms
  • Topic terms not named and must be derived
  • Topic terms present in diagnosis list but not visit notes
A number of research groups used a variety of techniques, such as synonym and query expansion, machine learning algorithms, and matching against ICD-9 codes, but still had results that were not better than manually constructed queries (which also require a form of informatics expertise in knowing how to query the clinical domain). The results data also show this is a challenging task, as the performance of different systems varied widely on different topics.
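One of the failure categories above, topic terms that are denied or ruled out, can be illustrated with a toy Python sketch. This is not the method of any track participant. The cue list, function names, and sample note are all hypothetical, and real clinical negation detection is considerably more involved.

```python
import re

# Hypothetical, deliberately tiny list of negation cues.
NEGATION_CUES = ("no ", "denies ", "negative for ", "without ")


def naive_match(note, term):
    """Plain keyword match: true whenever the term appears anywhere."""
    return term.lower() in note.lower()


def negation_aware_match(note, term):
    """True only if the term occurs in some sentence with no negation
    cue appearing earlier in that same sentence."""
    term = term.lower()
    for sentence in re.split(r"[.;\n]", note.lower()):
        idx = sentence.find(term)
        if idx == -1:
            continue  # term not in this sentence
        if not any(cue in sentence[:idx] for cue in NEGATION_CUES):
            return True  # affirmed mention found
    return False


# Made-up note text, not from the track corpus.
note = "Patient denies chest pain. History of asthma."

print(naive_match(note, "chest pain"))           # keyword search retrieves it
print(negation_aware_match(note, "chest pain"))  # negated, so not a match
print(negation_aware_match(note, "asthma"))      # affirmed mention
```

Even this simple check misses negations that follow the term (for example, "pneumonia was ruled out") and says nothing about past history or future possibility, which is part of why the task remains hard.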

From my perspective, these results show that successful use of big data will not come just from smart algorithms and fast computer hardware. It will also require the informatics expertise to design and implement EHRs, high-quality and complete clinical data, and a proper understanding of the clinical/health domain to make the most effective use of the data. As such, the road to achieving the value of big data passes through informatics.


1. Voorhees, E and Hersh, W (2012). Overview of the TREC 2012 Medical Records Track. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute of Standards and Technology.

2. Voorhees, EM and Harman, DK, Eds. (2005). TREC: Experiment and Evaluation in Information Retrieval. Cambridge, MA, MIT Press.

3. Hersh, W and Voorhees, E (2009). TREC genomics special issue overview. Information Retrieval. 12: 1-15.

4. Buckley, C and Voorhees, E (2000). Evaluating evaluation measure stability. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece. ACM Press. 33-40.

5. Edinger, T, Cohen, AM, et al. (2012). Barriers to retrieving patient information from electronic health record data: failure analysis from the TREC Medical Records Track. AMIA 2012 Annual Symposium, Chicago, IL, 180-188.

"Call Me Crazy": Lifetime's New Movie That Champions Hope and Resilience Around Mental Illness

*Warning: it was difficult to write this post without including a few small spoilers, but I hope you'll watch the whole film anyway.

On Saturday April 20th, Lifetime debuted "Call Me Crazy: A Five Film".  The film (which boasts a star-studded cast and director list) includes five short stories that examine the impact of mental illness from various perspectives.  Each story is named after the main character: "Lucy", "Grace", "Allison", "Eddie", and "Maggie".

In the first story, we are introduced to Lucy (played by Brittany Snow).  Lucy, a law student, has recently been admitted to a psychiatric institution after experiencing a schizophrenic episode.  She is struggling to see how she can live a "normal" life that includes relationships and a career.  Her clinician encourages her to finish law school because she has insight into something very few people understand (mental illness)- so who knows how many people she could help?

In "Grace", we meet a daughter who has been living with a bipolar mother for her entire life.  Grace is played beautifully by Sarah Hyland from "Modern Family"- I loved seeing her in a dramatic role.  We see the "highs" and "lows" of her mother's condition.  We also see the devastating impact that it has on Grace's life when it is not treated.  Grace often plays the role of caretaker- making sure her mother is safe.  We see her struggle to have her own life aside from her mother's illness.

"Allison" offers the viewers a twist.  Allison is Lucy's younger sister.  So we step back from Lucy's view and we see how mental illness has affected her entire family.  Allison's childhood, her sense of safety, her relationship with her parents- were all changed as a result of her sister's illness.  She has bottled up a lot of anger and finds it difficult to support her sister through her recovery.

"Eddie" introduces the only male main character.  He is suffering from severe depression.  He has withdrawn from his wife and his friends.  He has stopped receiving help from his therapist.  We watch his wife intervene after discovering that he may be thinking about suicide.

Finally, "Maggie" introduces topics that (unfortunately) are all too common these days- post-traumatic stress disorder (PTSD) and military sexual trauma among our returning veterans.  Maggie (played by Jennifer Hudson) was victimized during her time in the Army and the lasting impacts are threatening her ability to have a healthy relationship with her family.  Here we get another update on Lucy- she is now a lawyer and is representing Maggie in court.

While each story stands on its own, Lucy's story is woven throughout "Allison" and "Maggie" as well.  I really liked this strategy.  Not only because I became invested in her character during the first story...but also because seeing her evolve over time helped to demonstrate some key themes from this film- hope and resilience.

As Lucy says to Maggie: "I am living proof". [Of what?] "That there is hope".  In court, Lucy reminds Maggie's judge that having mental illness does not mean that you are a bad person or a bad mother.  She also reminds him about the importance of social support, "it is nearly impossible to get well alone".  Even though we see all of these characters at their lowest point- there is still hope that they can feel better, have strong relationships, and contribute positively to the world.

It seems fitting that Brittany Snow's character delivers these messages about hope and resilience, as she is a strong advocate for them in real life.  Together with the Jed Foundation and MTV, she founded Love is Louder.  Love is Louder is an inclusive movement that amplifies messages of love and support to combat negative messages resulting from bullying, loneliness, and stigma.  She has also publicly shared her own battles with anorexia, depression, and self harm.

As a health educator, I highly recommend this film as a resource for discussing mental illness, suicide, stigma, social support, and help-seeking.  Since each story is approximately 20 minutes, they can be broken down into segments or watched all together.  This film is a great example of Entertainment Education, which is an area of public health that acknowledges the strong role that television and movies play in educating the public about health issues.

If you or someone you know is struggling with a mental illness, please reach out:
National Suicide Prevention Lifeline (1-800-273-8255)

Wednesday, April 24, 2013

Food for Health - Flaxseed

Flax can be consumed as whole seed, or in the milled or oil form. Flaxseeds are tiny, brownish, flat seeds which are very nutritious and if included in the daily diet, could help to keep lifestyle diseases such as diabetes and heart disease in check. Flax adds flavour, nutrition, and health benefits to a variety of foods and has a mild, nutty taste.

Flaxseed is a good source of the ‘good fat’

Tuesday, April 16, 2013

Managing Diabetes in Summer

Mrs. Sheela Paul, Ms. Rohini - Dietitian, MVNES

Summer is here, and with the rise in temperature all of us need to take extra precautions to avoid common heat-related conditions such as heat stroke and dehydration. People with diabetes need to take additional precautions because they are also managing an existing condition. They can enjoy all the summer

Monday, April 8, 2013

Biomedical and Health Informatics vs. Data Science, mHealth, etc. - New Disciplines or New Terminology?

When I entered the field of informatics in the 1980s, a great deal of the research was driven by "artificial intelligence" (AI). Many people were trying to build "rule-based expert systems," while those interested in knowledge representation were constructing "semantic networks." We rarely hear these terms in quotes these days, perhaps with the exception of AI, which one hears occasionally. It is not, however, that no one is trying to build systems that guide decision-making and represent knowledge in complex ways; we just use different terminology now, such as clinical decision support and ontologies.

Fast forward to the present, and we see the introduction of new terms, most prominently right now data science [1] and mHealth [2]. Many who are doing work in these areas talk of them as the primary focus of their work. I question, however, whether these are truly new disciplines, or just concentrations (at least for those working in health-related areas) within biomedical and health informatics [3]?

I am most concerned about mHealth, when I see new people coming forward with brilliant ideas and truly innovative technologies, yet not incorporating the experiences from decades of work in informatics. I do not deny that some aspects of using mobile connected devices for health are truly novel, yet what I consider to be the basic principles of informatics still apply, namely things like scalability, interoperability, usability, and so forth. I just see nothing novel enough about mHealth to not call it part of informatics.

The same holds, in my opinion, for data science. There are certainly "computationalist" techniques in which many who work in informatics are not skilled. "Big data" applications will require specialized knowledge. But informatics is a broad field, and no one can master everything. There are other aspects of informatics, such as (I am repeating myself from the previous paragraph here) scalability, interoperability, usability, and so forth, that must be married to the results of data science to make the latter's output truly usable. One case in point is the growing number of analyses that predict undesired outcomes, such as hospital readmissions [4]. I am as intellectually interested in these applications as anyone, but until it is shown these analyses can be actionable, they will mostly remain interesting theoretical exercises.

I am excited for mobile health applications and advanced uses of data techniques to improve health, healthcare, and research. I hope that those pursuing them do not lose sight of the larger picture of providing end-to-end value for the use of data, information, and knowledge in health-related endeavors, i.e., the goal of biomedical and health informatics [3].


1. Davenport, TH and Patil, DJ (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, October, 2012.
2. Krohn, R and Metcalf, D (2012). mHealth: From Smartphones to Smart Systems. Chicago, IL, Healthcare Information Management Systems Society.
3. Hersh, W (2009). A stimulus to define informatics and health information technology. BMC Medical Informatics & Decision Making. 9: 24.
4. Gildersleeve, R and Cooper, P (2013). Development of an automated, real time surveillance tool for predicting readmissions at a community hospital. Applied Clinical Informatics. 4: 153-169.

Wednesday, April 3, 2013




Music By: Annu Malik
Lyricist: Anjaan
Music On: EMI

1. Apno Men Main Begaanaa (part-2)
   Singer/s: Kishore Kumar
   Duration: 03:00 mins

2. Dear Sir Aap Ko
   Singer/s: Kishore Kumar, Asha Bhosle
   Duration: 08:12 mins

3. Jigar Tham Lo
   Singer/s: Kishore Kumar
   Duration: 06:51 mins

4. O Dil Jaani
   Singer/s: Asha Bhosle
   Duration: 05:22 mins

5. Waqt Ke Saath
   Singer/s: Mohd. Aziz, Asha Bhosle
   Duration: 07:38 mins

6. Apno Mein Main Begaanaa (part 1)
   Singer/s: Kishore Kumar
   Duration: 05:37 mins

