Solving the attribution problem in research

Imagine you have a name like mine: Karthik. It is quite common in Tamil Nadu, and perhaps across South India. Luckily, since we don't use surnames in our state but instead use the father's name as the last name, each full name is more likely to be unique (unless you have a family history of common names 😀). If I had been in the North with a surname like Aggarwal or Gupta, it would be significantly more difficult to identify me as a unique individual even knowing both first name and last name. In normal circumstances, this wouldn't be a problem. However, once you start publishing, it causes unwanted issues. In database terms, one way to uniquely identify a record is to use a composite key: the combination of two fields, such as first name and last name. The strength of a composite key depends on the uniqueness of the combination. As I said before, the first-name-plus-last-name combination doesn't work well in places where the surname is very common. There is perhaps a north-south difference even in this.
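To see why the composite key fails, here is a tiny sketch; the records, names and IDs below are all made up for illustration:

```python
# Hypothetical author records; names and IDs are invented for this example.
authors = [
    {"id": "A1", "first": "Karthik", "last": "Raman"},
    {"id": "A2", "first": "Rahul",   "last": "Gupta"},
    {"id": "A3", "first": "Rahul",   "last": "Gupta"},  # a different person, same name
]

def lookup_by_name(records, first, last):
    """Look up records by the composite (first name, last name) key."""
    return [r for r in records if r["first"] == first and r["last"] == last]

def lookup_by_id(records, uid):
    """Look up records by a unique researcher ID."""
    return [r for r in records if r["id"] == uid]

print(len(lookup_by_name(authors, "Rahul", "Gupta")))  # 2 -- the composite key is ambiguous
print(len(lookup_by_id(authors, "A2")))                # 1 -- the unique ID is not
```

The common surname makes the name-based lookup return two distinct people, while the ID-based lookup always returns exactly one. This is the problem a unique researcher ID solves.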

Why is this a problem?

For individual faculty/researcher

You may have to combine your name with the title of the publication or your affiliation to retrieve your publications. This is cumbersome and can lead to under- or over-counting.

For the institute/ university

It is very hard to improve something we can't measure. So an institute might want to track the research productivity of its faculty and researchers. One way is to maintain an aggregate of publications at the level of the institute, the department and the individual. This could be updated automatically, with a report produced quarterly. It helps us visualise trends in publication and see when and where we need to buckle up and improve.
All of this requires identifying the publications and correctly attributing them to the respective authors. If there is an error in identification or attribution, the whole exercise becomes a waste of time.
A software service called Researgence takes the approach of searching for all possible combinations of the relevant fields. It isn't free, but universities and institutes can use it to track their research output. As you can imagine, this is computationally intensive and needs manual verification.
So we need some way to uniquely identify individuals and their contributions.
How can we simplify this process?
By following the same method used elsewhere to uniquely identify individuals: assigning a unique ID (for example, a number or alphanumeric code) to every researcher. That solves the problem of attribution.
Two services are available which help in this regard. If you are an academic, go over to both of these and sign up. Both are free to use.

  1. ResearcherID
  2. ORCID

From your next publication onwards, you can let the journal know your ResearcherID or ORCID during submission itself. Then it won't matter how common your name is.



The Plain Language Movement & Law

The plain language movement started on both sides of the Atlantic in the 1970s to make law easy to understand. Legal documents were plagued by legalese and were thus inaccessible to the common person. The problem can be traced back almost 1,000 years, to when William, the Duke of Normandy, defeated the Anglo-Saxon king Harold at the Battle of Hastings in 1066. As William and his followers spoke a dialect of French, English became the language of the common and lowly folk.
The courts and lawyers soon followed suit. Within a few decades the legal system had become inscrutable to the common man. With the later ascendancy of English came the urge to rid the system of French and Latin terms and replace them with crisp Anglo-Saxon words. The push to make common sense in common language fashionable had a reasonable amount of success.
The legal system and the people benefited a lot from making things simple. Unfortunately, the Plain Language movement only focused on the law, not medicine.

Saving Medicine From Medicalese

Flip (or click) through the pages of any medical journal and you will see how hard our language has become for anyone outside our profession to make sense of. Even among doctors, each discipline has its own jargon and stylistic idiosyncrasies, making it harder for others to understand. We live in a time when obfuscation is celebrated as a skill and straight talk is scoffed at.
To give an example, I was reading a top endocrinology journal yesterday and was dismayed to find that its pages have been hijacked by genes, genes and more genes, or molecules, molecules and more molecules. It felt as though the journal had written, in 100-point font in invisible ink: look, this is for the experts. No one else is welcome.
I am not arguing that the top journals should dumb down their content or ask authors to use click-baity titles. However, I'm certain that the scientific community would be better served by a Cochrane-style plain language summary for every scientific article. In fact, developing a written version of the elevator pitch is likely to narrow our focus to what matters. However, most journals don't have the space or the inclination for such summaries. We need a plain language movement for medicine.

What can we do in the meantime?

Kudos. It is a free online service for explaining your research in plain English. Each paper gets four pieces of information: the title, what it is about, why it is important, and the author's perspective. Kudos also provides shareable links and can automatically post to Facebook, Twitter and LinkedIn. It can even track the response your article is generating! (It's like having your own Altmetric dashboard.)
Here’s a plain language summary of one of our papers – Tumor(s) Induced Osteomalacia- A curious case of double Trouble
If you are an academic, check out Kudos. It’s free and the experience can help you focus on what matters.

Getting started with case reports

A case report is the perfect starting point for a resident new to scholarly publishing. It is easy to write, requires little creativity (after all, it is just the documentation of a patient who came to see the doctor) and, though it has limited impact, it has good educational value. More than anything else, it lowers the barrier to scientific writing.

There is a catch though – case reports are low-hanging fruit. Accordingly there is quite a bit of competition: lots of people want to write them, but very few publishers want to publish them. This has created a vacuum, which has been filled by speciality case report journals. These journals publish only case reports and therefore have a much higher acceptance rate – somewhere in the range of 30 to 70%. The increased demand also creates a situation where publishers may resort to questionable practices. In fact, almost half of these journals have been found to be dubious.

How to identify the genuine journals?

The trick is to find case report journals that are PubMed-indexed. Only one PubMed-indexed journal (published by the Baishideng group) is known to indulge in questionable practices [refer to the Excel file linked at the end of the article]. So a case report journal that is PubMed-indexed is highly likely to be genuine. For example, my first publication was a case report in BMJ Case Reports. BMJ Case Reports has a decent acceptance rate, but to submit, one of the authors or the institution must have a subscription. An individual subscription costs around 185 GBP (around Rs. 15,000), but just one subscription in a department is more than enough. Be sure to check if your institution has a subscription – in which case, you can contact the librarian to get the submission access code. BMJ Case Reports doesn't have an impact factor as such (many case-report-only journals don't). However, you can use the SCImago (scimagojr) 2-year citations per article as a reasonable proxy.

Of course, case reports are also published by journals that carry other material such as reviews and original articles. However, the acceptance rate is likely to be lower in these journals. If you are confident of your material, it is best to try a general journal first before trying a case-reports-only journal. When in doubt, ask an expert.

A master list of case-reports-only journals can be accessed in Excel format here. Sadly, I couldn't get a master list of submission fees – if you have details on that, do let me know. If you found this post useful, please share it with your friends.

Further reading

New journals for publishing medical case reports

Online workflow for writing articles

It has become increasingly common for people to collaborate on writing projects. The tools that enable such collaboration have improved over the years too, and currently allow for a completely online workflow. Unfortunately, many residents and early career researchers don't take advantage of these recent developments. In this post, I will outline a completely online workflow for writing articles.
This way, you can work with any number of people on the same project, and all of you have access to the same digital library from which to cite. You might point out that team library functionality has been available for quite some time now in popular reference management software like Zotero. However, without jumping through a few hoops, you can't get Zotero to work seamlessly with Google Docs.
Of late, I am increasingly using Google Docs for my document preparation needs. Sure, it isn't MS Word, but few people need the full power of MS Word for their routine documents. The 'portability' of a Google Docs document is particularly attractive to me since I have computers running different operating systems.
Here's my completely online workflow. Every component of the workflow is free (as in free beer).


The advantages of this online workflow include

  • No need to install any software
  • You always get the latest and greatest version
  • OS/device independent
  • Collaboration is easy and seamless

The F1000 Workspace also has a desktop client, and you can start working even if you already have a PDF collection. It also has a Word add-in, if you prefer to write in MS Word. Try it out for your next article. You will be pleasantly surprised.

Decoding sensitivity and specificity

Last week, like most other weeks of the year, I encountered the terms sensitivity and specificity three times! These terms have somehow become first-class citizens in a physician's lingo. They are so commonly bandied about that many of us don't even stop to think about what they really mean.

Consider this scenario. One of my well-meaning friends told me before exams that the sensitivity and specificity of the different tests used to diagnose Cushing's syndrome were a very important question and should be memorized. And I duly proceeded to do just that. To my utter chagrin, most of these tests had very similar sensitivity and specificity.

Here is a table showing the sensitivity and specificity of the different tests.


Before proceeding further, let's refresh what sensitivity and specificity mean.

In plain English,

Sensitivity = probability of the test being positive if you have a disease

Specificity = probability of testing negative if you don’t have a disease

It can be rewritten as

Sensitivity = P(PositiveTest | Disease)

Specificity = P(NegativeTest | NoDisease)

The notation for sensitivity above is read as the probability of a positive test, GIVEN that the patient is diseased; similarly for specificity.
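To make the conditioning concrete, here is a quick sketch with made-up counts (100 diseased and 900 healthy people, chosen only so the numbers come out round):

```python
# Made-up counts from a hypothetical diagnostic study: 100 diseased, 900 healthy.
true_positive  = 98    # diseased AND test positive
false_negative = 2     # diseased AND test negative
false_positive = 27    # healthy  AND test positive
true_negative  = 873   # healthy  AND test negative

# Sensitivity = P(positive test | disease): we condition on the diseased group only.
sensitivity = true_positive / (true_positive + false_negative)

# Specificity = P(negative test | no disease): we condition on the healthy group only.
specificity = true_negative / (true_negative + false_positive)

print(sensitivity)            # 0.98
print(round(specificity, 2))  # 0.97
```

Notice that each calculation only ever looks at one group (diseased or healthy); that restriction is exactly what "GIVEN" means in a conditional probability.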

Herein lies the problem – both sensitivity and specificity are conditional probabilities. On the best of days, probability can be a little difficult to grasp, and conditional probabilities tend to confuse people even more.

Natural frequencies are an easier way to communicate the same information and to understand it better. Most clinicians aren't really interested in the 'sensitivity' and 'specificity' alone. What we actually want are the positive and negative predictive values: if a test is positive, what is the probability of having the disease?

To know this we need to know the prevalence of the disease. A positive test is more likely to be true positive if the prevalence of the disease is high. A positive test is more likely to be a false alarm if the prevalence of the disease is low.


Effect of prevalence on test result

Let us, for the sake of discussion, take the prevalence of Cushing's syndrome to be 1%.

How do we decode this sensitivity and specificity into natural frequencies?

Let's use an example from the picture given above: LDDST, with 98% sensitivity. The false positive rate can be calculated as 1 − specificity, which is around 3% for LDDST. So the three ingredients we need for transforming conditional probabilities into natural frequencies are

  1. Prevalence
  2. Sensitivity
  3. Specificity

Ten out of every 1000 people (1% prevalence) are expected to have Cushing's syndrome. Of these ten, almost all test positive (98% sensitivity, rounded off). Of the 990 without the disease, 3% (about 30 people) still test positive.

Let's draw a tree to make this clearer. Maybe it's just me, but a tree seems easier to grasp than a 2×2 table.



So of all the positive tests – 40 in total – only 10 are expected to be true positives. In other words, just 25%!
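The whole tree boils down to a few multiplications. A minimal sketch, assuming the figures used above (1% prevalence, 98% sensitivity, and ~97% specificity, i.e. a 3% false positive rate):

```python
def positive_predictive_value(prevalence, sensitivity, specificity, population=1000):
    """Turn the three ingredients into natural frequencies and return the PPV."""
    diseased = population * prevalence             # 10 of 1000 at 1% prevalence
    healthy = population - diseased                # the remaining 990
    true_positives = diseased * sensitivity        # ~10 (9.8 before rounding)
    false_positives = healthy * (1 - specificity)  # ~30 (29.7 before rounding)
    return true_positives / (true_positives + false_positives)

# LDDST figures from the text: 1% prevalence, 98% sensitivity, ~97% specificity.
ppv = positive_predictive_value(0.01, 0.98, 0.97)
print(round(ppv * 100))  # 25 -- only about a quarter of positive tests are true positives
```

Change the prevalence argument and rerun it to see how strongly the 'arbitrary' prevalence number drives the answer: the same test gives a very different PPV in a screening population than in a pre-selected clinic population.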

Now repeat the same exercise with the other tests in the table. You will find that there is only a tiny sliver of difference between them. There is only so much information you can extract from a given test, which is why we need a combination of tests.

Over time, drawing a tree like this will become second nature to you if you mentally practice with approximations. Soon you will realize that natural frequencies are much easier to understand and think about than conditional probabilities. They empower both the doctor and the patient and help in better communication of risks.

If you are a Bayesian at heart, you will cringe at the thought of prevalence having such a huge say in the outcome. What about your clinical acumen? Someone with a set of 'strong' signs and another with a set of 'weak' signs can't have the same 'arbitrary' number influencing the interpretation of a blood test, right?

Perhaps you will then agree that the model below better represents the way we should think about blood tests.


In such a situation, we are better off educating PGs (even interns) about the uncertainty involved, instead of asking them to recite some numbers. As for me, I continue to draw trees 😉 

If you find this useful, feel free to share with your students/friends.

Aaplot: Easy way to draw annotated scatterplot in Stata

The standard way to draw a scatter plot with a linear fit in Stata is quite simple. Even then, you will have to use the built-in graph editor to polish it or make it publication-ready.

Let me illustrate with the auto dataset that ships with Stata. We will draw a scatter plot of mpg (miles per gallon) against the price of the cars. We will also draw a line of best fit with a confidence interval.

sysuse auto
twoway scatter mpg price || lfitci mpg price

Now this is the unedited resulting graph.


As you can see, it needs labelling of the axes, the line equation, etc. It's not a lot of work, but there is still a way to make it easier.

Aaplot is a user-written ado-file contributed by Nicholas Cox.

Install it like this

ssc install aaplot

Now you can draw the same thing, with additional details, using a single command.

aaplot mpg price, addplot(lfitci mpg price)


(Unlike R, you don't need to load add-ons separately in Stata! How cool is that :-))


Scientific publishing: Online platforms for writing

The early computers didn't have a pretty user interface. They were geared towards nerds and hobbyists – that was until Steve Jobs laid eyes on the GUI (graphical user interface) developed at Xerox's Palo Alto Research Center. The rest, as they say, is history.
Fast forward a few years. A Stanford computer scientist, Donald Knuth, developed an entire document preparation system called TeX. It was robust, and it was immediately lapped up by the mathematics community, for it was a great way to format equations beautifully. This system was later made a little friendlier by Leslie Lamport and called LaTeX.
It had many advantages

  • It delinked content and formatting
  • It could be scripted /automated with macros
  • It was just plain text with markup – thus making version control easy
  • It could output to a variety of formats that could be viewed from practically any device
  • It was free and open source

Unfortunately, LaTeX was and still is a little hard to learn. Consequently, the life sciences community heavily uses WYSIWYG (what you see is what you get) programs like MS Word for scientific writing.

LaTeX needed a front end –

  • one that is easy to use
  • does not require any installation (or is at least open source and cross-platform)
  • has an easy way to add tables, figures and citations (which are not so easy to do if you use LaTeX with a customized citation style)

LyX was the first step – but it requires a local LaTeX installation. Then came the likes of Overleaf and ShareLaTeX. Both were online, freeing the user from the need to install anything locally. Unfortunately, they still very much retain the LaTeX flavour and are thus not suitable for the average doctor. Then came Authorea – it had a freemium model, it was online, it was easy to use, and it almost felt like the long battle to develop an unintimidating face for LaTeX had succeeded. It has a few templates for life science journals, but the operative word here is few.
I had thought the whole process was complete: you could write papers online, collaboratively, with anyone from anywhere, and produce camera-ready PDFs!
Well, it turns out life isn't that simple. Enter the publisher. Each publisher has specific formatting guidelines for clarity, uniqueness and good design. So most life science publishers prefer to receive the manuscript in MS Word format and won't touch your .tex files (there are, of course, some exceptions). This formatting of everything to an in-house style (including citations) keeps each journal distinctive, and the situation is unlikely to change anytime soon. As I said before, LaTeX can be scripted and automated. So what if we had a web app that does everything Authorea does and generates a journal-specific PDF at the click of a button? It would solve the problem for authors as well as publishers!
Typeset is one such tool. It is online, free (currently in beta), incredibly easy to use (text, tables, images and citations), supports versioning, and can generate a PDF as per the journal's requirements at the click of a button! I haven't been this excited about the possibilities of an academic software in a long time. I will outline some of them in the next post. For now, feel free to check out Typeset.

Here's how these web apps – Overleaf, ShareLaTeX, Authorea and Typeset – compare. The features to look at are:

  • Free option
  • LaTeX usage
  • Version control (on some, for premium accounts only)
  • Friendliness for those who don't know LaTeX
  • Journal-specific styles (Authorea has only a few for medical journals; Typeset covers 4500+ journals)
  • Reproducible research (text + data + code)
  • Social tools (comments, chat)

Bonus: If you have a Mac and would prefer an installed app with similar functionality, check out Manuscripts app.

Note: Of course, the most popular and straightforward online option is Google Docs, but I guess you are already quite proficient in its use.