Reproducible research with Stata

Anyone who has used R for statistical analysis will be familiar with the incredible power and ease of use of RMarkdown. Having the code, data and narrative together in one document enables reproducible research. More than anything else, it makes life easy.

As a fan of Stata, I wasn't satisfied with Stata's stock options for this. As usual, user-contributed packages come to the rescue. The result may not be as elegant or extensive as RMarkdown; for instance, you can't make HTML5 presentations programmatically with Stata. However, with LaTeX you can create beautiful Beamer presentations directly from a single text file written in Markdown, an easy-to-use markup language. The possibilities are nearly endless.

What you will need

  1. Stata 14 or above
  2. Markstat package (install using ssc install markstat)
  3. Whereis package (install using ssc install whereis)
  4. Pandoc, installed from the official Pandoc site

Once you have installed Pandoc, install the Markstat and Whereis packages in Stata. The next step is to tell Stata where Pandoc is located, so that it can convert the Markdown file into an HTML document.

If you are on Windows and installed Pandoc in the default location, type whereis pandoc in Stata, followed by the full path to pandoc.exe in your Pandoc installation folder.

If you are on a Mac, fire up the terminal and type

whereis pandoc

Copy the location of the pandoc executable and use it to tell Stata where Pandoc is located, like so:

whereis pandoc /usr/bin/pandoc
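If you are not sure where Pandoc ended up, a short Python sketch can look it up for you. It uses the standard library's shutil.which, which searches the directories on your PATH, and it assumes Pandoc's folder is on that PATH. This is an optional helper and not part of markstat. Paste whatever path it prints after whereis pandoc in Stata.

```python
import shutil


def find_executable(name):
    """Return the full path to `name` if it is on the PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("pandoc")
    if path:
        # Use this path in Stata, e.g.: whereis pandoc <path>
        print(path)
    else:
        print("pandoc was not found on your PATH")
```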

Once that is done, get back to Stata. Open the do-file editor and start typing your narrative in Markdown. Stata code blocks are indented with a tab. This is ridiculously easy, even easier than the official dyndoc command in Stata 15.

When you have finished, save the file as filename.stmd. Note the extension. If your operating system hides file extensions by default, make sure you display them; otherwise the file may be saved with a different extension, which is not what we want. Now you can generate the final document, which you can share with colleagues, by typing

markstat using filename.stmd, bundle

The bundle option ensures that any images generated are embedded in the HTML file. You can omit it if your .stmd file doesn't produce any images, but I find it a good habit to use the option anyway.


The best part is that, since the source document is essentially a text file, it can easily be shared with colleagues too.

Note: If you want PDFs, you need to have LaTeX installed on your system (MiKTeX for Windows, TeX Live for Linux and MacTeX for the Mac).


Solving the attribution problem in research

Imagine you have a name like mine, Karthik. It is quite common in Tamil Nadu and perhaps across South India. Luckily, since in our state we use our father's name rather than a surname as the last name, each full name is more likely to be unique (unless you have a family history of common names 😀). Had I been from the North with a surname like Aggarwal or Gupta, it would be significantly more difficult to identify me as a unique individual, even knowing both my first and last names. In normal circumstances this wouldn't be a problem. However, when you start publishing, it causes unwanted issues. In database terms, one way to uniquely identify an observation is to use a composite key: the combination of two fields, such as first name and last name. The strength of a composite key depends on how unique the combination is. As I said before, the first name and last name combination doesn't work well in places where the surname is very common. There is perhaps a north-south difference even in this.
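To see why a first name and last name pair makes a weak composite key, here is a minimal Python sketch. It groups records by a chosen set of fields and reports the keys shared by more than one person. The names and affiliations are hypothetical and only for illustration.

```python
from collections import defaultdict

# Hypothetical author records, for illustration only.
AUTHORS = [
    {"first": "Rahul", "last": "Gupta", "affiliation": "Delhi"},
    {"first": "Rahul", "last": "Gupta", "affiliation": "Lucknow"},
    {"first": "Karthik", "last": "Murugan", "affiliation": "Chennai"},
]


def collisions(records, fields):
    """Group records by the composite key built from `fields`.

    Returns only the keys shared by more than one record, i.e. the
    places where the composite key fails to single out one person.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in fields)
        groups[key].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


print(collisions(AUTHORS, ("first", "last")).keys())
print(collisions(AUTHORS, ("first", "last", "affiliation")))
```

With first and last name alone, ('Rahul', 'Gupta') is reported as a collision. Adding affiliation as a third field returns an empty dict for these records, but that extra field only helps for as long as nobody changes institutions.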

Why is this a problem?

For individual faculty/researcher

You may have to combine your name with the title of a publication or with your affiliation to retrieve your publications. This is cumbersome and can lead to undercounting or overcounting.

For the institute/ university

It is very hard to improve something we can't measure, so an institute might want to track the research productivity of its faculty and researchers. One way is to aggregate publications at the level of the institute, the department and the individual. These aggregates would be updated automatically, and a report could be produced every quarter. This helps us visualise trends in publication and see when and where we need to buckle up and improve.
All of this requires identifying the publications and correctly attributing them to their authors. If there is an error in identification or attribution, the whole exercise will be a waste of time.
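To make the aggregation described above concrete, here is a minimal Python sketch that counts publications per author, per department, and per department and year. The records and the author IDs ("A1", "B1", and so on) are hypothetical. The point is that every count is only as good as the ID used to attribute each paper.

```python
from collections import Counter

# Hypothetical publication records: (author_id, department, year).
PUBLICATIONS = [
    ("A1", "Endocrinology", 2017),
    ("A1", "Endocrinology", 2018),
    ("A2", "Endocrinology", 2018),
    ("B1", "Medicine", 2018),
]


def counts_by(records, key):
    """Count records grouped by whatever `key(record)` returns."""
    return Counter(key(record) for record in records)


by_author = counts_by(PUBLICATIONS, lambda r: r[0])
by_department = counts_by(PUBLICATIONS, lambda r: r[1])
by_department_year = counts_by(PUBLICATIONS, lambda r: (r[1], r[2]))

print(dict(by_author))
print(dict(by_department))
print(dict(by_department_year))
```

If two different people were recorded under the same ID, by_author would silently merge their output. That is exactly the attribution error described above.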
A commercial tool called Researgence searches for all possible combinations of the relevant fields. It isn't free, but universities and institutes can use it to track their research output. As you can imagine, this is computationally intensive and needs manual verification.
So we need some way to uniquely identify individuals and their contributions.
How can we simplify this process?
By following the same method used to uniquely identify individuals elsewhere: assigning a unique ID (for example, a number or an alphanumeric code) to every researcher. That solves the problem of attribution.
Two services are available which help in this regard. If you are an academic, go over to both of these and sign up. Both are free to use.

  1. ResearcherID
  2. ORCID

From your next publication onward, you can give the journal your ResearcherID or ORCID during submission itself. Then it won't matter how common your name is.
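A unique ID also helps catch typing errors. An ORCID iD is 16 characters in four groups of four, and its last character is a check character computed with the ISO/IEC 7064 MOD 11-2 algorithm. The Python sketch below recomputes that character so that a mistyped iD, such as one with a single wrong digit, can be flagged before it is attributed to anyone. The function names are illustrative. 0000-0002-1825-0097 is used as a sample iD.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the xxxx-xxxx-xxxx-xxxx layout and the final check character."""
    parts = orcid.split("-")
    if len(parts) != 4 or any(len(p) != 4 for p in parts):
        return False
    compact = "".join(parts)
    base, check = compact[:15], compact[15]
    # Only ASCII digits are allowed in the first 15 positions.
    if not all(c in "0123456789" for c in base):
        return False
    return orcid_check_digit(base) == check.upper()


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

When the remainder calculation yields 10, the check character is "X". This is why some iDs end in a letter rather than a digit.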

Insanely simple data collection

Problem: You want to collect data quickly using your mobile phone, but you have neither the resources nor the expertise to design a solution from scratch
Solution: EpiCollect

EpiCollect is a free data collection app developed at Imperial College London. In my opinion, it is the simplest way to collect data on a mobile phone without writing a single line of code. It is so simple that you can make a fully functional data collection form in under 3 minutes! It even allows you to take the patient's photo or read a barcode.


How to make a simple data collection form?

The steps are

  1. Go to EpiCollect website
  2. Login with your Google account
  3. Create a project
  4. Make a web form using the drag and drop form builder
  5. Set your project's access to private and its visibility to "visible"
  6. Download EpiCollect app to your phone
  7. Login and search for your project
  8. Start entering data
  9. Export to CSV (or JSON if you want); you can also view the data on the web
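Once you have a CSV export, a few lines of standard-library Python are enough for a first look at the data. The sketch below counts entries per group and flags empty values in a numeric column. The column names (entry_id, created_at, village, age_years) and the sample rows are hypothetical, so check the header row of your own export before reusing it.

```python
import csv
import io
from collections import Counter

# Hypothetical export: columns and rows are illustrative only.
SAMPLE_EXPORT = """entry_id,created_at,village,age_years
1,2018-01-05,Alpha,34
2,2018-01-05,Beta,51
3,2018-01-06,Alpha,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_rows_from_file(path):
    """Same as load_rows, but reads an exported CSV file from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarise(rows, group_col, value_col):
    """Count rows per value of group_col and count blanks in value_col."""
    counts = Counter(row[group_col] for row in rows)
    missing = sum(1 for row in rows if not row[value_col].strip())
    return counts, missing


rows = load_rows(SAMPLE_EXPORT)
counts, missing = summarise(rows, "village", "age_years")
print(dict(counts))
print(missing)
```

For a real export, call load_rows_from_file with the path of the downloaded CSV instead of load_rows.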

What is so exciting about EpiCollect?

  1. It is free for everyone; unlike REDCap, you don't need an institutional email ID
  2. It allows flawless data collection using a mobile app, even offline
  3. It offers special field types for data collection, like photos, audio, video and barcodes, which are useful for qualitative research as well
  4. It allows geotagging, which is useful for field research
  5. It supports advanced data collection features: data validation (even with regex, i.e. regular expressions), branching and jumps
  6. You can add multiple users to your project, for example for a multi-department registry
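To show what regex validation buys you, here is a Python sketch of a hypothetical rule: a hospital ID made of two capital letters followed by four digits. This only illustrates the idea. EpiCollect's exact regex dialect is not described here, so test any pattern in its form builder before relying on it.

```python
import re

# Hypothetical rule: two capital letters followed by exactly four digits.
HOSPITAL_ID_PATTERN = r"[A-Z]{2}[0-9]{4}"


def is_valid_id(value: str) -> bool:
    """True only if the whole value matches the pattern."""
    return re.fullmatch(HOSPITAL_ID_PATTERN, value) is not None


for candidate in ["KM0042", "km0042", "KM042", "KM00421"]:
    print(candidate, is_valid_id(candidate))
```

Using fullmatch means that partial matches, such as an ID with an extra trailing digit, are rejected.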

I want to know more. What can I do?

Go to the EpiCollect website and you will find all the information you need. I can guarantee that you will find the process very easy, even if you are a Luddite. Just give it a spin. (Be sure to use the latest version, EpiCollect 5.)

Note: You can do more advanced things with this. For example, MicroReact, which uses EpiCollect, allows researchers to track epidemics in real time.


The Plain Language Movement & Law

The plain language movement started on both sides of the Atlantic in the 1970s to make the law easy to understand. Legal documents were plagued by legalese and were thus inaccessible to ordinary people. The problem can be traced back almost 1,000 years, to when William, Duke of Normandy, defeated the Anglo-Saxon King Harold at the Battle of Hastings in 1066. As William and his followers spoke a dialect of French, English became the language of common and lowly folk.
The courts and lawyers soon followed suit, and within a few decades the legal system had become inscrutable to the common man. With the later ascendancy of English came the urge to rid the system of French and Latin terms and replace them with crisp Anglo-Saxon words. The push to make common sense in common language fashionable had a reasonable amount of success.
The legal system and the people benefited a lot from making things simple. Unfortunately, the plain language movement focused only on the law, not on medicine.

Saving Medicine From Medicalese

Flip (or click) through the pages of any medical journal and you will see how hard our language has become for anyone outside the profession to make sense of. Even among doctors, each discipline has its own jargon and stylistic idiosyncrasies, making it harder for others to understand. We live in a time when obfuscation is celebrated as a skill and straight talk is scoffed at.
To give an example, yesterday I was reading a top endocrinology journal and was dismayed to find its pages hijacked by genes, genes and more genes, or molecules, molecules and more molecules. It felt as if the journal had printed, in 100-point type and invisible ink: look, this is for the experts. No one else is welcome.
I am not arguing that top journals should dumb down their content or ask authors to use clickbait titles. However, I'm certain that the scientific community would be better served by a Cochrane-style plain language summary for every scientific article. In fact, writing an elevator pitch for each paper is likely to sharpen our focus on what matters. Most journals, however, don't have the space or the inclination for such summaries. We need a plain language movement for medicine.

What can we do in the meantime?

Kudos is a free online service for explaining your research in plain English. Each paper gets four pieces of information: the title, what it is about, why it is important, and the author's perspective. Kudos also provides shareable links and can automatically post to Facebook, Twitter and LinkedIn. It can even track the response your article is generating! (It's like having your own Altmetric dashboard.)
Here’s a plain language summary of one of our papers – Tumor(s) Induced Osteomalacia- A curious case of double Trouble
If you are an academic, check out Kudos. It’s free and the experience can help you focus on what matters.

The Insulin plant

You read that right. There's actually a plant called the insulin plant, and it is supposed to reduce blood glucose levels (no surprise there). You have probably heard of several natural remedies for diabetes and are rolling your eyes now. The diabetes armamentarium is brimming with antidiabetic agents that are effective and proven. Some, like GLP-1 analogues and SGLT2 inhibitors, have even been shown to have cardiovascular benefits.

So why bother about a plant?

For a couple of reasons

  • Research: There are many plants with potential antidiabetic activity; in fact, at least 111 plants are known to reduce blood glucose (1). However, Indian patent law does not allow patenting plants and, more importantly, medicines derived from natural products. I have always had trouble understanding the second clause: even if you do some fancy chemical extraction and make a useful substance that was essentially hidden underground for millennia, you wouldn't get a patent in India. Consequently, the incentive to exploit “natural remedies” for commercial gain is very limited, and most of these plants or plant-based substances may never reach the market as a tablet. Does that mean we can't study them or learn from them? Not really. One can essentially mimic a natural substance, tweak it, call it bioinspiration and pretend that the molecular structure was an epiphany during a coffee break! Or at least apply for an AYUSH grant to do some research; I'm a novice here, but I guess there can't be a better time to apply for AYUSH grants than now. And even if we aren't in the business of making drugs, we can consume the natural form if it is safe enough, even if the effect is modest.
  • Clinical: Apart from the research aspect, there is a huge public craze for cost-effective natural remedies and plant-derived drugs. The runaway success of products like BGR34 is a testament to this.

Now you might wonder: if this plant is any good, it should have solid scientific backing. Indeed, there is a good body of research behind it. But let's be frank: research is often locked behind paywalls. Even when it is 'accessible', it isn't truly accessible to those outside the profession; most people are turned off by graphs, tables and statistics. The idea of this post is simply to strip the complexity from the published scientific literature and bring the reader up to speed on this quirky plant.

Here’s a brief bio of the insulin plant in Q&A format

What exactly is the insulin plant?

This plant belongs to the Costaceae family. Two species are common: Costus igneus and Costus pictus. The leaves of this plant are sometimes taken as supplements to reduce blood sugar. Known as the spiral flag (insulin chedi in Tamil and Malayalam), the plant can grow up to 2 feet and has colorful flowers.


What does the plant contain?

It contains triterpenoids such as α- and β-amyrin and lupeol, along with compounds like stigmasterol and diosgenin. That's a lot of active principles, but we have yet to understand how these substances interact with one another and whether isolating them is more useful than the natural mixture in which they are found.

How do I get this plant?

The insulin plant can be obtained from a nursery or from someone who already grows it. Care should be taken not to mistake another plant for it. For research purposes, the identity of the plant needs to be confirmed by the Botanical Survey of India, Coimbatore, which issues an authentication certificate with a number and date.

Is it safe for human consumption?

Published toxicity studies in animals show no major toxic effects in the short term (2). Anecdotal human evidence seems to support this. However, with plants and plant products there are many variables to account for: subspecies, soil, the part of the plant used, extract versus whole leaves, growth in shade versus sunlight, and so on. Since there are no published long-term human studies, we are essentially on our own when consuming it. Consequently, those at risk of hypoglycemia (the elderly and those with recurrent hypos, comorbid illness or kidney disease) and pregnant women should strictly avoid experimenting on themselves.

Is it effective in reducing blood sugar?

Much of the published research on this plant comes from animal studies, which generally show a reduction in blood glucose. You can get the gist of the published research, summarised in a table, by clicking here.

The studies differ widely from one another, so homogeneity is hard to obtain, and only limited human data are available. The absence of data doesn't mean the absence of a useful effect, though.

Does it have any other uses?

These days plenty of drugs reduce glucose, so it is only natural to expect more! Plant products tend to have pleiotropic effects and may well have off-target effects that we don't want. These are some of the reported effects of the insulin plant:

  1. Hypolipidemic effect
  2. Antioxidant effect
  3. Diuretic effect
  4. Anticancer effect
  5. Reduces TSH (3)

What does the current research mean?

Very little is known about the insulin plant, especially its use in humans. However, with the public clamor for natural remedies, there may be a future for this plant and its products. Because of its pleiotropic effects, it might have a role in conditions such as prediabetes and subclinical hypothyroidism, in addition to diabetes.

To conclude, the insulin plant is a potential plant-based therapy for diabetes. However, at present we know little about its use in humans and must proceed with caution. It opens up several areas of research, and if it is found useful in raw form, it may become one of the cheapest ways of treating diabetes.

Further Reading

1. Eddouks M, Bidi A, El Bouhali B, Hajji L, Zeggwagh NA. Antidiabetic plants improving insulin sensitivity. J Pharm Pharmacol. 2014 Sep;66(9):1197–214.

2. Hegde PK, Rao HA, Rao PN. A review on Insulin plant (Costus igneus Nak). Pharmacogn Rev. 2014 Jan;8(15):67–72.

3. Ashwini S, Bobby Z, Sridhar MG, Cleetus CC. Insulin Plant (Costus pictus) Extract Restores Thyroid Hormone Levels in Experimental Hypothyroidism. Pharmacognosy Res. 2017 Mar;9(1):51–9.

Getting started with case reports

A case report is the perfect starting point for a resident new to scholarly publishing. It is easy to write and requires little creativity (after all, it simply documents a patient who came to see the doctor), and although its impact is limited, it has good educational value. More than anything else, it lowers the barrier to scientific writing.

There is a catch, though: case reports are low-hanging fruit. Accordingly, there is quite a bit of competition; many people want to write them, but very few publishers want to publish them. This gap has been filled by specialty case report journals. These journals publish only case reports and therefore have a much higher acceptance rate, somewhere in the range of 30 to 70%. The increased demand also creates a situation where publishers may resort to questionable practices. In fact, almost half of these journals have been found to be dubious.

How do you identify genuine journals?

The trick is to find case report journals that are PubMed indexed. Only one PubMed-indexed journal (published by the Baishideng group) is known to indulge in questionable practices [refer to the Excel file linked at the end of the article]. So a case report journal that is PubMed indexed is highly likely to be genuine. For example, my first publication was a case report in BMJ Case Reports. BMJ Case Reports has a decent acceptance rate, but in order to submit, one of the authors or their institution must have a subscription. An individual subscription costs around 185 GBP (around Rs. 15,000), but just one subscription per department is more than enough. Be sure to check whether your institution has a subscription; if it does, you can contact the librarian to get the submission access code. BMJ Case Reports doesn't have an impact factor as such (many case report only journals don't). However, you can use the 2-year citations per article from SCImago (scimagojr) as a reasonable proxy.

Of course, case reports are also published by journals that publish other article types, like reviews and original articles. However, the acceptance rate is likely to be lower in these journals. If you are confident of your material, it is best to try a general journal before trying a case reports only journal. When in doubt, ask an expert.

A master list of case reports only journals can be accessed in Excel format here. Sadly, I couldn't put together a master list of submission fees; if you have details on that, do let me know. If you found this post useful, please share it with your friends.

Further reading

New journals for publishing medical case reports

Online workflow for writing articles

It has become increasingly common for people to collaborate on writing projects. The tools that enable such collaboration have improved over the years and now allow for a completely online workflow. Unfortunately, many residents and early career researchers don't take advantage of these developments. In this post, I will outline a completely online workflow for writing articles.
This way, you can work with any number of people on the same project, and all of you can have access to the same digital library from which to cite. You might point out that team libraries have been available for quite some time in popular reference management software like Zotero. However, without jumping through a few hoops, you can't get Zotero to work seamlessly with Google Docs.
Of late, I have been increasingly using Google Docs for my document preparation needs. Sure, it isn't MS Word, but few people need the full power of MS Word for their routine documents. The portability of a Google Docs document is particularly attractive to me, since I have computers running different operating systems.
Here's my completely online workflow. Every component of the workflow is free (as in free beer).


The advantages of this online workflow include

  • No need to install any software
  • You always get the latest and greatest version
  • OS/device independent
  • Collaboration is easy and seamless

The F1000 Workspace also has a desktop client, and you can start working even if you already have a PDF collection. It also has a Word add-in if you prefer to write in MS Word. Try it out for your next article. You will be pleasantly surprised.