Create a word cloud using abstracts from PubMed
Examples of how to create a word cloud using abstracts in PubMed
updated on Thu Oct 16 00:09:49 2014
PubMedWordcloud is available on CRAN and GitHub
install.packages("PubMedWordcloud",dependencies=TRUE)
or
library(devtools)
install_github("felixfan/PubMedWordcloud") # from GitHub
library(PubMedWordcloud)
Since my first paper was published in 2007, I will retrieve the PMIDs of all my papers from 2007 to this year (2014). I have published under both 'Yan-Hui Fan' and 'Yanhui Fan', so I assigned the PMIDs for these two names to 'pmid1' and 'pmid2', respectively.
pmid1=getPMIDs(author="Yan-Hui Fan",dFrom=2007,dTo=2014,n=10)
pmid1
[1] "24935264" "24721834" "22698742" "22693232" "22564732" "22301463"
[7] "22015308" "21283797"
pmid2=getPMIDs(author="Yanhui Fan",dFrom=2007,dTo=2014,n=10)
pmid2
[1] "24890309" "20576513" "19412437"
There are eight PMIDs in 'pmid1' and three PMIDs in 'pmid2'.
Note that 'pmid1' and 'pmid2' are vectors, so it is easy to add PMIDs to them, delete PMIDs from them, or combine them. I also wrote a function, 'editPMIDs', to do this, in case you do not want to work out how to do it yourself.
PMID "22698742" in 'pmid1' and PMID "20576513" in 'pmid2' belong to papers published by other authors who share my name. So I want to exclude them and then combine 'pmid1' and 'pmid2'.
rm1="22698742"
pmids1=editPMIDs(x=pmid1,y=rm1,method="exclude")
rm2="20576513"
pmids2=editPMIDs(x=pmid2,y=rm2,method="exclude")
pmids=editPMIDs(x=pmids1,y=pmids2,method="add")
Note: only unique PMIDs were kept
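Because the PMIDs are plain character vectors, the same exclude-and-combine step can also be sketched with base R set operations. This is only an illustration of the idea, not the code that 'editPMIDs' runs internally; the variable names 'kept1' and 'kept2' are made up for this example.

```r
# PMIDs as returned above (character vectors)
pmid1 <- c("24935264", "24721834", "22698742", "22693232",
           "22564732", "22301463", "22015308", "21283797")
pmid2 <- c("24890309", "20576513", "19412437")

# Drop the papers written by other authors with the same name
kept1 <- setdiff(pmid1, "22698742")
kept2 <- setdiff(pmid2, "20576513")

# Combine the two vectors, keeping each PMID only once
pmids <- union(kept1, kept2)
length(pmids)
```

'setdiff' and 'union' both discard duplicates, which mirrors the note that only unique PMIDs are kept.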
abstracts=getAbstracts(pmids)
Clean the data using the package {tm}: remove punctuation, remove numbers, convert characters to lower or upper case, remove stopwords, remove user-specified words, and stem words.
cleanAbs=cleanAbstracts(abstracts)
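To make the cleaning steps more concrete, here is a rough base R approximation of them on a single made-up sentence. It is a sketch only: 'clean_text', the tiny stopword list and the example sentence are all hypothetical, it skips stemming, and it is not what 'cleanAbstracts' or {tm} actually execute.

```r
# Hypothetical helper: approximates the cleaning steps with base R only
clean_text <- function(x, stopwords = c("the", "of", "and", "in", "a", "to", "with")) {
  x <- tolower(x)                                # convert to lower case
  x <- gsub("[[:punct:]]+", " ", x)              # remove punctuation
  x <- gsub("[[:digit:]]+", " ", x)              # remove numbers
  words <- unlist(strsplit(x, "[[:space:]]+"))   # split into words
  words <- words[nzchar(words)]                  # drop empty strings
  words[!words %in% stopwords]                   # remove stopwords
}

example <- "In 2014, the analysis of 3 genes was associated with the disease."
clean_text(example)
```

User-specified words could be removed in the same way, by appending them to the 'stopwords' argument.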
Plot with default parameters
plotWordCloud(cleanAbs,min.freq = 2, scale = c(2, 0.3))
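A word cloud is drawn from word frequencies, and a minimum-frequency threshold such as 'min.freq = 2' leaves out words that occur less often than that. The following base R sketch shows the idea on a made-up vector of words; it assumes the threshold is inclusive (a word occurring exactly twice is kept) and is not the package's own code.

```r
# Made-up cleaned words, for illustration only
words <- c("gene", "disease", "gene", "variant", "disease", "gene", "risk")

# Count each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

# Keep only words that reach the minimum frequency
min_freq <- 2
kept <- freq[freq >= min_freq]
names(kept)
```

Raising the threshold shrinks the set of plotted words, which is one way to make a crowded cloud easier to read.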
Do not rotate words.
plotWordCloud(cleanAbs,min.freq = 2, scale = c(2, 0.3),rot.per=0)
Plot using other colors.
colors=colSets(type="Paired")
plotWordCloud(cleanAbs,min.freq = 2, scale = c(2, 0.3),colors=colors)
Clean the data with word stemming enabled (stemDoc = TRUE) and plot again.
cleanAbs2=cleanAbstracts(abstracts,stemDoc =TRUE)
plotWordCloud(cleanAbs2,min.freq = 2, scale = c(2, 0.3))
**Note:** 'plotWordCloud' usually generates a lot of warnings because many words cannot be fit on the page. Try adjusting the scale parameter; smaller values may remove these warnings.
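One way to see why a smaller scale helps is to look at how word sizes might be derived from it. The sketch below assumes that the size of each word is interpolated linearly between the two scale values according to its relative frequency; 'word_size' and the frequencies are hypothetical, and the actual drawing code may differ in detail.

```r
# Assumption: size grows linearly from scale[2] (rarest) to scale[1] (most frequent)
word_size <- function(freq, scale) {
  normed <- freq / max(freq)                 # relative frequency in (0, 1]
  (scale[1] - scale[2]) * normed + scale[2]
}

# Made-up frequencies, for illustration only
freq <- c(gene = 20, disease = 10, variant = 2)

word_size(freq, scale = c(4, 0.5))   # larger words, harder to fit
word_size(freq, scale = c(2, 0.3))   # smaller words, easier to fit
```

Under this assumption every word shrinks when both scale values are lowered, so more of them can be placed without running out of room.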