GigaScience selected IBM Aspera Connect Server to rapidly transfer all the data sets that accompanied submitted manuscripts to the GigaScience database and IBM Aspera Console to manage and monitor the entire end-to-end transfer process. Content submitters, reviewers, and research users use the IBM Aspera Connect plug-in to upload and download large data sets at maximum speed.
Laurie Model Sets Download
In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural network models, can be implemented to improve prediction accuracy, they require very large data sets, which are often not available in medical or clinical research. We examine the use of surrogate data sets composed of synthetic observations for modeling heart disease prediction. We generate surrogate data based on the characteristics of the original observations and compare the prediction accuracy achieved by traditional machine learning models using both the original observations and the synthetic data. We also use a large surrogate data set to build a neural network model (Perceptron) and compare its prediction results to those of the traditional machine learning algorithms (Logistic Regression, Decision Tree and Random Forest). Using traditional machine learning models with surrogate data, we achieved improved prediction stability, with accuracy around 81 percent and within 2 percent variance under ten-fold validation. Using the neural network model with surrogate data, we are able to improve the accuracy of heart disease prediction by nearly 16 percent, to 96.7 percent, while maintaining stability at 1 percent. We find surrogate data to be a valuable tool, both as a means to anonymize sensitive data and to improve classification prediction.
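The workflow described above (generate surrogate observations from the characteristics of the originals, then compare ten-fold accuracy on both data sets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the surrogate data were generated, so the per-class Gaussian sampling here is a stand-in; the two numeric features and their values are made up; and a nearest-centroid classifier replaces the Logistic Regression, Decision Tree, Random Forest and Perceptron models named in the abstract so that the example needs only the Python standard library.

```python
import random
import statistics


def make_surrogate(rows, labels, n_per_class, rng):
    """Sample synthetic rows from per-class, per-feature Gaussians fitted to the originals.

    Assumption: features are treated as independent within each class.
    """
    synthetic_rows, synthetic_labels = [], []
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        columns = list(zip(*members))
        means = [statistics.fmean(c) for c in columns]
        stdevs = [statistics.pstdev(c) for c in columns]
        for _ in range(n_per_class):
            synthetic_rows.append([rng.gauss(m, s) for m, s in zip(means, stdevs)])
            synthetic_labels.append(cls)
    return synthetic_rows, synthetic_labels


def fit_centroids(rows, labels):
    """Stand-in classifier: remember the mean feature vector of each class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [statistics.fmean(c) for c in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def k_fold_accuracy(rows, labels, k, rng):
    """Return the held-out accuracy of each of k folds."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    scores = []
    for fold in (indices[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, rows[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return scores


rng = random.Random(0)
# Hypothetical "original" sample: 20 rows per class, two made-up numeric features.
original_rows = ([[rng.gauss(50, 5), rng.gauss(120, 8)] for _ in range(20)]
                 + [[rng.gauss(62, 5), rng.gauss(150, 8)] for _ in range(20)])
original_labels = [0] * 20 + [1] * 20

surrogate_rows, surrogate_labels = make_surrogate(
    original_rows, original_labels, 500, rng)

original_scores = k_fold_accuracy(original_rows, original_labels, 10, rng)
surrogate_scores = k_fold_accuracy(surrogate_rows, surrogate_labels, 10, rng)

for name, scores in (("original", original_scores), ("surrogate", surrogate_scores)):
    print(name, "mean accuracy:", round(statistics.fmean(scores), 3),
          "fold-to-fold stdev:", round(statistics.pstdev(scores), 3))
```

The fold-to-fold standard deviation printed at the end plays the role of the "stability" figure quoted in the abstract: larger surrogate folds can be expected to give less variable accuracy estimates than the four-row folds of the small original sample, although the numbers from this toy data say nothing about the paper's actual results.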
The technological base of a society sets parameters for what type of family structure can exist and the degree of gender inequality generated.[6] Gerhard Lenski identifies five ideal types in a materially based model of social stratification.[7] Ideal types identify kinds of social organization in "pure" form with progressively weaker examples lying closer to the next type on a continuum.[8] Ideal types are useful for descriptive as well as comparative and explanatory purposes. In the real world, societies often combine elements of more than one ideal type.
GMMAT is an R package for performing genetic association tests in genome-wide association studies (GWAS) and sequencing association studies, for outcomes with distributions in the exponential family (e.g. binary outcomes), based on generalized linear mixed models (GLMMs). It can be used to analyze genetic data from individuals with population structure and relatedness. GMMAT fits a GLMM with covariate adjustment and random effects to account for population structure and familial or cryptic relatedness. For GWAS, GMMAT performs score tests for each genetic variant. For candidate gene studies, GMMAT can also perform Wald tests to obtain the effect size estimate for each genetic variant. For rare variant analysis in sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT), including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets. See the user manual for details.
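To make the idea of a per-variant score test and a burden test concrete, here is a deliberately simplified Python sketch. It is not GMMAT's implementation or API: it assumes a binary outcome, an intercept-only null model, and no random effects (so it is a plain logistic score test rather than a GLMM), and the function names `score_test` and `burden_test` as well as the toy genotypes are hypothetical. Under those assumptions the null mean is the case fraction, the score is the sum of genotype times residual, and the statistic is compared with a chi-square distribution with one degree of freedom.

```python
import math


def score_test(outcomes, genotypes):
    """Score test for one variant under an intercept-only logistic null model.

    Simplification: no covariates and no random effects, unlike a GLMM.
    Returns (chi-square statistic with 1 df, p-value).
    """
    n = len(outcomes)
    mu = sum(outcomes) / n                      # fitted null probability
    g_bar = sum(genotypes) / n
    score = sum(g * (y - mu) for g, y in zip(genotypes, outcomes))
    variance = mu * (1 - mu) * sum((g - g_bar) ** 2 for g in genotypes)
    if variance == 0:
        raise ValueError("no variation in the outcome or in the genotype")
    statistic = score ** 2 / variance
    # Upper tail of a 1-df chi-square: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def burden_test(outcomes, variant_set, weights=None):
    """Burden test: collapse a variant set into one weighted score per person,
    then apply the single-variant score test to that collapsed score."""
    if weights is None:
        weights = [1.0] * len(variant_set)
    burden = [sum(w * g for w, g in zip(weights, person))
              for person in zip(*variant_set)]
    return score_test(outcomes, burden)


# Toy data (made up): 8 individuals, case/control status and allele counts.
status = [1, 1, 1, 0, 0, 0, 1, 0]
variant_a = [2, 1, 2, 0, 0, 1, 1, 0]
variant_b = [0, 1, 0, 0, 0, 0, 1, 0]

single_stat, single_p = score_test(status, variant_a)
burden_stat, burden_p = burden_test(status, [variant_a, variant_b])
print("single variant:", single_stat, single_p)
print("burden of set: ", burden_stat, burden_p)
```

The score test only needs the null model to be fitted once, which is why it suits genome-wide scans over many variants; in a real analysis the fitted values would come from a model with covariates and relatedness-based random effects rather than from the raw case fraction used here.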