The public places its trust, and often its lives, in the hands of investigators, physicians, and medical personnel, as well as the researchers who strive to improve our lives through laboratory work. For that trust to be warranted, the data behind their conclusions must be accurate and free from substantial bias or skew.
Data integrity refers to the accuracy and veracity of results. Entire companies and fortunes are committed to safeguarding data so that it is not compromised or lost.
Publications such as "Reproducibility and Replicability in Science" and "Enhancing Statistical Inference in Psychological Research via Prospective and Retrospective Design Analysis," among many others, lay out guidelines to uphold and promote scientific integrity. Many are clear-cut, almost axiomatic: for example, all publications should be clear, concise, and complete about how the data was acquired and reported. Others are more nuanced and concern data preservation. Although science varies greatly across fields and methods of inquiry, these fundamental principles should never be compromised.
One core requirement is that every publication describe the methodologies, materials, measurements, and other important variables that contributed to the study. Equally important is a clear listing of all data excluded from the publication and the reasons it was omitted.
P-Values in Science
The p-value is the central quantity in statistical significance testing. It is the probability, calculated under the assumption that the null hypothesis is true, of observing a result at least as extreme as the one actually obtained. The null hypothesis typically claims that there is no real effect, so that any observed difference between the control and experimental groups is due to chance. The smaller the p-value, the less compatible the data are with the null hypothesis and the stronger the evidence for rejecting it. A p-value is not, however, the probability that the null hypothesis is true or that the result arose by chance alone.
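To make the definition concrete, the following is a minimal sketch of a two-sided permutation test, one of several ways to estimate a p-value. The function name `permutation_p_value`, the sample measurements, and the number of permutations are illustrative assumptions and do not come from any study discussed here.

```python
import random
from statistics import mean


def permutation_p_value(control, treatment, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled values and count how often a random relabeling gives
    a difference at least as extreme as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, invented purely for illustration.
control = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0]
treatment = [5.6, 5.4, 5.7, 5.5, 5.3, 5.8]

p = permutation_p_value(control, treatment)
print(f"estimated p-value: {p:.4f}")
```

A small value here says only that such a large difference would be rare if the labels were interchangeable. It says nothing about how large or how important the difference is.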
The American Statistical Association (ASA) has stated that misunderstanding and misuse of statistical significance testing are widespread. It addressed the issue in 2016 by releasing a statement of six principles intended to improve the practice and interpretation of research and preserve the integrity of the resulting data.
The systemic problem of researchers treating p-values as definitive truths is ongoing, even though they are far from that. Readers and analysts often look at the p-value alone and use it to judge the validity of the rest of the paper. Principles 5 and 6 of the ASA statement say, respectively, that "A p-value, or statistical significance, does not measure the size of an effect or the importance of a result" and that "By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis".
The statement also stresses the importance of full and transparent reporting in principle 4. In 2019, the ASA followed up with a special issue of The American Statistician, "Statistical Inference in the 21st Century: A World Beyond P < 0.05". It again cautioned against treating statistical significance as the final word when reporting findings.
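The point that a p-value does not measure the size of an effect can be illustrated with a short sketch. It assumes a simple two-sample z-test with equal group sizes and a known, shared standard deviation. The function names and both scenarios are hypothetical and chosen only to show the contrast.

```python
import math


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumes two groups of equal size with the same known standard deviation.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # P(|Z| >= |z|) for a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


def cohens_d(mean_diff, sd):
    """Standardized effect size: the mean difference in units of the SD."""
    return mean_diff / sd


# Hypothetical scenario A: a negligible effect measured in a huge sample.
tiny_effect_p = two_sample_z_p_value(mean_diff=0.02, sd=1.0, n_per_group=100_000)
tiny_effect_d = cohens_d(0.02, 1.0)

# Hypothetical scenario B: a large effect measured in a small sample.
large_effect_p = two_sample_z_p_value(mean_diff=0.8, sd=1.0, n_per_group=10)
large_effect_d = cohens_d(0.8, 1.0)

print(f"A: d = {tiny_effect_d:.2f}, p = {tiny_effect_p:.2g}")
print(f"B: d = {large_effect_d:.2f}, p = {large_effect_p:.2g}")
```

Under these assumptions, scenario A clears a 0.05 threshold despite a negligible effect, while scenario B does not despite an effect forty times larger. Reporting the effect size and sample size alongside the p-value avoids this confusion.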
Recognition Regarding Data Integrity
Labs, principal investigators, and reporters are incentivized to circulate reproducible and reliable data. One incentive comes in the form of awards and accreditations given to those who abide by these standards. Notable among them are badges that recognize open practices and encourage those in STEM fields to share their data in an open-source format, all in the name of reproducibility.
One such badge is the "Open Data Badge", which encourages the collective pooling of information. According to the Association for Computing Machinery (ACM), more than 820 articles have been granted these badges since their inception in 2016.
ACM badges, as well as badges from the Institute of Electrical and Electronics Engineers (IEEE), recognize the integrity of results for code, data, or both.
The criteria for receiving such an award are coverage (how well the results can be verified), ease of reproducibility, flexibility (how well the work adapts to new queries and changes in behavior with each publication), and portability (how readily the data can be moved between different software or hardware).
European Organization for Nuclear Research (CERN) as an Example of Promoting Data Integrity
CERN is one of the most highly regarded research facilities on the planet, with members of its committee receiving the Nobel Prize in physics in 2010, the Niels Bohr gold medal, the Sofja Kovalevskaja Award, and many others. This has been accomplished through the scrupulous nature of its data acquisition, processing, and publication. For example, the raw experimental data produced by the Large Hadron Collider is processed and refined so that no measurement is left unaccounted for, and the resulting datasets are reported in formats suitable for physicists to analyze. The data is also used to test theoretical predictions and novel models of quantum physics.
Once the integrity and legitimacy of the experimental data are confirmed, CERN's open data services release the data to other collaborators and into the public domain (after an embargo period of several years). The adoption of these policies, together with the incentives offered by the wider scientific community, has given individual labs and crowd-funded teams a reason to promote data integrity.