
Artificial Intelligence: A Generative Guide for Researchers

Misinformation & Bias in AI

Misinformation

While generative AI tools can help users with tasks such as brainstorming new ideas, organizing existing information, mapping out scholarly discussions, or summarizing sources, they are also notorious for not relying fully on factual information or rigorous research strategies. In fact, they are known for producing "hallucinations," a term used in AI research for plausible-sounding but false information that the system generates, sometimes in support of its own statements. These "hallucinations" are often presented in a very confident manner and can consist of partially or fully fabricated citations or facts.

Certain AI tools have even been used to intentionally produce false images or audiovisual recordings to spread misinformation and mislead the audience. Referred to as "deepfakes," these materials can be used to subvert democratic processes and are thus particularly dangerous.

Additionally, the information presented by generative AI tools may lack currency as some of the systems do not necessarily have access to the latest information. Rather, they may have been trained on past datasets, thus generating dated representations of current events and the related information landscape.

Bias

Another potentially significant limitation of AI is the bias that can be embedded in the products it generates. Fed immense amounts of data and text available on the internet, large language models are trained to predict the most likely sequence of words in response to a given prompt, and will therefore reflect and perpetuate the biases inherent in that internet data. An additional source of bias is that some generative AI tools use reinforcement learning from human feedback (RLHF), and the human testers who provide this feedback are themselves not neutral. Accordingly, generative AI tools like ChatGPT have been documented to produce output that is socio-politically biased, occasionally even containing sexist, racist, or otherwise offensive information.
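To make the point about word prediction concrete, here is a minimal sketch, not how any real AI tool is built. It assumes a tiny, made-up corpus and a simple counting model (the names `corpus`, `train`, and `predict` are illustrative only). Because the model just picks the word that most often followed the same two words in its training text, a skew in the corpus becomes a skew in the output.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models are trained on vastly more text.
corpus = [
    "the nurse said she was tired",
    "the nurse said she was late",
    "the nurse said he was tired",
    "the engineer said he was late",
    "the engineer said he was tired",
]

def train(sentences):
    """Count which word follows each pair of consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for first, second, third in zip(words, words[1:], words[2:]):
            counts[(first, second)][third] += 1
    return counts

def predict(counts, first, second):
    """Return the word seen most often after the pair (first, second)."""
    return counts[(first, second)].most_common(1)[0][0]

model = train(corpus)
print(predict(model, "nurse", "said"))     # the pronoun seen most often after "nurse said"
print(predict(model, "engineer", "said"))  # the pronoun seen most often after "engineer said"
```

The model has no notion of fairness or accuracy; it only reproduces the most frequent pattern in the text it was given.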

Related Recommendations  

  • Meticulously fact-check all of the information produced by generative AI, including verifying the source of all citations the AI uses to support its claims.
  • Critically evaluate all AI output for any possible biases that can skew the presented information. 
  • Avoid asking the AI tools to produce a list of sources on a specific topic as such prompts may result in the tools fabricating false citations. 
  • When available, consult the AI developers' notes to determine if the tool's information is up-to-date.
  • Always remember that generative AI tools are not search engines--they do not look up facts, but use patterns learned from large amounts of data to generate responses that sound plausible.

Selected Readings 

Artificial Intelligence and Academic Integrity

Plagiarism

Generative AI tools have introduced new challenges in academic integrity, particularly related to plagiarism.

Plagiarism is typically defined as presenting someone else's work or ideas as one's own. While a generative AI tool might not qualify as a "someone," using text generated from an AI tool without citing is still considered plagiarism, according to Bucknell University's Honor Code and statements of Academic Responsibility, because the work is still not the researcher's own. Individual policies for using and crediting GAI tools might vary from class to class, so looking at the syllabus and having a clear understanding from your faculty member is important.

A note about plagiarism detection tools:

A number of AI detection tools are currently available to publishers and institutions, but there are concerns about low rates of accuracy and false accusations. Because generative AI tools do not generate large amounts of text word-for-word from existing works, it can be difficult for automated tools to detect plagiarism. Bucknell uses the AI detection feature of its plagiarism detection tool, Turnitin.

False Citations

Another area of academic integrity affected by GAI tools is that of false citations.

Providing false citations in research, whether intentional or unintentional, is a violation of Academic Responsibility and the Honor Code. GAI tools such as ChatGPT have been known to generate false citations, and even when a citation refers to an actual paper, ChatGPT's description of that paper's content might still be inaccurate.

Related Recommendations

  • If GAI tools are only permitted to be used for topic development in the early stages of research, you might not need to cite them at all, but it's still important to check with your professor first.
  • If you are providing commentary or analysis on the text generated by a chatbot and are either paraphrasing its results or quoting it directly, a citation is always required. You can find more information on citing GAI tools on this guide's Citing Generative AI page.
  • If you are a researcher planning to publish in a journal, it is best to review that journal's policies on the permitted use of Generative AI tools. (See 'Selected Readings' below for a couple of examples of journal policies.)
  • Always look up the citations a GAI tool provides and check that they are accurate. If you use information from a cited source, cite the original source rather than ChatGPT or whichever GAI tool you're using.

Selected Readings

AI, Authorship, & Copyright

U.S. copyright law as it relates to the use of AI tools is still evolving. On August 30, 2023, the U.S. Copyright Office issued a Notice of Inquiry on copyright and artificial intelligence to assess how it will advise Congress on the matter, and whether any legislative or regulatory measures need to be taken.

Are Generative AI tools violating U.S. Copyright Law?

This is still being determined. There are currently several court cases directly relating to the unauthorized use of copyrighted material as training data for Generative AI tools. Individual authors, artists, and companies are suing OpenAI, GitHub, and other companies for using their work when training their AI products.

A note about copyright vs plagiarism:

Copyright violation is an issue that is separate from plagiarism. While plagiarism can be considered fraud if funding is involved, it is largely considered an issue of research integrity and ethics rather than a legal matter. The question of whether generative AI tools are engaging in plagiarism when they scrape data to generate content is also currently being debated.

Can an AI tool retain copyright as an author?

Copyright law currently has a human authorship requirement, and according to recent guidance, when an AI technology "determines the expressive elements of its output, the generated material is not the product of human authorship." What this means is that AI-generated art and text are not copyrightable on their own. The issue of AI and authorship is also considered in the editorial policies of Nature and Science.

Can I register my work with the U.S. Copyright Office if it was partially authored by an AI tool?

This depends on the extent to which the AI tool is part of the creative process. The more human creativity involved, the more likely it is that you will be able to register your work with the U.S. Copyright Office. While you own the copyright to anything you create (or had a large part in the creation of), copyright registration is important as a public record of your copyright claim, which will be helpful to you if you are interested in licensing your work.

Selected Readings

Privacy and AI

Breaches of Privacy & Danger of Re-Identification

There are also multiple privacy concerns associated with the use of generative AI tools. The most prominent issues revolve around the possibility of a breach of personal or sensitive data and re-identification. More specifically, most AI-powered language models, including ChatGPT, rely on large amounts of data, which can include what users type into them, to be trained and to generate new information products effectively. As a result, personal or sensitive user-submitted data can become part of the collection of material used to further train the AI without the explicit consent of the user. Moreover, certain generative AI policies even permit AI developers to profit from this personal or sensitive information by selling it to third parties. Even when no clearly identifying personal information is entered by the user, using the system carries a risk of re-identification, as the submitted data may contain patterns that allow the generated information to be linked back to an individual or entity.
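The re-identification risk described above can be illustrated with a minimal sketch. It assumes a small, entirely made-up set of records with names already removed (the field names, values, and the `unique_combinations` helper are hypothetical). Any combination of attributes that appears only once in the data points to exactly one person, even though no single field looks identifying.

```python
from collections import Counter

# Hypothetical "anonymized" records: no names, only general attributes.
records = [
    {"zip": "17837", "birth_year": 2001, "major": "Biology"},
    {"zip": "17837", "birth_year": 2001, "major": "Biology"},
    {"zip": "17837", "birth_year": 2002, "major": "Physics"},
    {"zip": "17844", "birth_year": 2001, "major": "History"},
]

def unique_combinations(rows, fields):
    """Return the value combinations for `fields` that occur in exactly one row."""
    keys = [tuple(row[field] for field in fields) for row in rows]
    counts = Counter(keys)
    return [key for key in counts if counts[key] == 1]

# Each combination listed here matches a single record and could be
# linked back to one individual by anyone who knows those attributes.
risky = unique_combinations(records, ["zip", "birth_year", "major"])
print(len(risky), "of", len(records), "records are uniquely identifiable")
```

The more attributes that are combined, the more records tend to become unique, which is why removing names alone does not guarantee privacy.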

Given these issues, extensive downloading of Library materials to build AI training corpora is prohibited. Additionally, some Library content providers prohibit any amount of their content being used with AI tools (please see Bucknell's Policy).   

Related Recommendations

  • Avoid sharing any personal or sensitive information with AI-powered tools.
  • Do not upload Library materials (e.g., articles, ebooks, infographics, psychographics, or other datasets) into AI tools, as this is prohibited.
  • Always review the privacy policy of a generative AI tool before using it. Be cautious about policies that permit inputted data to be freely distributed to third-party vendors and/or other users.

Selected Readings