While generative AI tools can help users with tasks such as brainstorming new ideas, organizing existing information, mapping out scholarly discussions, or summarizing sources, they are also notorious for not relying fully on factual information or rigorous research strategies. In fact, they are known for producing "hallucinations," an AI term for false information that the system generates and presents as if it were fact. These "hallucinations" are often delivered in a very confident manner and can include partially or fully fabricated citations or facts.
Certain AI tools have even been used to intentionally produce false images or audiovisual recordings to spread misinformation and mislead audiences. Known as "deepfakes," these materials can be used to subvert democratic processes and are thus particularly dangerous.
Additionally, the information presented by generative AI tools may lack currency, as some systems do not have access to the latest information. Rather, they may have been trained on older datasets, generating dated representations of current events and the surrounding information landscape.
Another potentially significant limitation of AI is the bias that can be embedded in the products it generates. Trained on immense amounts of data and text from the internet, these large language model systems learn simply to predict the most likely sequence of words in response to a given prompt, and will therefore reflect and perpetuate the biases present in that internet data. An additional source of bias is that some generative AI tools are refined using reinforcement learning from human feedback (RLHF), and the human testers who provide this feedback are themselves not neutral. Accordingly, generative AI tools such as ChatGPT have been documented producing output that is socio-politically biased, occasionally even containing sexist, racist, or otherwise offensive content.
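The way word-sequence prediction passes training-data bias straight through to output can be illustrated with a deliberately tiny sketch. The corpus below is hypothetical, and real systems use neural networks rather than frequency counts, but the core idea is the same: the model's "knowledge" is whatever associations appear in its training text.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for internet-scale training text.
# Note the skewed association: nurses are "she", doctors are "he".
corpus = ("the nurse said she was ready . "
          "the nurse said she was busy . "
          "the doctor said he was ready .").split()

# Count which word most often follows each two-word context.
follows = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follows[(a, b)][c] += 1

def predict(a, b):
    """Return the most frequent continuation seen in training."""
    return follows[(a, b)].most_common(1)[0][0]

# The model reproduces whatever patterns its data contains,
# including the skewed gender associations above.
print(predict("nurse", "said"))   # -> she
print(predict("doctor", "said"))  # -> he
```

No rule about gender was programmed anywhere; the bias in the output exists solely because it existed in the training text, which is exactly the dynamic described above at internet scale.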
Generative AI tools have introduced new challenges in academic integrity, particularly related to plagiarism.
Plagiarism is typically defined as presenting someone else's work or ideas as one's own. While a generative AI tool might not qualify as a "someone," using text generated by an AI tool without citing it is still considered plagiarism under Bucknell University's Honor Code and statements of Academic Responsibility, because the work is still not the researcher's own. Individual policies for using and crediting GAI tools may vary from class to class, so it is important to check the syllabus and confirm expectations with your faculty member.
A note about plagiarism detection tools:
A number of AI detection tools are currently available to publishers and institutions, but there are concerns about low accuracy and false accusations. Because generative AI tools do not reproduce large amounts of text word-for-word from existing works, it can be difficult for automated tools to detect AI-generated text as plagiarism. Bucknell uses the AI detection feature of its plagiarism detection tool, Turnitin.
Another area of academic integrity affected by GAI tools is that of false citations.
Providing false citations in research, whether intentional or unintentional, is a violation of Academic Responsibility and the Honor Code. GAI tools such as ChatGPT have been known to generate false citations, and even when the citations refer to real papers, ChatGPT's description of their content may still be inaccurate.
U.S. Copyright law as it relates to the use of AI tools is still evolving. On August 30, 2023, the U.S. Copyright Office issued a Notice of Inquiry on copyright and artificial intelligence to assess how it will advise Congress on the matter, and if any legislative or regulatory measures need to be taken.
Whether this use of copyrighted material is permissible is still being determined. There are currently several court cases directly relating to the unauthorized use of copyrighted material as training data for generative AI tools: individual authors, artists, and companies are suing OpenAI, GitHub, and other companies for using their work to train AI products.
A note about copyright vs plagiarism:
Copyright violation is an issue that is separate from plagiarism. While plagiarism can be considered fraud if funding is involved, it is largely considered an issue of research integrity and ethics rather than a legal matter. The question of whether generative AI tools are engaging in plagiarism when they scrape data to generate content is also currently being debated.
Copyright law currently has a human authorship requirement, and according to recent guidance, when an AI technology "determines the expressive elements of its output, the generated material is not the product of human authorship." This means that AI-generated art and text are not copyrightable on their own. The issue of AI and authorship is also addressed in the editorial policies of Nature and Science.
Whether you can copyright work created with an AI tool depends on the extent to which the tool is part of the creative process. The more human creativity involved, the more likely it is that you will be able to register your work with the U.S. Copyright Office. While you own the copyright to anything you create (or had a large part in creating), copyright registration is important as a public record of your copyright claim, which will be helpful if you are interested in licensing your work.
There are also multiple privacy concerns associated with the use of generative AI tools. The most prominent issues involve the possibility of breaches of personal or sensitive data and of re-identification. Most AI-powered language models, including ChatGPT, rely on large amounts of user-submitted input to continue training and to generate new output effectively. As a result, personal or sensitive data submitted by users can become part of the material used to further train the AI without their explicit consent. Moreover, some generative AI policies even permit AI developers to profit from this personal or sensitive information by selling it to third parties. Even when a user does not enter clearly identifying personal information, using the system carries a risk of re-identification, as the submitted data may contain patterns that allow the generated information to be linked back to an individual or entity.
Given these issues, extensive downloading of Library materials to build AI training corpora is prohibited. Additionally, some Library content providers prohibit any amount of their content being used with AI tools (please see Bucknell's Policy).