a complete web mining / text mining / social media mining solution based on a combination of natural language processing (NLP), information retrieval (IR), and machine learning (ML) techniques. You are provided a choice of four topics that broadly fall into the AI for Social Impact area. Each project (except one) has a standard dataset and ground truth, enabling quantitative evaluation. Many of these come from past or ongoing challenges and have been attempted by other teams. We encourage you to use any available online tools or platforms to develop your solution. You should strive to produce results in the top 10% of previously published results on the same dataset. In addition to the quantitative evaluation component in the last stage, you must develop a live demo system; this may involve building a user interface so you can demonstrate the system. This project will satisfy the MS project requirements specified by the CSE department.

While the problem definition and evaluation dataset are fixed, there is ample room for creativity on your part in further enhancing the solution and its implementation. Be creative, and most importantly, pace yourself properly during the semester.

Your project is divided into three phases, which are described in more detail later in this document:

Phase 1: Submission of the project proposal and an in-person presentation of it. This includes a comprehensive literature review on your selected topic, a necessary step before you begin designing your system!
Phase 2: Interim report describing the evaluation of the baseline system.
Phase 3: Final submission of the technical paper and code, and an in-class presentation of your end-to-end system.
Project Option 1 - LLMs in the Health Sciences

Task Overview: Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT)

Big Picture
Researchers have initiated a challenge centered around Clinical Trial Reports (CTRs) related to breast cancer treatments to enhance how artificial intelligence interprets and utilizes medical reports. These reports are critical for medical professionals to determine the safety and efficacy of new treatments, but they are voluminous and complex, making it challenging for individuals to review each one thoroughly. The challenge involves the AI analyzing summaries of these reports, focusing on key aspects such as eligibility criteria, treatment specifics, trial outcomes, and observed adverse effects. Researchers crafted statements about these summaries, requiring the AI to assess whether these statements are true or false, or whether there is insufficient information to decide. To further test the AI's capabilities, these statements were intentionally altered by modifying numbers, changing words, or restructuring sentences. The ultimate goal is to refine the AI's consistency of understanding and its ability to logically deduce information, thereby supporting medical professionals in making informed decisions about patient care. This underscores the potential of AI to contribute significantly to medical science, particularly in the realm of personalized medicine, by streamlining the interpretation and application of extensive clinical trial data.

This task is based on a collection of breast cancer CTRs (extracted from https://clinicaltrials.gov/ct2/home), statements, explanations, and labels annotated by domain-expert annotators.

Task: Textual Entailment
For this task, the CTRs are divided into 4 sections:
• Eligibility criteria - A set of conditions that patients must meet to take part in the clinical trial.
• Intervention - Information concerning the treatment type, dosage, frequency, and duration being studied.
• Results - Number of participants in the trial, outcome measures, units, and the results.
• Adverse events - Signs and symptoms observed in patients during the clinical trial.

The annotated statements are sentences with an average length of 19.5 tokens that make some type of claim about the information contained in one of the sections of the CTR premise. A statement may make claims about a single CTR or compare 2 CTRs. The task is to determine the inference relation (entailment vs. contradiction) between CTR-statement pairs.

The training set we provide is identical to the training set used in our previous task; however, we have performed a variety of interventions on the test-set and development-set statements, either preserving or inverting the entailment relations. We will not disclose the technical details of the interventions, both to guarantee fair competition and to encourage approaches that are robust rather than simply designed to tackle these interventions. The technical details will be made publicly available after the evaluation phase and in our task description paper.

Intervention targets
• Numerical - LLMs still struggle to consistently apply numerical and quantitative reasoning. As NLI4CT requires this type of inference, we will specifically target the models' numerical and quantitative reasoning abilities.
• Vocabulary and syntax - Acronyms and aliases are significantly more prevalent in clinical texts than in general-domain texts, and they disrupt the performance of clinical NLI models. Additionally, models may exhibit shortcut learning, relying on syntactic patterns for inference. We target these concepts and patterns with an intervention.
• Semantics - LLMs struggle with complex reasoning tasks when applied to longer premise-hypothesis pairs. We also intervene on the statements to exploit this.

Notes
The specific type of intervention performed on a statement will not be available at test or training time.
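To make the input format concrete, each CTR-statement pair can be cast as a standard premise-hypothesis pair for an entailment classifier. The sketch below is a minimal illustration under assumed data structures; the section names and dictionary fields are illustrative, not the official dataset schema:

```python
# Minimal sketch: turn a CTR section and an annotated statement into a
# premise-hypothesis pair for an NLI (entailment vs. contradiction) model.
# The data layout here is an assumption, not the official NLI4CT schema.

def build_pair(statement, primary_ctr, section_id, secondary_ctr=None):
    """Join the sentences of the cited CTR section into the premise and
    pair it with the statement as the hypothesis. Comparison statements
    cite a second CTR, whose section is appended after a separator."""
    premise = " ".join(primary_ctr[section_id])
    if secondary_ctr is not None:
        premise += " [SEP] " + " ".join(secondary_ctr[section_id])
    return {"premise": premise, "hypothesis": statement}

# Toy single-CTR example:
ctr = {"Results": ["100 participants were enrolled.",
                   "Median progression-free survival was 9.5 months."]}
pair = build_pair("More than 50 patients took part in the trial.",
                  ctr, "Results")
```

A fine-tuned NLI model (or a prompted LLM) would then classify each such pair as entailment or contradiction; because the pairing step is model-agnostic, the same code can feed robustness checks against the intervened statements described above.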
Dataset
We will provide you with the dataset for training (it will be available on GitHub).

Evaluation
The evaluation of performance on this task involves several steps. First, we will assess performance on the original NLI4CT statements without any interventions; this assessment will be based on the Macro F1-score. Then, we will measure Faithfulness and Consistency. (More details will be provided in the GitHub repository.)

Bonus Points
Complete all objectives, write a research paper, and submit it to at least a workshop or conference.

Project Option 2 - LLMs in the Social Sciences

Task: Multilingual Detection of Persuasion Techniques in Memes

Big Picture
Imagine you're in a world where images paired with catchy text, known as memes, are not just for laughs but can also sway people's opinions and spread misinformation. These memes can be powerful on social media, reaching and influencing countless users with simple yet impactful messages. Some memes use sneaky tactics to persuade or mislead, such as making things seem simpler than they are, calling people names to discredit them, or using emotional appeals to bypass rational thinking.

Technical Description
We refer to propaganda whenever information is purposefully shaped to foster a predetermined agenda. Propaganda uses psychological and rhetorical techniques to achieve its purpose. Such techniques include the use of logical fallacies and appeals to the emotions of the audience. Logical fallacies are usually hard to spot since the argumentation, at first sight, might seem correct and objective; however, careful analysis shows that the conclusion cannot be drawn from the premises without the misuse of logical rules. Another set of techniques uses emotional language to induce the audience to agree with the speaker solely on the basis of the emotional bond being created, provoking the suspension of any rational analysis of the argumentation. Memes consist of an image superimposed with text.
The role of the image in a deceptive meme is either to reinforce/complement a technique present in the text or to convey one or more persuasion techniques by itself.

Tasks
Subtask 1 - Given only the textual content of a meme, identify which of the 20 persuasion techniques, organized in a hierarchy, it uses. If the ancestor node of a technique is selected, only a partial reward is given. This is a hierarchical multilabel classification problem.
Subtask 2 - Given a meme, identify which of the 22 persuasion techniques, organized in a hierarchy, are used both in the textual and in the visual content of the meme (multimodal task). If the ancestor node of a technique is selected, only a partial reward is given. This is a hierarchical multilabel classification problem. You can find info on the hierarchy below.

[Figure: hierarchy of the persuasion techniques, rooted at Persuasion, with top-level branches Ethos, Pathos, and Logos, intermediate nodes such as Justification, Reasoning, Distraction, and Simplification, and the individual techniques (e.g., Name Calling, Slogans, Loaded Language, Appeal to Fear, Transfer) as leaf nodes. Caption: Hierarchy of the techniques for Subtask 2 (in Subtask 1, "Transfer" and "Appeal to Strong Emotion" are not present).] The hierarchy is also inspired by this document.

Dataset
We will provide the dataset; more information will be on GitHub.

Evaluation
Subtask 1
Subtask 1 is a hierarchical multilabel classification problem. Taking the hierarchy figure above as an example, any node of the DAG can be a predicted label, while the gold label is always a leaf node of the DAG. A full reward is given when the prediction matches the gold label; selecting an ancestor of the gold label earns only a partial reward. We use hierarchical-F1 as the official evaluation measure.

Subtask 2
Subtask 2 is a hierarchical multilabel classification problem.
We use hierarchical-F1 as the official evaluation measure.

Bonus Points
Complete all objectives, write a research paper, and submit it to at least a workshop or conference.
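As a concrete reference for the official metric of both subtasks, hierarchical-F1 is commonly computed by expanding the predicted and gold label sets with all of their ancestors in the hierarchy and scoring the overlap, which is exactly how an ancestor prediction earns the partial reward described above. The three-node toy hierarchy and per-example scoring below are illustrative assumptions; the official scorer defines the actual label DAG and aggregation:

```python
# Hedged sketch of hierarchical F1 for hierarchical multilabel classification.
# Toy child -> parent map; NOT the task's full 20/22-technique hierarchy.
PARENTS = {
    "Loaded Language": "Pathos",
    "Appeal to Fear": "Pathos",
    "Pathos": "Persuasion",
}

def with_ancestors(labels):
    """Close a set of labels under the parent relation."""
    closed = set()
    for label in labels:
        while label is not None:
            closed.add(label)
            label = PARENTS.get(label)
    return closed

def hierarchical_f1(predicted, gold):
    """Set-overlap F1 on the ancestor-closed label sets."""
    p, g = with_ancestors(predicted), with_ancestors(gold)
    true_positives = len(p & g)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(p)
    recall = true_positives / len(g)
    return 2 * precision * recall / (precision + recall)

# Predicting the ancestor "Pathos" of the gold leaf "Loaded Language"
# earns partial credit rather than zero:
partial = hierarchical_f1({"Pathos"}, {"Loaded Language"})         # 0.8
exact = hierarchical_f1({"Loaded Language"}, {"Loaded Language"})  # 1.0
```

The ancestor closure is what distinguishes this metric from flat multilabel F1: two sibling leaves still share credit through their common ancestors, so near-miss predictions are rewarded in proportion to how much of the hierarchy path they get right.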