*************************************
*** README FOR BAHETY ET AL. 2021 ***
*************************************

This README describes the replication materials for Bahety et al. 2021. There are three folders included in this directory.

ADMIN: This folder contains the questionnaires used in the study. There are two files.
- COVID-19_Project_Household_Questionnaire.pdf is the main phone questionnaire used to collect the primary data.
- Qualitative Questionnaire_English & Hindi.pdf is the questionnaire from the qualitative interviews.

CODE: This folder contains two sub-folders. Before running any of the STATA code, the file "00 Master File" should be updated and run to set the proper working directories.
- Update line 9 of the Master File with the user's username
- Update line 10 of the Master File with the user's R path, necessary for calling the adaptive trial code in the randomization inference
- Update line 11 of the Master File with the path to this replication folder
- Update line 12 of the Master File with the path of the desired output from this replication archive
- Finally, in addition to updating the Master File, the R code file "exploration_sampling.R", located in "Replication_Files/CODE/Replication_Code/", should be updated on line 5 to point to the root replication output directory (i.e. the "Replication_Output" folder)
- Optionally, edit line 60 of the Master File to adjust the number of batches generated by the Randomization Inference procedure (2000 batches were used for the manuscript, but this may take several hours or more on a personal computer)

NOTE: Make sure that you have installed the two required libraries, 'tidyverse' and 'lubridate', within your R installation. If you are unsure, check the code file Replication_Code/exploration_sampling.R for more details.

Within the CODE folder, first there is a folder called Adaptive_Trial that contains the code that was used to administer the experiment, calculating the new treatment shares as data came in. There are three files here. Note that, as is, these files cannot be run directly to produce the output in the Adaptive_Trial_Output folder, because they were changed slightly each day as the experiment progressed. We include them here for transparency and in case they are helpful to other researchers wishing to conduct a similar experiment.
- 1a Prep adaptive trial input.do: This file loads the latest raw survey data, cleans it, and adds it to the existing survey data to produce the input for the exploration sampling. The locals at the beginning were updated to give the correct dates each time this file was run.
- 1b_exploration_sampling.R: In order to run this code, the path must be updated on line 5. The code running the exploration sampling itself is almost entirely borrowed from Kasy and Sautmann 2021, to whom we are very grateful. This file takes as input the cumulative outcome and treatment data up to that date as a prior, and it produces as outputs the treatment shares and the posterior probabilities that a given arm is optimal for each round.
- 1c Assign treatments.do: This file takes the treatment shares produced from the exploration sampling and creates the file to be uploaded to Telerivet, which sends the SMS messages for the next treatment round. The locals at the beginning were updated to give the correct dates each time this file was run.
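
For readers unfamiliar with exploration sampling, the following is a minimal Python sketch of the two steps described above: computing the posterior probability that each arm is optimal, and turning those probabilities into treatment shares and assignments. It is not the authors' code. It assumes a binary outcome with a uniform Beta(1, 1) prior on each arm, and the function names, the example counts, the equal-share fallback, and the independent random assignment are all hypothetical. The share rule, with q_d proportional to p_d * (1 - p_d), follows Kasy and Sautmann 2021.

```python
import random


def prob_optimal(successes, trials, draws=20000, seed=0):
    """Monte Carlo estimate of the posterior probability that each arm is optimal."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One draw per arm from its Beta(1 + successes, 1 + failures) posterior
        # (assumption: binary outcome, uniform prior).
        theta = [rng.betavariate(1 + s, 1 + n - s) for s, n in zip(successes, trials)]
        wins[theta.index(max(theta))] += 1
    return [w / draws for w in wins]


def exploration_shares(p):
    """Exploration sampling shares: q_d proportional to p_d * (1 - p_d)."""
    weights = [p_d * (1 - p_d) for p_d in p]
    total = sum(weights)
    if total == 0:
        # Hypothetical fallback: if one arm is optimal with certainty,
        # the rule is undefined, so use equal shares.
        return [1 / len(p)] * len(p)
    return [w / total for w in weights]


def assign_treatments(n_units, shares, seed=0):
    """Draw an arm index for each unit with probability equal to its share."""
    rng = random.Random(seed)
    return rng.choices(range(len(shares)), weights=shares, k=n_units)


# Hypothetical cumulative data for three arms (not from the study).
successes = [10, 30, 20]
trials = [50, 50, 50]

p = prob_optimal(successes, trials)   # posterior probability each arm is optimal
q = exploration_shares(p)             # treatment shares for the next round
arms = assign_treatments(100, q)      # arm index for each of 100 new units
print("P(optimal):", p)
print("Shares:", q)
```

Compared with assigning in proportion to p_d directly, the p_d * (1 - p_d) rule shifts sampling away from an arm that is already very likely to be optimal and toward its closest competitors.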

Second, there is a folder called Replication_Code that contains all code necessary to take the raw data that was collected from the field and produce the analysis, tables, and figures in the paper. Running the remainder of the 00 Master File.do file runs all of these other do files and reproduces the tables and figures. All STATA code was written in STATA 16.

Here is a brief description of the functions of each individual code file (more info is available in the comments of each individual file):
- 01_data_merging:
- 02_data_cleaning:
- 3a Posterior Probabilities.do: Creates the table presenting the posterior probability that each arm is optimal, along with the mean outcomes for each arm.
- 04a_randomization_inference: Creates a Randomization Inference dataset (set $nbatch in Master File)
- 04a_ri_merge: Merge the Randomization Inference dataset with the survey data
- 04b_analysis_descriptives: Generate various analysis descriptives (see header of .do file for detailed list)
- 04b_analysis_itt: Basic treatment effect regressions
- 04b_analysis_firststage: Treatment effect first stage regressions
- 04b_analysis_itt_hetero: Treatment effect regressions with heterogeneity by: Period of Study, Literacy, Recall Timing, Unemployment
- 04b_analysis_itt_spill: Spillover treatment effect regressions for Respiratory Hygiene and Mask Wearing outcomes
- 04b_analysis_treatmentshares: Describing trends in treatment assignment by survey round
- 04b_analysis_itt_sds: Treatment effect regressions by Social Desirability Score (SDS)
- 04b_analysis_itt_risks: Treatment effect on perceived risks of a) getting sick from COVID and b) dying from COVID
- 04b_analysis_figures: Generate study figures
- 04b_analysis_2sls: 2SLS treatment effect on the treated analysis
- 04b_analysis_itt_controlstest: ITT regressions without controls
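
To illustrate what the number of Randomization Inference batches controls, here is a minimal Python sketch of a randomization inference p-value. It is a simplified stand-in, not the procedure in 04a_randomization_inference: the replication code calls the adaptive trial code in R to regenerate assignments, whereas this sketch simply permutes a binary treatment indicator. The function name, the difference-in-means statistic, and the example data are hypothetical.

```python
import random
from statistics import mean


def ri_p_value(outcomes, treated, n_batches=2000, seed=0):
    """Two-sided randomization inference p-value for a difference in means."""
    rng = random.Random(seed)

    def diff_in_means(assignment):
        t = [y for y, a in zip(outcomes, assignment) if a]
        c = [y for y, a in zip(outcomes, assignment) if not a]
        return mean(t) - mean(c)

    observed = diff_in_means(treated)
    shuffled = list(treated)
    extreme = 0
    for _ in range(n_batches):
        # Each batch is one re-drawn assignment; here a simple permutation
        # that keeps the number of treated units fixed.
        rng.shuffle(shuffled)
        if abs(diff_in_means(shuffled)) >= abs(observed):
            extreme += 1
    # Share of re-drawn assignments at least as extreme as the observed one.
    return observed, extreme / n_batches


# Hypothetical data: 10 treated units with outcome 1, 10 control units with outcome 0.
outcomes = [1] * 10 + [0] * 10
treated = [1] * 10 + [0] * 10
observed, p_value = ri_p_value(outcomes, treated)
print(observed, p_value)
```

More batches give a finer-grained p-value at the cost of run time, which is the trade-off behind the batch setting in the Master File.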

DATA: This folder contains six sub-folders. All personally identifying information has been removed from all data.
- Qualitative_Surveys: Contains the results from the qualitative interviews.
- Delivery_Reports: Contains the administrative data from the SMS deliveries.
- Raw_Survey_Data: Contains all of the raw responses from the phone surveys.
- Adaptive_Trial_Output: Contains the output from the code in the Adaptive_Trial folder. This output was generated each day that the experiment was run.
- Saran_Admin_Data: Contains the original dataset that was used to construct the sample.
- Treatment_Assignments: Contains the list of daily treatment assignments generated from the exploration sampling algorithm.

- The files finaldistancingpriors.csv and finalhwpriors.csv in the folder DATA/Adaptive_Trial_Output/Posteriors contain the final posterior probability that each treatment arm is optimal, as generated by the 1b_exploration_sampling.R script.
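
Before running the Master File, it may help to confirm that the archive was unpacked with the folder layout described in this README. The following Python sketch is an optional convenience and is not part of the replication code. The function name is hypothetical, the folder names are taken from this README and assumed to match exactly (including capitalization), and the root path in the usage comment is an assumption.

```python
from pathlib import Path

# Folder layout as described in this README (assumed to match exactly).
EXPECTED = {
    "ADMIN": [],
    "CODE": ["Adaptive_Trial", "Replication_Code"],
    "DATA": [
        "Qualitative_Surveys",
        "Delivery_Reports",
        "Raw_Survey_Data",
        "Adaptive_Trial_Output",
        "Saran_Admin_Data",
        "Treatment_Assignments",
    ],
}


def missing_folders(root):
    """Return the expected folders (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for top, subs in EXPECTED.items():
        for rel in [Path(top)] + [Path(top) / s for s in subs]:
            if not (root / rel).is_dir():
                missing.append(rel.as_posix())
    return missing


# Usage (the root path is an assumption; point it at the replication folder):
# print(missing_folders("Replication_Files"))
```

An empty list means every folder named in this README was found under the given root.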