Gen AI Power Tools
for the experienced researcher
1 dataset · 30 copy-ready prompts · 5 recurring techniques
Workshop agenda
| Module | Time | Output |
|---|---|---|
| Dataset orientation | 15 min | |
| Excel analysis prompts | 45 min | |
| Research methods prompts | 45 min | |
| IMRaD manuscript prompts | 60 min | |
| Techniques recap + Q&A | 15 min | |
The sensor dataset
One-paragraph description
Variable dictionary
| Column | Type | Description | Range / Values |
|---|---|---|---|
| date | String | | |
| day_of_year | Integer | | 1-365 |
| weekday | String | | Mon, Tue, ..., Sun |
| season | Categorical | | cool, hot, rainy |
| period | Binary | | pre_policy, post_policy |
| temperature_c | Numeric | | |
| humidity_pct | Numeric | | |
| pm25_ugm3 | Numeric | | |
| wind_speed_ms | Numeric | | |
| rainfall_mm | Numeric | | |
Embedded patterns — for instructors
| Pattern | Where to find it | Expected value |
|---|---|---|
| | | r ≈ −0.87, p < 0.001 |
| | | r ≈ +0.69, p < 0.001 |
| | PM2.5 ~ season | |
Teaching tip:
Reveal these patterns only after participants have run the prompts.
10 prompts for Excel data analysis
Prompt 01
Descriptive statistics with confidence intervals
Compute mean, median, SD, min, max, and 95% confidence interval for temperature_c, humidity_pct, pm25_ugm3, wind_speed_ms, and rainfall_mm. Output as a table. Handle missing values by listwise deletion and report the n used for each variable.
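To audit what the assistant returns for Prompt 01, the same summary can be recomputed outside the spreadsheet. The following is a minimal sketch under stated assumptions: the readings are hypothetical, `None` stands for a missing cell, missing cells are dropped for the one column being summarised, and the 95% interval uses the normal approximation (1.96 standard errors) rather than a t quantile.

```python
import math
import statistics


def describe(values):
    """Summarise one column, skipping missing cells (None)."""
    clean = [v for v in values if v is not None]
    n = len(clean)
    mean = statistics.fmean(clean)
    sd = statistics.stdev(clean)  # sample SD (n - 1 denominator)
    half_width = 1.96 * sd / math.sqrt(n)  # normal approximation to the 95% CI
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(clean),
        "sd": sd,
        "min": min(clean),
        "max": max(clean),
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Hypothetical PM2.5 readings; None marks a missing cell.
pm25_sample = [30.0, 34.0, None, 38.0, 42.0, 36.0]
summary = describe(pm25_sample)
print(summary)
```

For small n the t-based interval is wider than this approximation, so a small discrepancy against the spreadsheet output is expected.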
Prompt 02
Correlation matrix with significance flags
Build a Pearson correlation matrix for all 5 numeric columns. For each pair, also compute the p-value. In a second table, flag correlations as: strong (|r| > 0.7), moderate (0.4 ≤ |r| ≤ 0.7), weak (|r| < 0.4). Note which pairs are significant at p<0.05.
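A quick way to spot-check one cell of the correlation matrix is to recompute it by hand. This sketch assumes two hypothetical columns of equal length with no missing cells. The p-value uses the Fisher z approximation, which is a stand-in for the exact t-based test and needs n > 3 and |r| < 1. The function names are illustrative only.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (not the exact t-test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def strength(r):
    """Label |r| with the thresholds used in Prompt 02."""
    size = abs(r)
    if size > 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    return "weak"


# Hypothetical paired readings.
humidity = [60, 65, 70, 75, 80, 85]
pm25 = [48, 45, 39, 36, 30, 27]

r = pearson_r(humidity, pm25)
p = approx_p_value(r, len(humidity))
print(round(r, 3), strength(r), p < 0.05)
```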
Prompt 03
Two-sample t-test on the policy intervention
Compare pm25_ugm3 between period="pre_policy" and "post_policy". Run a Welch's t-test (unequal variances). Report: mean of each group, mean difference with 95% CI, t-statistic, df, p-value. State whether the difference is significant at alpha=0.05 and quote the exact source cells you used.
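The t-statistic and degrees of freedom reported for Prompt 03 can be checked independently. The sketch below assumes two short hypothetical samples and computes the mean difference (post minus pre), Welch's t, and the Welch-Satterthwaite degrees of freedom. It deliberately stops short of the p-value, because the Python standard library has no t distribution; compare t and df against the spreadsheet output instead.

```python
import math
import statistics


def welch_t(pre, post):
    """Return (mean difference, t statistic, Welch-Satterthwaite df)."""
    n_pre, n_post = len(pre), len(post)
    # Squared standard error contributed by each group.
    se2_pre = statistics.variance(pre) / n_pre
    se2_post = statistics.variance(post) / n_post
    diff = statistics.fmean(post) - statistics.fmean(pre)
    t = diff / math.sqrt(se2_pre + se2_post)
    df = (se2_pre + se2_post) ** 2 / (
        se2_pre ** 2 / (n_pre - 1) + se2_post ** 2 / (n_post - 1)
    )
    return diff, t, df


# Hypothetical PM2.5 readings for each period.
pre_policy = [40, 36, 38, 42, 34]
post_policy = [33, 31, 35, 30, 36]

diff, t_stat, df = welch_t(pre_policy, post_policy)
print(diff, round(t_stat, 3), round(df, 2))
```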
Prompt 04
One-way ANOVA across seasons
Test whether pm25_ugm3 differs across the three seasons (cool, hot, rainy). Run one-way ANOVA: report F-statistic, df between and within, p-value. If p<0.05, follow up with Tukey HSD pairwise comparisons and identify which seasons differ.
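The F-statistic and both degrees of freedom from Prompt 04 can be reproduced from the sums of squares. This is a minimal sketch on hypothetical seasonal readings; it covers only the omnibus F ratio, not the p-value or the Tukey HSD follow-up.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = statistics.fmean(all_values)
    ss_between = sum(
        len(values) * (statistics.fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (v - statistics.fmean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical PM2.5 readings per season.
by_season = {
    "cool": [40, 42, 44],
    "hot": [50, 52, 54],
    "rainy": [20, 22, 24],
}

f_stat, df_between, df_within = one_way_anova(by_season)
print(f_stat, df_between, df_within)
```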
Prompt 05
Linear regression with RMSE
Fit a linear regression: pm25_ugm3 ~ temperature_c + humidity_pct + wind_speed_ms. Report: coefficients with 95% CI and p-values, R-squared, adjusted R-squared, RMSE, and residual standard error. Then list the 10 rows with the largest absolute residuals — these are candidate anomalies.
Prompt 06
Time series decomposition
Treat pm25_ugm3 as a daily time series indexed by date. Decompose it into trend, seasonal (period=365), and residual components using additive decomposition. Output the trend and residual columns next to the original data. Report whether the trend slope is positive or negative and by how much per year.
Prompt 07
Weekly pattern detection
Group pm25_ugm3 by the weekday column. Compute mean and 95% CI for each weekday. Run a one-way ANOVA across weekdays. Report which days differ from the overall mean. Hypothesis to check: Sunday is lower than weekdays.
Prompt 08
Anomaly detection with z-scores
For each row, compute a z-score for temperature_c, humidity_pct, pm25_ugm3, and wind_speed_ms relative to that row's season group (not the global mean). Flag any row where two or more variables have |z|>2.5 as a candidate anomaly. List the flagged rows with their date and which variables were extreme.
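The flagging rule in Prompt 08 can be sketched as follows, assuming each row is a dictionary keyed by the column names in the variable dictionary. The rows here are hypothetical. Columns with zero spread inside a season are skipped, which is a design choice of this sketch and not something the prompt specifies.

```python
import statistics
from collections import defaultdict

VARIABLES = ["temperature_c", "humidity_pct", "pm25_ugm3", "wind_speed_ms"]


def flag_anomalies(rows, threshold=2.5, min_extreme=2):
    """Flag rows where at least min_extreme variables have |z| > threshold,
    with z computed against the row's own season group."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)

    # Mean and sample SD of every variable within every season.
    stats = {}
    for season, members in by_season.items():
        for var in VARIABLES:
            values = [r[var] for r in members]
            stats[(season, var)] = (statistics.fmean(values), statistics.stdev(values))

    flagged = []
    for row in rows:
        extreme = []
        for var in VARIABLES:
            mean, sd = stats[(row["season"], var)]
            if sd > 0 and abs((row[var] - mean) / sd) > threshold:
                extreme.append(var)
        if len(extreme) >= min_extreme:
            flagged.append((row["date"], extreme))
    return flagged


# Hypothetical hot-season rows: eleven ordinary days plus one extreme day.
rows = [
    {"date": f"2023-04-{day:02d}", "season": "hot",
     "temperature_c": 30.0 + day % 2, "humidity_pct": 60.0 + day % 2,
     "pm25_ugm3": 40.0 + day % 2, "wind_speed_ms": 2.0}
    for day in range(1, 12)
]
rows.append({"date": "2023-04-12", "season": "hot", "temperature_c": 45.0,
             "humidity_pct": 20.0, "pm25_ugm3": 40.0, "wind_speed_ms": 2.0})

flagged = flag_anomalies(rows)
print(flagged)
```

With a sample SD, the largest z-score a group of n rows can contain is (n - 1) / sqrt(n), so a season group needs at least nine rows before any value can exceed 2.5.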
Prompt 09
Missing data audit and imputation comparison
For each numeric column, report the count and percentage of missing values, and the dates where they occur. Then compute the column mean three ways: (1) listwise deletion, (2) mean imputation, (3) linear interpolation by date. Show how much the resulting means differ and which approach you'd recommend for this dataset.
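The three approaches in Prompt 09 can be compared on a toy series before trusting the assistant's numbers. This sketch assumes hypothetical daily readings in date order with no skipped dates, `None` for a missing cell, and nearest-value carry for gaps at either end of the series (an assumption; the prompt does not say how to treat edge gaps).

```python
import statistics


def mean_listwise(values):
    """Mean of the observed cells only."""
    return statistics.fmean([v for v in values if v is not None])


def mean_imputed(values):
    """Replace each missing cell with the mean of the observed cells."""
    fill = mean_listwise(values)
    return [fill if v is None else v for v in values]


def interpolated(values):
    """Linear interpolation between the nearest observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            filled[i] = values[lo] + weight * (values[hi] - values[lo])
        else:
            # Edge gap: carry the nearest observed value (assumption).
            nearest = before[-1] if before else after[0]
            filled[i] = values[nearest]
    return filled


# Hypothetical daily readings in date order.
series = [10.0, None, 30.0, 40.0, None, None, 10.0]

listwise = mean_listwise(series)
imputed = statistics.fmean(mean_imputed(series))
interp = statistics.fmean(interpolated(series))
print(listwise, imputed, round(interp, 3))
```

Mean imputation always reproduces the listwise mean for a single column, so the informative comparison is between deletion and interpolation.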
Prompt 10
Hypothesis-driven analysis with verification
Hypothesis: humidity is the strongest predictor of low pm25_ugm3 (washout effect). Test this by:
(1) Pearson r and p-value for humidity_pct vs pm25_ugm3
(2) Same for rainfall_mm vs pm25_ugm3
(3) Same for wind_speed_ms vs pm25_ugm3
Rank predictors by |r|. Then quote the exact cell ranges used for each calculation so I can audit the result.
For workshop facilitators:
"Show the test assumptions and whether they're met."
10 prompts for research methods design
Prompt 01
Research question generation from data structure
You are a senior environmental health researcher. Looking at this dataset (730 days, variables: temperature, humidity, PM2.5, wind speed, rainfall, with a policy intervention at day 365), generate 5 publishable research questions this data could answer. For each: state the question, the variables involved, and which journal audience it would fit (clinical, environmental, policy, or atmospheric science).
Prompt 02
Study design selection with justification
I want to evaluate whether the policy intervention at day 365 actually reduced PM2.5. Recommend the most rigorous study design this dataset supports — and name the design (interrupted time series, before-after, pre-post comparison, etc.). Justify why that design fits, list its key assumptions, and identify which assumptions this dataset may violate.
Prompt 03
Hypotheses with directional predictions
Based on the dataset variables, formulate 4 testable hypotheses: 2 directional (H1, H2) and 2 non-directional (H3, H4). For each: state H0 and H1, identify the statistical test, specify alpha, and predict effect direction. Frame them so each could appear in a methods section verbatim.
Prompt 04
SMART objectives from a research question
Take this research question: "Did the policy intervention reduce PM2.5 levels in Phayao between 2023 and 2024, controlling for seasonal and meteorological factors?" Rewrite it as one primary objective and 3 secondary objectives, each following SMART criteria (Specific, Measurable, Achievable, Relevant, Time-bound). Output in the format used in grant applications.
Prompt 05
Sample size and power justification
This dataset has 365 pre-policy and 365 post-policy daily readings for PM2.5. Working backward: (1) for a two-sample t-test with alpha=0.05 and power=0.80, what minimum detectable effect size could this n support? (2) what was the observed effect size (Cohen's d) in the data? (3) was this study adequately powered to detect smaller policy effects? Show the calculations.
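The calculations this prompt asks for can be sketched with the normal approximation to the two-sample t-test. The assumptions here are equal group sizes, a two-sided test, and the group means and SDs quoted in the Results prompt of this handout. The function names are illustrative.

```python
import math
from statistics import NormalDist


def minimum_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the given n (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * math.sqrt(2 / n_per_group)


def cohens_d(mean_pre, sd_pre, mean_post, sd_post):
    """Cohen's d with a pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_post - mean_pre) / pooled_sd


mde = minimum_detectable_d(365)
observed_d = cohens_d(36.05, 18.2, 32.15, 17.4)
print(round(mde, 3), round(observed_d, 3))
```

Under these assumptions the minimum detectable effect is about d = 0.21 and the observed effect is about d = -0.22, so the design sits close to the edge of its power and would likely miss anything smaller.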
Prompt 06
Variable operationalization
For a paper using this dataset, write the "Variables and measurements" subsection of the Methods. For each of the 5 numeric variables: state the construct it measures, units, measurement frequency, expected range, treatment of missing values, and any transformation needed (e.g., log transform for skewed PM2.5). Write in formal academic prose, ~250 words.
Prompt 07
Confounder identification and control strategy
For the research question "Did the policy reduce PM2.5?", list potential confounders in this dataset and outside it. For each: classify as measured (in the data), partially measured, or unmeasured. Recommend a statistical control strategy for each measured confounder (e.g., include as covariate, stratify, match). State which unmeasured confounders most threaten causal inference.
Prompt 08
Inclusion / exclusion criteria for the analytic sample
Before analyzing, I need to define which rows go into the analytic sample. Draft inclusion and exclusion criteria as they would appear in a methods section. Consider: missing data thresholds, outlier handling (the 12 anomaly days), seasonal balance, and the policy transition window. For each criterion, justify the decision and state how many rows it would drop. Output as a PRISMA-style flow.
Prompt 09
Statistical analysis plan (SAP)
Write a complete Statistical Analysis Plan for this dataset. Include sections: (1) Primary analysis with exact test, (2) Secondary analyses, (3) Sensitivity analyses, (4) Handling of missing data, (5) Multiple comparisons correction, (6) Software and version, (7) Pre-specified subgroup analyses, (8) What constitutes a positive finding. Format as if for trial registration. Maximum 600 words.
Prompt 10
Limitations and threats to validity
Acting as Reviewer 2 for an environmental health journal, identify the 5 most serious methodological limitations of any study using this dataset to make causal claims about the policy intervention. For each limitation: (1) name it precisely (confounding, regression to mean, ecological fallacy, etc.), (2) explain why it threatens validity here, (3) suggest a specific analytic remedy, (4) state whether the remedy would fully or partially address it.
10 prompts for manuscript writing (IMRaD)
Prompt 01 · Title
Title generation
You are an experienced environmental health author publishing in mid-tier journals (IF 3-5). Based on this study (two years of daily PM2.5, temperature, humidity, wind, and rainfall data from a northern Thailand site, with a policy intervention at day 365 that reduced PM2.5 by ~4 µg/m³), generate 5 candidate titles.
Constraints:
- 12-18 words each
- Include design type (interrupted time series)
- Include the variable of primary interest (PM2.5)
- Avoid "novel", "comprehensive", or other low-information words
- Output as numbered list with one-sentence rationale for each
Rank them at the end from most to least journal-ready.
Why it works
Prompt 02 · Abstract
Structured abstract (250 words)
Write a structured abstract using the following sections: Background, Objective, Methods, Results, Conclusions.
Study facts to use:
- Dataset: 730 daily readings, Jan 2023 - Dec 2024, Phayao Thailand
- Variables: PM2.5, temperature, humidity, wind, rainfall
- Intervention: air-quality policy starting day 365
- Primary analysis: Welch's t-test, pre vs post policy PM2.5
- Result: mean PM2.5 dropped from 36.05 to 32.15 ug/m3, p<0.001
- Correlation findings: humidity inversely correlated with PM2.5 (r=-0.87, p<0.001)
Hard limits:
- Total 250 words (+/- 10)
- No citations
- Past tense for Methods and Results
- Each section header on its own line, bold
Do NOT invent numbers not listed above. If a section needs a detail I didn't give, write [TO ADD] in square brackets.
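The "do not invent numbers" rule can be checked mechanically once a draft comes back. The sketch below assumes the study facts and the draft are both available as plain text; the example draft is hypothetical. It lists every number in the draft that does not appear in the facts. Digits glued to letters (as in "PM2.5" or "m3") are ignored, and signs are dropped.

```python
import re

# A number that is not preceded by a letter, digit, underscore, or dot.
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?")


def unlisted_numbers(draft, facts):
    """Return numbers that appear in the draft but not in the supplied facts."""
    allowed = set(NUMBER.findall(facts))
    return sorted({n for n in NUMBER.findall(draft) if n not in allowed})


facts = (
    "730 daily readings, Jan 2023 - Dec 2024; policy starting day 365; "
    "mean PM2.5 dropped from 36.05 to 32.15 ug/m3, p<0.001; r=-0.87"
)
# Hypothetical model output to audit.
draft = (
    "Mean PM2.5 fell from 36.05 to 32.15 ug/m3 (p<0.001) over 730 days, "
    "a 10.8% reduction."
)

suspect = unlisted_numbers(draft, facts)
print(suspect)
```

A flagged number is not necessarily wrong; it is a value the model derived or invented, and it should be verified before it stays in the abstract.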
Why it works
Prompt 03 · Introduction
Introduction with funnel structure
Draft the Introduction section (~600 words, 4 paragraphs) following the classic funnel structure:
Paragraph 1 (broad context): Global air pollution burden, PM2.5 as a health concern, why low- and middle-income countries are disproportionately affected. End with the regional relevance to Southeast Asia and biomass-burning seasons.
Paragraph 2 (narrower context): What is known about PM2.5 drivers in northern Thailand: meteorology, agricultural burning, transboundary haze. Identify the established relationships (humidity washout, temperature-PM2.5 coupling).
Paragraph 3 (the gap): What is NOT known. Specifically: limited evaluation of recent local air-quality policies using rigorous interrupted time-series designs at the daily resolution.
Paragraph 4 (this study): State the aim, the design (ITS), the primary outcome (daily PM2.5), the timeframe, and the 3 specific objectives.
Do not invent citations. Where a citation is needed, insert [CITATION: brief description of source needed] so I can fill in from my reference manager.
Why it works
Prompt 04 · Methods (design)
Methods: Study design and setting
Write the "Study design and setting" subsection of Methods, ~200 words, past tense, formal academic register.
Facts to incorporate:
- Design: retrospective observational, interrupted time series
- Setting: single fixed monitoring station, Phayao province, northern Thailand
- Period: 1 Jan 2023 to 30 Dec 2024 (730 consecutive days)
- Intervention: hypothetical air-quality policy effective from 1 Jan 2024
- Data source: daily aggregated sensor readings
- Ethical approval: not required (no human subjects, public environmental data)
Style requirements:
- No first person
- Define abbreviations on first use
- Cite reporting guideline: STROBE for observational studies
End with a single sentence noting the absence of conflicts of interest. Do not pad.
Why it works
Prompt 05 · Methods (statistics)
Methods: Statistical analysis
Write the "Statistical analysis" subsection of Methods, 250-300 words. Describe the following analyses in the order performed:
1. Descriptive statistics (mean, SD, median, IQR by season and period); software: R 4.3.0
2. Pearson correlation matrix with p-values
3. Welch's two-sample t-test comparing pre- vs post-policy PM2.5
4. One-way ANOVA across seasons with Tukey HSD post-hoc
5. Multiple linear regression: PM2.5 ~ temperature + humidity + wind_speed + period
6. Sensitivity analysis: re-run primary t-test excluding the 12 extreme-weather days
7. Missing data: listwise deletion (n=8 cells, <2% missingness)
For each, specify: the test, the assumptions checked, alpha level (0.05, two-sided), and what software function or package was used.
End with: "All analyses were pre-specified before data inspection."
Past tense throughout. Do not justify the choice of tests in this paragraph; that belongs in the Discussion.
Why it works
Prompt 06 · Results (primary)
Results: Descriptives and primary finding
Write the first half of the Results section (~400 words), covering: (1) descriptive characteristics of the dataset and (2) the primary analysis.
Use exactly these values:
- Total days analyzed: 730
- Missing data: 8 cells (1.1%); listwise deletion applied
- Pre-policy mean PM2.5: 36.05 ug/m3 (SD 18.2)
- Post-policy mean PM2.5: 32.15 ug/m3 (SD 17.4)
- Mean difference: -3.90 ug/m3 (95% CI: -5.51 to -2.29)
- Welch's t = -4.74, df = 727.3, p < 0.001
- Cohen's d = 0.22 (small effect)
Style rules:
- Report the values above exactly as given; round any other number to one decimal place unless otherwise specified
- Past tense
- No interpretation or comparison to other studies
- Refer to "Table 1" and "Figure 1" (do not create them)
- Open with a single sentence summarizing the sample
If you need a number I didn't provide, insert [VERIFY: what's needed] rather than guessing.
Why it works
Prompt 07 · Results (secondary)
Results: Secondary analyses
Write the second half of the Results section (~350 words), covering secondary and sensitivity analyses, in the same order as listed in the Methods. Use these exact values:
Correlation matrix (Pearson r, all p<0.001 unless noted):
- humidity vs PM2.5: r = -0.87
- temperature vs PM2.5: r = +0.69
- temperature vs humidity: r = -0.59
- wind_speed vs PM2.5: r = +0.19
- rainfall vs PM2.5: r = -0.23
ANOVA across seasons (PM2.5): F = 218.4, df = 2/725, p < 0.001
Tukey HSD: hot vs cool p<0.001, hot vs rainy p<0.001, rainy vs cool p<0.001 (all pairs differ)
Linear regression: R-squared = 0.81, RMSE = 7.8 ug/m3
- humidity coefficient: -0.95 (95% CI -1.04 to -0.86), p<0.001
- temperature coefficient: +1.21 (95% CI 1.02 to 1.40), p<0.001
- wind_speed coefficient: +0.42 (95% CI 0.08 to 0.76), p=0.015
- period (post-policy): -3.42 (95% CI -4.31 to -2.53), p<0.001
Sensitivity analysis (excluding 12 extreme-weather days): mean difference -3.71 ug/m3, p<0.001; result robust
Use Table 2 for the correlation matrix and Table 3 for regression output. Keep prose tight: lead each paragraph with the finding, then the supporting statistic.
Why it works
Prompt 08 · Discussion
Discussion with the 5-part structure
Write the Discussion section (~800 words, 5 paragraphs)
following this structure exactly:
Paragraph 1 — Principal findings (~120 words): Restate the
main results in plain language, without statistics.
Paragraph 2 — Comparison with prior work (~200 words): How do
these findings align with or differ from existing literature?
Mention the humidity-PM2.5 inverse relationship as consistent
with established atmospheric chemistry.
[CITATION needed: Southeast Asia PM2.5 review]
[CITATION needed: ITS evaluation of air policies]
Paragraph 3 — Mechanisms and interpretation (~200 words): Why
might the policy have produced this effect? What does the
humidity dominance in the regression model imply?
Paragraph 4 — Strengths and limitations (~200 words):
Strengths: daily resolution, 2-year window, pre-specified
analysis plan, sensitivity analysis robust.
Limitations: single monitoring site, no co-pollutant data,
unmeasured confounders (traffic, regional fires), policy effect
confounded with year 2 secular trends.
Paragraph 5 — Implications and future research (~100 words):
What should policymakers take from this? What next?
Do not introduce new results. Do not repeat the abstract.
Use hedged language for causal claims ("our findings are
consistent with", not "the policy caused").
Why it works
Prompt 09 · Conclusion
Conclusion (exactly 3 sentences)
Write a Conclusion section of EXACTLY 3 sentences, no more, no less. Sentence 1: State the principal finding without statistics. Sentence 2: State the practical or policy implication. Sentence 3: State the single most important next step for research. Do not begin with "In conclusion" or "In summary". Do not introduce new findings. Use measured language — avoid "proves", "demonstrates conclusively", or "should be implemented".
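Hard constraints like these are easy to verify automatically before the text goes into the manuscript. The sketch below assumes the returned conclusion is plain text; the sample conclusion is hypothetical. Sentence splitting is deliberately naive (it splits on ., !, or ? followed by whitespace), so abbreviations such as "e.g." would be miscounted.

```python
import re

BANNED = (
    "in conclusion",
    "in summary",
    "proves",
    "demonstrates conclusively",
    "should be implemented",
)


def check_conclusion(text):
    """Count sentences and list any banned phrases found in the text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lowered = text.lower()
    return {
        "sentence_count": len(sentences),
        "banned_found": [phrase for phrase in BANNED if phrase in lowered],
    }


# Hypothetical model output to audit.
conclusion = (
    "Daily PM2.5 was lower after the policy took effect. "
    "The finding supports continued monitoring of local air-quality measures. "
    "Future work should add a control site."
)

report = check_conclusion(conclusion)
print(report)
```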
Why it works
Prompt 10 · Cover letter
Cover letter to editor
You are submitting this manuscript to [TARGET JOURNAL] as corresponding author. Draft a cover letter, single page, ~300 words, with the following structure:
Paragraph 1 (submission statement): manuscript title, type (original research), and one-sentence summary of the study.
Paragraph 2 (why this fits the journal): reference 1-2 themes from the journal's scope (insert [JOURNAL SCOPE NOTE] as a placeholder for me to fill).
Paragraph 3 (significance): the 2 most important contributions (rigorous ITS evaluation of a local policy; meteorology-adjusted effect estimation).
Paragraph 4 (standard statements): all authors approved the submission, no conflicts of interest, data and code available on reasonable request, manuscript not under consideration elsewhere, no overlap with prior publications.
Close with thanks and the author's name placeholder.
Tone: confident but not boastful. Do not use "we believe" or "we hope". Do not list every result.
Why it works
The meta-prompt to teach last:
"Now act as Reviewer 2 for [target journal]. Critique the draft you just wrote. Identify the 5 weakest sentences and suggest a stronger rewrite for each."
The 5 techniques to take home
| Technique | Where it appears | Why it works |
|---|---|---|
Resources & further reading
- ICMJE recommendations on AI use: icmje.org
- Nature's policy on generative AI: nature.com (Editorial policies)
- WAME AI recommendations: wame.org
- STROBE statement: strobe-statement.org
- Cochrane on AI in systematic reviews: cochrane.org
- Anthropic prompt engineering guide: docs.claude.com
Bottom line: