Workshop Materials

Gen AI Power Tools
for the experienced researcher

One dataset

Copy-ready prompts

Recurring techniques

What's inside

Workshop agenda

Module	Time	Output
Dataset orientation	15 min
Excel analysis prompts	45 min
Research methods prompts	45 min
IMRaD manuscript prompts	60 min
Techniques recap + Q&A	15 min

Module 1

The sensor dataset

environmental_sensor_data.csv

730 rows × 10 columns · ~50 KB · UTF-8 encoded

One-paragraph description

Variable dictionary

Column	Type	Range / Values
`date`	String
`day_of_year`	Integer	1-365
`weekday`	String	Mon, Tue, ..., Sun
`season`	Categorical	cool, hot, rainy
`period`	Binary	pre_policy, post_policy
`temperature_c`	Numeric
`humidity_pct`	Numeric
`pm25_ugm3`	Numeric
`wind_speed_ms`	Numeric
`rainfall_mm`	Numeric

Embedded patterns — for instructors

Pattern	Where to find it	Expected value



		r ≈ −0.87, p < 0.001
		r ≈ +0.69, p < 0.001

	PM2.5 ~ season

Teaching tip: Reveal these patterns only after participants run prompts.

Module 2

10 prompts for Excel data analysis

Prompt 01

Descriptive statistics with confidence intervals

Compute mean, median, SD, min, max, and 95% confidence interval
for temperature_c, humidity_pct, pm25_ugm3, wind_speed_ms, and
rainfall_mm. Output as a table. Handle missing values by
listwise deletion and report the n used for each variable.

Prompt 02

Correlation matrix with significance flags

Build a Pearson correlation matrix for all 5 numeric columns.
For each pair, also compute the p-value. In a second table,
flag correlations as: strong (|r|>0.7), moderate (0.4<=|r|<=0.7),
weak (|r|<0.4). Note which pairs are significant at p<0.05.

Prompt 03

Two-sample t-test on the policy intervention

Compare pm25_ugm3 between period="pre_policy" and "post_policy".
Run a Welch's t-test (unequal variances). Report: mean of each
group, mean difference with 95% CI, t-statistic, df, p-value.
State whether the difference is significant at alpha=0.05 and
quote the exact source cells you used.
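
Verification sketch (Python, not part of the prompt): a minimal hand computation of Welch's t statistic and the Welch-Satterthwaite degrees of freedom, useful for checking the figures the assistant reports. The two short lists are made-up values, not workshop data, and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) - mean(b)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Made-up illustrative values, not rows from the workshop dataset.
post = [30.1, 28.4, 35.2, 31.0, 29.7, 33.3]
pre = [36.5, 34.2, 40.1, 38.8, 35.0, 37.4]

t, df = welch_t(post, pre)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The sketch stops at t and df; turning them into a p-value needs a t-distribution, which the standard library does not provide.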

Prompt 04

One-way ANOVA across seasons

Test whether pm25_ugm3 differs across the three seasons (cool,
hot, rainy). Run one-way ANOVA: report F-statistic, df between
and within, p-value. If p<0.05, follow up with Tukey HSD pairwise
comparisons and identify which seasons differ.
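
Verification sketch (Python, not part of the prompt): the F statistic and its degrees of freedom can be recomputed from sums of squares. The group values below are made up for illustration, and `one_way_anova_f` is a hypothetical helper name. The Tukey HSD follow-up is not covered here.

```python
from statistics import mean


def one_way_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    # Between-group sum of squares: group size times squared distance to the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: squared distance of each value to its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up illustrative values, not rows from the workshop dataset.
example = {
    "cool": [41.0, 44.5, 39.2, 46.1],
    "hot": [35.3, 33.8, 37.0, 36.4],
    "rainy": [18.9, 22.4, 20.1, 17.6],
}
f_stat, df_b, df_w = one_way_anova_f(example)
print(f"F = {f_stat:.1f}, df = {df_b}, {df_w}")
```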

Prompt 05

Linear regression with RMSE

Fit a linear regression: pm25_ugm3 ~ temperature_c + humidity_pct
+ wind_speed_ms. Report: coefficients with 95% CI and p-values,
R-squared, adjusted R-squared, RMSE, and residual standard error.
Then list the 10 rows with the largest absolute residuals — these
are candidate anomalies.

Prompt 06

Time series decomposition

Treat pm25_ugm3 as a daily time series indexed by date. Decompose
it into trend, seasonal (period=365), and residual components
using additive decomposition. Output the trend and residual
columns next to the original data. Report whether the trend
slope is positive or negative and by how much per year.

Prompt 07

Weekly pattern detection

Group pm25_ugm3 by the weekday column. Compute mean and 95% CI
for each weekday. Run a one-way ANOVA across weekdays. Report
which days differ from the overall mean. Hypothesis to check:
Sunday is lower than weekdays.

Prompt 08

Anomaly detection with z-scores

For each row, compute a z-score for temperature_c, humidity_pct,
pm25_ugm3, and wind_speed_ms relative to that row's season group
(not the global mean). Flag any row where two or more variables
have |z|>2.5 as a candidate anomaly. List the flagged rows with
their date and which variables were extreme.
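
Verification sketch (Python, not part of the prompt): a minimal version of the season-relative z-score rule, for checking which rows the assistant flags. It assumes each row is a dict keyed by the column names in the variable dictionary, that rows with missing values have already been dropped, and that every season has at least two rows. `flag_anomalies` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean, stdev

VARIABLES = ["temperature_c", "humidity_pct", "pm25_ugm3", "wind_speed_ms"]


def flag_anomalies(rows: list[dict], threshold: float = 2.5,
                   min_vars: int = 2) -> list[tuple[str, list[str]]]:
    """Return (date, extreme_variables) for rows where at least min_vars
    variables have a season-relative |z| above threshold."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)

    flagged = []
    for group in by_season.values():
        # Mean and sample SD of each variable within this season only
        stats = {v: (mean(r[v] for r in group), stdev(r[v] for r in group))
                 for v in VARIABLES}
        for row in group:
            extreme = [
                v for v in VARIABLES
                if stats[v][1] > 0
                and abs(row[v] - stats[v][0]) / stats[v][1] > threshold
            ]
            if len(extreme) >= min_vars:
                flagged.append((row["date"], extreme))
    return flagged
```

Rows could be loaded with `csv.DictReader` and converted to floats before calling the function. Note that with the sample SD, a group of n rows cannot produce a |z| larger than (n - 1) / sqrt(n), so very small season groups can never reach the 2.5 cut-off.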

Prompt 09

Missing data audit and imputation comparison

For each numeric column, report the count and percentage of
missing values, and the dates where they occur. Then compute
the column mean three ways: (1) listwise deletion, (2) mean
imputation, (3) linear interpolation by date. Show how much
the resulting means differ and which approach you'd recommend
for this dataset.
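
Verification sketch (Python, not part of the prompt): the three column means compared side by side. It assumes one row per day in date order, with missing readings represented as `None`; the series is made up for illustration and `interpolate_linear` is a hypothetical helper name.

```python
from statistics import mean
from typing import Optional


def interpolate_linear(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps along a straight line between the nearest observed
    neighbours. Leading and trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Made-up illustrative series with three missing days, not workshop data.
pm25 = [30.0, None, 36.0, 40.0, None, None, 70.0]

observed = [v for v in pm25 if v is not None]
mean_deleted = mean(observed)  # (1) drop missing values
mean_imputed = mean(v if v is not None else mean_deleted for v in pm25)  # (2)
mean_interp = mean(v for v in interpolate_linear(pm25) if v is not None)  # (3)
print(mean_deleted, mean_imputed, mean_interp)
```

Mean imputation always reproduces the deletion mean, because the filled-in values equal that mean; what it changes is the spread, which shrinks. Only interpolation can move the column mean, so that is the comparison worth inspecting in the assistant's answer.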

Prompt 10

Hypothesis-driven analysis with verification

Hypothesis: humidity is the strongest predictor of low pm25_ugm3
(washout effect). Test this by:
(1) Pearson r and p-value for humidity_pct vs pm25_ugm3
(2) Same for rainfall_mm vs pm25_ugm3
(3) Same for wind_speed_ms vs pm25_ugm3
Rank predictors by |r|. Then quote the exact cell ranges used
for each calculation so I can audit the result.

For workshop facilitators: "Show the test assumptions and whether they're met."

Module 3

10 prompts for research methods design

Prompt 01

Research question generation from data structure

You are a senior environmental health researcher. Looking at this
dataset (730 days, variables: temperature, humidity, PM2.5, wind
speed, rainfall, with a policy intervention at day 365), generate
5 publishable research questions this data could answer. For each:
state the question, the variables involved, and which journal
audience it would fit (clinical, environmental, policy, or
atmospheric science).

Prompt 02

Study design selection with justification

I want to evaluate whether the policy intervention at day 365
actually reduced PM2.5. Recommend the most rigorous study design
this dataset supports — and name the design (interrupted time
series, before-after, pre-post comparison, etc.). Justify why
that design fits, list its key assumptions, and identify which
assumptions this dataset may violate.

Prompt 03

Hypotheses with directional predictions

Based on the dataset variables, formulate 4 testable hypotheses:
2 directional (H1, H2) and 2 non-directional (H3, H4). For each:
state H0 and H1, identify the statistical test, specify alpha,
and predict effect direction. Frame them so each could appear
in a methods section verbatim.

Prompt 04

SMART objectives from a research question

Take this research question: "Did the policy intervention reduce
PM2.5 levels in Phayao between 2023 and 2024, controlling for
seasonal and meteorological factors?" Rewrite it as one primary
objective and 3 secondary objectives, each following SMART
criteria (Specific, Measurable, Achievable, Relevant, Time-bound).
Output in the format used in grant applications.

Prompt 05

Sample size and power justification

This dataset has 365 pre-policy and 365 post-policy daily readings
for PM2.5. Working backward: (1) for a two-sample t-test with
alpha=0.05 and power=0.80, what minimum detectable effect size
could this n support? (2) what was the observed effect size
(Cohen's d) in the data? (3) was this study adequately powered
to detect smaller policy effects? Show the calculations.
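
Verification sketch (Python, not part of the prompt): a normal-approximation version of the power arithmetic, for checking the assistant's calculation. It assumes equal group sizes and a two-sided test; an exact answer based on the t distribution will differ slightly. The function names are hypothetical, and the summary values passed to `cohens_d` are the pre- and post-policy means and SDs quoted in the Module 4 Results prompt.

```python
import math
from statistics import NormalDist


def min_detectable_d(n_per_group: int, alpha: float = 0.05,
                     power: float = 0.80) -> float:
    """Approximate minimum detectable Cohen's d for a two-sided,
    two-sample comparison with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d with a pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return (mean_a - mean_b) / pooled_sd


mde = min_detectable_d(365)
observed = cohens_d(32.15, 17.4, 36.05, 18.2)  # post-policy minus pre-policy
print(f"minimum detectable d = {mde:.2f}, observed d = {observed:.2f}")
```

Comparing the absolute observed d with the minimum detectable d shows how close the study sits to its detection limit, which is the point of question (3).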

Prompt 06

Variable operationalization

For a paper using this dataset, write the "Variables and
measurements" subsection of the Methods. For each of the 5
numeric variables: state the construct it measures, units,
measurement frequency, expected range, treatment of missing
values, and any transformation needed (e.g., log transform
for skewed PM2.5). Write in formal academic prose, ~250 words.

Prompt 07

Confounder identification and control strategy

For the research question "Did the policy reduce PM2.5?", list
potential confounders in this dataset and outside it. For each:
classify as measured (in the data), partially measured, or
unmeasured. Recommend a statistical control strategy for each
measured confounder (e.g., include as covariate, stratify,
match). State which unmeasured confounders most threaten
causal inference.

Prompt 08

Inclusion / exclusion criteria for the analytic sample

Before analyzing, I need to define which rows go into the
analytic sample. Draft inclusion and exclusion criteria as
they would appear in a methods section. Consider: missing
data thresholds, outlier handling (the 12 anomaly days),
seasonal balance, and the policy transition window. For each
criterion, justify the decision and state how many rows it
would drop. Output as a PRISMA-style flow.

Prompt 09

Statistical analysis plan (SAP)

Write a complete Statistical Analysis Plan for this dataset.
Include sections: (1) Primary analysis with exact test,
(2) Secondary analyses, (3) Sensitivity analyses, (4) Handling
of missing data, (5) Multiple comparisons correction, (6)
Software and version, (7) Pre-specified subgroup analyses,
(8) What constitutes a positive finding. Format as if for
trial registration. Maximum 600 words.

Prompt 10

Limitations and threats to validity

Acting as Reviewer 2 for an environmental health journal,
identify the 5 most serious methodological limitations of any
study using this dataset to make causal claims about the policy
intervention. For each limitation: (1) name it precisely
(confounding, regression to mean, ecological fallacy, etc.),
(2) explain why it threatens validity here, (3) suggest a
specific analytic remedy, (4) state whether the remedy would
fully or partially address it.

Module 4

10 prompts for manuscript writing (IMRaD)

Prompt 01 · Title

Title generation

You are an experienced environmental health author publishing in
mid-tier journals (IF 3-5). Based on this study — two years of
daily PM2.5, temperature, humidity, wind, and rainfall data from
a northern Thailand site, with a policy intervention at day 365
that reduced PM2.5 by ~4 µg/m³ — generate 5 candidate titles.

Constraints:
- 12-18 words each
- Include design type (interrupted time series)
- Include the variable of primary interest (PM2.5)
- Avoid "novel", "comprehensive", or other low-information words
- Output as numbered list with one-sentence rationale for each

Rank them at the end from most to least journal-ready.

Why it works

Prompt 02 · Abstract

Structured abstract (250 words)

Write a structured abstract using the following sections:
Background, Objective, Methods, Results, Conclusions.

Study facts to use:
- Dataset: 730 daily readings, Jan 2023 - Dec 2024, Phayao Thailand
- Variables: PM2.5, temperature, humidity, wind, rainfall
- Intervention: air-quality policy starting day 365
- Primary analysis: Welch's t-test, pre vs post policy PM2.5
- Result: mean PM2.5 dropped from 36.05 to 32.15 ug/m3, p<0.001
- Correlation findings: humidity inversely correlated with
  PM2.5 (r=-0.87, p<0.001)

Hard limits:
- Total 250 words (+/- 10)
- No citations
- Past tense for Methods and Results
- Each section header on its own line, bold

Do NOT invent numbers not listed above. If a section needs a
detail I didn't give, write [TO ADD] in square brackets.

Why it works

Prompt 03 · Introduction

Introduction with funnel structure

Draft the Introduction section (~600 words, 4 paragraphs)
following the classic funnel structure:

Paragraph 1 (broad context): Global air pollution burden, PM2.5
as a health concern, why low- and middle-income countries are
disproportionately affected. End with the regional relevance to
Southeast Asia and biomass-burning seasons.

Paragraph 2 (narrower context): What is known about PM2.5 drivers
in northern Thailand — meteorology, agricultural burning,
transboundary haze. Identify the established relationships
(humidity washout, temperature-PM2.5 coupling).

Paragraph 3 (the gap): What is NOT known. Specifically: limited
evaluation of recent local air-quality policies using rigorous
interrupted time-series designs at the daily resolution.

Paragraph 4 (this study): State the aim, the design (ITS), the
primary outcome (daily PM2.5), the timeframe, and the 3 specific
objectives.

Do not invent citations. Where a citation is needed, insert
[CITATION: brief description of source needed] so I can fill
in from my reference manager.

Why it works

Prompt 04 · Methods (design)

Methods: Study design and setting

Write the "Study design and setting" subsection of Methods,
~200 words, past tense, formal academic register.

Facts to incorporate:
- Design: retrospective observational, interrupted time series
- Setting: single fixed monitoring station, Phayao province,
  northern Thailand
- Period: 1 Jan 2023 to 30 Dec 2024 (730 consecutive days)
- Intervention: hypothetical air-quality policy effective from
  1 Jan 2024
- Data source: daily aggregated sensor readings
- Ethical approval: not required (no human subjects, public
  environmental data)

Style requirements:
- No first person
- Define abbreviations on first use
- Cite reporting guideline: STROBE for observational studies

End with a single sentence noting the absence of conflicts of
interest. Do not pad.

Why it works

Prompt 05 · Methods (statistics)

Methods: Statistical analysis

Write the "Statistical analysis" subsection of Methods, 250-300
words. Describe the following analyses in the order performed:

1. Descriptive statistics (mean, SD, median, IQR by season and
   period) — software: R 4.3.0
2. Pearson correlation matrix with p-values
3. Welch's two-sample t-test comparing pre- vs post-policy PM2.5
4. One-way ANOVA across seasons with Tukey HSD post-hoc
5. Multiple linear regression: PM2.5 ~ temperature + humidity +
   wind_speed + period
6. Sensitivity analysis: re-run primary t-test excluding the 12
   extreme-weather days
7. Missing data: listwise deletion (n=8 cells, <2% missingness)

For each, specify: the test, the assumptions checked, alpha
level (0.05, two-sided), and what software function or package
was used. End with: "All analyses were pre-specified before
data inspection."

Past tense throughout. Do not justify the choice of tests in
this paragraph — that belongs in the Discussion.

Why it works

Prompt 06 · Results (primary)

Results: Descriptives and primary finding

Write the first half of the Results section (~400 words),
covering: (1) descriptive characteristics of the dataset and
(2) the primary analysis.

Use exactly these values:
- Total days analyzed: 730
- Missing data: 8 cells (1.1%); listwise deletion applied
- Pre-policy mean PM2.5: 36.05 ug/m3 (SD 18.2)
- Post-policy mean PM2.5: 32.15 ug/m3 (SD 17.4)
- Mean difference: -3.90 ug/m3 (95% CI: -5.51 to -2.29)
- Welch's t = -4.74, df = 727.3, p < 0.001
- Cohen's d = 0.22 (small effect)

Style rules:
- Report the values above exactly as given; round any other
  number to one decimal place unless otherwise specified
- Past tense
- No interpretation or comparison to other studies
- Refer to "Table 1" and "Figure 1" (do not create them)
- Open with a single sentence summarizing the sample

If you need a number I didn't provide, insert [VERIFY: what's
needed] rather than guessing.
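
Verification sketch (Python, not part of the prompt): since the prompt asks the model to repeat the supplied values exactly, it can help to confirm first that those values agree with each other. The sketch recomputes the mean difference, Welch's t, and an approximate 95% CI from the means and SDs. The group size of 365 per period is an assumption, because the prompt does not state the per-group n after deletion, and the interval uses z = 1.96 rather than a t quantile. `welch_from_summary` is a hypothetical helper name.

```python
import math


def welch_from_summary(m1: float, s1: float, n1: int,
                       m2: float, s2: float, n2: int,
                       z: float = 1.96) -> tuple[float, float, tuple[float, float]]:
    """Recompute (difference, Welch t, approximate 95% CI) from summary statistics."""
    diff = m1 - m2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    return diff, diff / se, (diff - z * se, diff + z * se)


# Means and SDs from the prompt; n = 365 per group is an assumption.
diff, t, ci = welch_from_summary(32.15, 17.4, 365, 36.05, 18.2, 365)
print(f"diff = {diff:.2f}, t = {t:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the recomputed t or interval does not match the values about to be pasted, recheck the group sizes and SDs against the spreadsheet before running the prompt.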

Why it works

Prompt 07 · Results (secondary)

Results: Secondary analyses

Write the second half of the Results section (~350 words),
covering secondary and sensitivity analyses, in the same order
as listed in the Methods.

Use these exact values:

Correlation matrix (Pearson r, all p<0.001 unless noted):
- humidity vs PM2.5: r = -0.87
- temperature vs PM2.5: r = +0.69
- temperature vs humidity: r = -0.59
- wind_speed vs PM2.5: r = +0.19
- rainfall vs PM2.5: r = -0.23

ANOVA across seasons (PM2.5): F = 218.4, df = 2/725, p < 0.001
Tukey HSD: hot vs cool p<0.001, hot vs rainy p<0.001, rainy vs
cool p<0.001 (all pairs differ)

Linear regression: R-squared = 0.81, RMSE = 7.8 ug/m3
- humidity coefficient: -0.95 (95% CI -1.04 to -0.86), p<0.001
- temperature coefficient: +1.21 (95% CI 1.02 to 1.40), p<0.001
- wind_speed coefficient: +0.42 (95% CI 0.08 to 0.76), p=0.015
- period (post-policy): -3.42 (95% CI -4.31 to -2.53), p<0.001

Sensitivity analysis (excluding 12 extreme-weather days):
mean difference -3.71 ug/m3, p<0.001 — result robust

Use Table 2 for the correlation matrix and Table 3 for
regression output. Keep prose tight: lead each paragraph with
the finding, then the supporting statistic.

Why it works

Prompt 08 · Discussion

Discussion with the 5-part structure

Write the Discussion section (~800 words, 5 paragraphs)
following this structure exactly:

Paragraph 1 — Principal findings (~120 words): Restate the
main results in plain language, without statistics.

Paragraph 2 — Comparison with prior work (~200 words): How do
these findings align with or differ from existing literature?
Mention the humidity-PM2.5 inverse relationship as consistent
with established atmospheric chemistry.
[CITATION needed: Southeast Asia PM2.5 review]
[CITATION needed: ITS evaluation of air policies]

Paragraph 3 — Mechanisms and interpretation (~200 words): Why
might the policy have produced this effect? What does the
humidity dominance in the regression model imply?

Paragraph 4 — Strengths and limitations (~200 words):
Strengths: daily resolution, 2-year window, pre-specified
analysis plan, sensitivity analysis robust.
Limitations: single monitoring site, no co-pollutant data,
unmeasured confounders (traffic, regional fires), policy effect
confounded with year 2 secular trends.

Paragraph 5 — Implications and future research (~100 words):
What should policymakers take from this? What next?

Do not introduce new results. Do not repeat the abstract.
Use hedged language for causal claims ("our findings are
consistent with", not "the policy caused").

Why it works

Prompt 09 · Conclusion

Conclusion (exactly 3 sentences)

Write a Conclusion section of EXACTLY 3 sentences, no more, no
less.

Sentence 1: State the principal finding without statistics.
Sentence 2: State the practical or policy implication.
Sentence 3: State the single most important next step for
research.

Do not begin with "In conclusion" or "In summary". Do not
introduce new findings. Use measured language — avoid "proves",
"demonstrates conclusively", or "should be implemented".

Why it works

Prompt 10 · Cover letter

Cover letter to editor

You are submitting this manuscript to [TARGET JOURNAL] as
corresponding author. Draft a cover letter, single page,
~300 words, with the following structure:

Paragraph 1: Submission statement — manuscript title, type
(original research), and one-sentence summary of the study.

Paragraph 2: Why this fits the journal — reference 1-2 themes
from the journal's scope (insert [JOURNAL SCOPE NOTE] as a
placeholder for me to fill).

Paragraph 3: Significance — the 2 most important contributions
(rigorous ITS evaluation of a local policy; meteorology-adjusted
effect estimation).

Paragraph 4: Standard statements — all authors approved the
submission, no conflicts of interest, data and code available
on reasonable request, manuscript not under consideration
elsewhere, no overlap with prior publications.

Close with thanks and the author's name placeholder.

Tone: confident but not boastful. Do not use "we believe" or
"we hope". Do not list every result.

Why it works

The meta-prompt to teach last: "Now act as Reviewer 2 for [target journal]. Critique the draft you just wrote. Identify the 5 weakest sentences and suggest a stronger rewrite for each."

Module 5

The 5 techniques to take home

Technique	Where it appears	Why it works

Reference

Resources & further reading

ICMJE recommendations on AI use — icmje.org
Nature's policy on generative AI — nature.com (Editorial policies)
WAME AI recommendations — wame.org
STROBE statement — strobe-statement.org
Cochrane on AI in systematic reviews — cochrane.org
Anthropic prompt engineering guide — docs.claude.com

Bottom line:
