Survey data that actually handles multiselect
A Python library that treats multiselect questions as first-class objects. Auto-detects format, generates frequency tables, and runs cross-tabulations — no custom wrangling required.
    import survy

    # Handles wide format (hobby_1, hobby_2, hobby_3)
    # and compact format ("Sport;Book"), auto-detected
    survey = survy.read_csv("survey_data.csv")

    # One-liner frequency table
    print(survey["hobby"].frequencies)
    # value    count  percent
    # Sport       45    56.2%
    # Reading     38    47.5%
    # Gaming      22    27.5%
    # Music       18    22.5%
The problem
Multiselect is hard. It shouldn't be.
Multiselect questions appear in nearly every survey, but standard data tools treat them as plain strings or scattered columns — forcing you to write the same wrangling logic project after project.
    # Without survy: manual wrangling in pandas
    import pandas as pd
    from collections import Counter

    df = pd.read_csv("survey_data.csv")

    # 1. Detect multiselect columns manually
    hobby_cols = [c for c in df.columns if c.startswith("hobby_")]

    # 2. Normalize wide format to lists
    df["hobby"] = df[hobby_cols].apply(
        lambda row: [v for v in row if pd.notna(v)], axis=1
    )

    # 3. Count manually
    all_vals = [v for lst in df["hobby"] for v in lst]
    freq = pd.Series(Counter(all_vals))
    freq = freq.sort_values(ascending=False)
    print(freq)
    # With survy
    import survy

    survey = survy.read_csv("survey_data.csv")
    print(survey["hobby"].frequencies)
Features
How it works
Everything you need to analyze survey data with multiselect questions, without the boilerplate.
Smart Format Detection
Automatically detects whether your multiselect data uses wide format (hobby_1, hobby_2) or compact format ("Sport;Book") — no configuration needed.
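To make the two layouts concrete, here is a minimal standard-library sketch of how such detection could work. It is not survy's implementation: the naming rule (a shared base name plus a numeric suffix), the ";" separator, and the helper names detect_multiselect and normalize are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed conventions for this sketch (not survy's documented rules):
# wide columns look like "<name>_<number>", compact cells use ";" as separator.
WIDE_PATTERN = re.compile(r"^(?P<base>.+)_(?P<index>\d+)$")
SEPARATOR = ";"


def detect_multiselect(rows):
    """Return {variable: "wide" | "compact"} for columns that look multiselect."""
    columns = list(rows[0].keys()) if rows else []
    detected = {}

    # Wide format: two or more columns share a base name with numeric suffixes.
    groups = defaultdict(list)
    for col in columns:
        match = WIDE_PATTERN.match(col)
        if match:
            groups[match.group("base")].append(col)
    for base, cols in groups.items():
        if len(cols) >= 2:
            detected[base] = "wide"

    # Compact format: at least one cell in the column contains the separator.
    wide_cols = {c for cols in groups.values() if len(cols) >= 2 for c in cols}
    for col in columns:
        if col in wide_cols:
            continue
        if any(SEPARATOR in (row[col] or "") for row in rows):
            detected[col] = "compact"
    return detected


def normalize(rows, variable, fmt):
    """Turn either layout into one list of selected values per respondent."""
    if fmt == "compact":
        return [
            [v.strip() for v in (row[variable] or "").split(SEPARATOR) if v.strip()]
            for row in rows
        ]
    cols = [
        c for c in rows[0]
        if (m := WIDE_PATTERN.match(c)) and m.group("base") == variable
    ]
    return [[row[c] for c in cols if row[c]] for row in rows]


wide_rows = [
    {"gender": "Male", "hobby_1": "Sport", "hobby_2": "Book"},
    {"gender": "Female", "hobby_1": "Movie", "hobby_2": ""},
]
compact_rows = [
    {"gender": "Male", "hobby": "Sport;Book"},
    {"gender": "Female", "hobby": "Movie"},
]

print(detect_multiselect(wide_rows))     # {'hobby': 'wide'}
print(detect_multiselect(compact_rows))  # {'hobby': 'compact'}
```

Both layouts normalize to the same per-respondent lists, which is what lets later analysis ignore how the file was laid out.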
Frequency Tables
Generate count and percent frequency tables for any variable in one line. Multiselect columns are counted correctly without any manual expansion.
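What "counted correctly" means for multiselect can be sketched in plain Python. The sketch assumes each option counts once per respondent and that percent uses respondents who answered as the base, so percentages can sum to more than 100. The sample output on this page (45 at 56.2%, 38 at 47.5%) is consistent with a respondent base of 80, but survy's exact base rule is not stated here, and multiselect_frequencies is a hypothetical helper.

```python
from collections import Counter


def multiselect_frequencies(responses):
    """Count each option once per respondent; percent is over respondents who answered."""
    answered = [set(r) for r in responses if r]
    base = len(answered)
    if base == 0:
        return []
    counts = Counter(value for selected in answered for value in selected)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(value, count, round(100 * count / base, 1)) for value, count in ordered]


responses = [
    ["Sport", "Book"],
    ["Sport"],
    ["Book", "Movie"],
    [],                 # respondent skipped the question
    ["Sport", "Movie"],
]

for value, count, percent in multiselect_frequencies(responses):
    print(f"{value:<8}{count:>5}{percent:>8.1f}%")
```

Dividing by the total number of selections instead of the number of respondents is the usual mistake in hand-rolled versions; it understates every option.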
Cross-Tabulation
Run cross-tabs between any two variables, including multiselect × single-select. Supports count, percent, and numeric aggregation modes.
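A multiselect × single-select cross-tab can be illustrated with a small standard-library sketch. It covers the count and percent modes only. The function name crosstab, its percent flag, and the choice of column bases (respondents per group) are assumptions for illustration, not survy's API.

```python
from collections import defaultdict


def crosstab(row_values, column_values, percent=False):
    """Cross-tabulate a multiselect variable (rows) by a single-select variable (columns)."""
    counts = defaultdict(lambda: defaultdict(int))
    bases = defaultdict(int)  # respondents per column group
    for selected, group in zip(row_values, column_values):
        if group is None:
            continue
        bases[group] += 1
        for value in set(selected):  # each option counts once per respondent
            counts[value][group] += 1

    table = {}
    for value in sorted(counts):
        row = {}
        for group in sorted(bases):
            cell = counts[value][group]
            row[group] = round(100 * cell / bases[group], 1) if percent else cell
        table[value] = row
    return table


hobby = [["Sport", "Book"], ["Sport"], ["Book", "Movie"], ["Movie"]]
gender = ["Male", "Male", "Female", "Female"]

print(crosstab(hobby, gender))
# {'Book': {'Female': 1, 'Male': 1}, 'Movie': {'Female': 2, 'Male': 0}, 'Sport': {'Female': 0, 'Male': 2}}
print(crosstab(hobby, gender, percent=True))
# {'Book': {'Female': 50.0, 'Male': 50.0}, 'Movie': {'Female': 100.0, 'Male': 0.0}, 'Sport': {'Female': 0.0, 'Male': 100.0}}
```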
Statistical Testing
Optional significance testing built in. Two-proportion z-tests for categorical variables, Welch's t-tests for numeric comparisons.
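For readers who want to see what these two tests compute, here is a textbook sketch using only the standard library. It assumes the pooled-standard-error form of the two-proportion z-test; whether survy pools is not stated on this page. For Welch's test it returns the statistic and the Welch-Satterthwaite degrees of freedom only, since the standard library has no t-distribution for the p-value. The function names and the input numbers are illustrative.

```python
import math
from statistics import NormalDist, mean, variance


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions, using the pooled standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each sample mean (sample variance / n).
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Illustrative inputs: 45 of 80 respondents in one group vs 30 of 80 in another.
z, p = two_proportion_z_test(45, 80, 30, 80)
print(f"z = {z:.3f}, p = {p:.4f}")

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")
```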
Multiple File Formats
Read from CSV, Excel, SPSS (.sav), and JSON. Export to all the same formats. The compact parameter controls how multiselect columns are written out.
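The difference between compact and wide output is easiest to see on a tiny export. The sketch below uses the standard csv module rather than survy. The helper write_multiselect and its signature are hypothetical; only the idea of a compact switch comes from the description above.

```python
import csv
import io


def write_multiselect(respondents, variable, compact=True, separator=";"):
    """Serialize respondents to CSV text, writing one multiselect variable compact or wide."""
    if compact:
        # One column, selected values joined by the separator.
        fieldnames = ["id", variable]
        rows = [
            {"id": r["id"], variable: separator.join(r[variable])}
            for r in respondents
        ]
    else:
        # One column per selection slot, padded with empty cells.
        width = max((len(r[variable]) for r in respondents), default=0)
        wide_cols = [f"{variable}_{i}" for i in range(1, width + 1)]
        fieldnames = ["id"] + wide_cols
        rows = []
        for r in respondents:
            padded = r[variable] + [""] * (width - len(r[variable]))
            rows.append({"id": r["id"], **dict(zip(wide_cols, padded))})

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


respondents = [
    {"id": 1, "hobby": ["Sport", "Book"]},
    {"id": 2, "hobby": ["Movie"]},
]

print(write_multiselect(respondents, "hobby", compact=True))
# id,hobby
# 1,Sport;Book
# 2,Movie

print(write_multiselect(respondents, "hobby", compact=False))
# id,hobby_1,hobby_2
# 1,Sport,Book
# 2,Movie,
```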
AI-Ready
Companion agent skills in survy-agent-skills give LLM coding assistants structured guidance to generate correct survy code reliably.
Demo
See the API in action
Clean, readable code for the most common survey analysis tasks.
    import survy

    survey = survy.read_csv("survey_data.csv")

    # List all variables
    print(survey)
    # Survey (4 variables)
    # Variable(id=gender, label=gender, value_indices={'Female': 1, 'Male': 2}, base=3)
    # Variable(id=yob, label=yob, value_indices={}, base=3)
    # Variable(id=hobby, label=hobby, value_indices={'Book': 1, 'Movie': 2, 'Sport': 3}, base=3)
    # Variable(id=animal, label=animal, value_indices={'Cat': 1, 'Dog': 2}, base=3)

    # Check variable type
    print(survey["hobby"].type)
    # "multiselect"

    # Frequency table (one line)
    print(survey["hobby"].frequencies)
    # value    count  percent
    # Sport       45    56.2%
    # Reading     38    47.5%
    # Gaming      22    27.5%
    # Music       18    22.5%
AI Integration
survy-agent-skills
Reference documents designed for LLM-based coding assistants — Claude, Copilot, and similar tools — so they can read, understand, and write correct survy code without hallucinating parameters or inventing methods.
survey-analysis
Use when working with survey data through the survy API: loading files, handling multiselect questions, computing frequencies and crosstabs, exporting, updating labels, and filtering respondents.
questionnaire-reading
Use when parsing a questionnaire design document (.docx, .xlsx, .pdf, or plain text). Produces a standardised questionnaire-design.md capturing every question's ID, label, options, and skip logic.
Installation
    npx skills add https://github.com/hoanghaoha/survy
Philosophy
Design philosophy
survy is built on a few core beliefs about how survey analysis should work.
Format-agnostic by default
You shouldn't need to know whether your data is wide or compact to start analyzing it. survy detects the format automatically and normalizes it internally — you just load data and start working.
Analysis-ready output
Common survey operations — frequencies, cross-tabs, significance testing — are built in. No custom aggregation logic, no re-implementing the same patterns across every project.
Integrates with your stack
Outputs are standard Python structures. Export to the formats your team already uses: SPSS for analysts, Excel for stakeholders, CSV and JSON for pipelines.