Python · pip install survy

Survey data that actually handles multiselect

A Python library that treats multiselect questions as first-class objects. Auto-detects format, generates frequency tables, and runs cross-tabulations — no custom wrangling required.


demo.py

    import survy

    # Handles wide format (hobby_1, hobby_2, hobby_3)
    # and compact format ("Sport;Book"); the format is auto-detected
    survey = survy.read_csv("survey_data.csv")

    # One-liner frequency table
    print(survey["hobby"].frequencies)
    #   value      count  percent
    #   Sport         45    56.2%
    #   Reading       38    47.5%
    #   Gaming        22    27.5%
    #   Music         18    22.5%

The problem

Multiselect is hard. It shouldn't be.

Multiselect questions appear in nearly every survey, but standard data tools treat them as plain strings or scattered columns — forcing you to write the same wrangling logic project after project.

Without survy

    import pandas as pd
    from collections import Counter

    df = pd.read_csv("survey_data.csv")

    # 1. Detect multiselect columns manually
    hobby_cols = [c for c in df.columns
                  if c.startswith("hobby_")]

    # 2. Normalize wide format to lists
    df["hobby"] = df[hobby_cols].apply(
        lambda row: [v for v in row if pd.notna(v)],
        axis=1)

    # 3. Count manually
    all_vals = [v for lst in df["hobby"] for v in lst]
    freq = pd.Series(Counter(all_vals))
    freq = freq.sort_values(ascending=False)
    print(freq)

With survy

    import survy

    survey = survy.read_csv("survey_data.csv")
    print(survey["hobby"].frequencies)

No column detection boilerplate

No manual list expansion

Works with wide and compact format

Same API for CSV, Excel, SPSS, JSON

Features

How it works

Everything you need to analyze survey data with multiselect questions, without the boilerplate.

Smart Format Detection

Automatically detects whether your multiselect data uses wide format (hobby_1, hobby_2) or compact format ("Sport;Book") — no configuration needed.
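To make the idea concrete, here is a minimal sketch of how such detection could work. It is not survy's actual implementation: the function name detect_multiselect, the ";" separator default, and the "prefix_number" column rule are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def detect_multiselect(rows, sep=";"):
    """Guess which questions are multiselect and in which layout.

    rows: list of dicts, one per respondent (e.g. from csv.DictReader).
    Returns {question_id: "wide" or "compact"}.
    Hypothetical helper, not part of survy.
    """
    if not rows:
        return {}
    columns = list(rows[0].keys())
    detected = {}

    # Wide: two or more columns share a prefix followed by _<number>.
    groups = defaultdict(list)
    for col in columns:
        match = re.fullmatch(r"(.+)_(\d+)", col)
        if match:
            groups[match.group(1)].append(col)
    wide_cols = set()
    for prefix, cols in groups.items():
        if len(cols) >= 2:
            detected[prefix] = "wide"
            wide_cols.update(cols)

    # Compact: any cell in a remaining column contains the separator.
    for col in columns:
        if col in wide_cols:
            continue
        if any(sep in (row.get(col) or "") for row in rows):
            detected[col] = "compact"
    return detected


wide_rows = [
    {"gender": "Male", "hobby_1": "Sport", "hobby_2": "Book"},
    {"gender": "Female", "hobby_1": "Movie", "hobby_2": ""},
]
compact_rows = [
    {"gender": "Male", "hobby": "Sport;Book"},
    {"gender": "Female", "hobby": "Movie"},
]
print(detect_multiselect(wide_rows))     # {'hobby': 'wide'}
print(detect_multiselect(compact_rows))  # {'hobby': 'compact'}
```

A heuristic like this can misfire, for example on free-text answers that happen to contain a semicolon, which is one reason to leave detection to a library rather than rewriting it per project.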

Frequency Tables

Generate count and percent frequency tables for any variable in one line. Multiselect columns are counted correctly without any manual expansion.
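For readers curious what "counted correctly" involves, the sketch below shows one plausible way to count multiselect answers using only the standard library. It assumes percentages are taken over respondents who selected at least one option, which is why they can sum to more than 100%. The function name and the choice of base are illustrative assumptions, not survy's documented behavior.

```python
from collections import Counter


def multiselect_frequencies(responses):
    """responses: one list of selected options per respondent.

    Returns (option, count, percent) tuples, most frequent first.
    Hypothetical helper, not part of survy.
    """
    # Assumed base: respondents who selected at least one option.
    base = sum(1 for selected in responses if selected)
    # set() guards against counting an option twice for one respondent.
    counts = Counter(
        option for selected in responses for option in set(selected)
    )
    return [
        (option, count, round(100 * count / base, 1))
        for option, count in counts.most_common()
    ]


# Made-up answers from four respondents; the last one skipped the question.
responses = [["Sport", "Book"], ["Sport"], ["Movie", "Sport"], []]
for option, count, percent in multiselect_frequencies(responses):
    print(f"{option:<8}{count:>5}{percent:>8}%")
```

Dividing by the number of respondents rather than the number of mentions is the step hand-rolled scripts most often get wrong.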

Cross-Tabulation

Run cross-tabs between any two variables, including multiselect × single-select. Supports count, percent, and numeric aggregation modes.
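As an illustration of what a multiselect × single-select cross-tab computes, here is a small standard-library sketch covering the count and percent modes. The function names, the row layout, and the use of each group's respondent count as the percent base are assumptions for this example and are not taken from survy.

```python
from collections import Counter, defaultdict


def crosstab_counts(rows, multi_key, single_key):
    """Count each multiselect option within each single-select group."""
    table = defaultdict(Counter)
    for row in rows:
        # Each respondent contributes at most once per option.
        for option in set(row[multi_key]):
            table[option][row[single_key]] += 1
    return {option: dict(groups) for option, groups in table.items()}


def crosstab_percents(rows, multi_key, single_key):
    """Column percentages: share of each group that picked each option."""
    bases = Counter(row[single_key] for row in rows)
    counts = crosstab_counts(rows, multi_key, single_key)
    return {
        option: {
            group: round(100 * n / bases[group], 1)
            for group, n in groups.items()
        }
        for option, groups in counts.items()
    }


# Made-up respondents for illustration only.
rows = [
    {"gender": "Male", "hobby": ["Sport", "Book"]},
    {"gender": "Male", "hobby": ["Sport"]},
    {"gender": "Female", "hobby": ["Book", "Movie"]},
    {"gender": "Female", "hobby": ["Sport"]},
]
counts = crosstab_counts(rows, "hobby", "gender")
percents = crosstab_percents(rows, "hobby", "gender")
print(counts["Sport"])    # {'Male': 2, 'Female': 1}
print(percents["Sport"])  # {'Male': 100.0, 'Female': 50.0}
```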

Statistical Testing

Optional significance testing built in. Two-proportion z-tests for categorical variables, Welch's t-tests for numeric comparisons.
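For reference, the two tests named above can be written out from their textbook formulas. The sketch below uses only the standard library and a pooled standard error for the z-test. It is independent of survy, whose exact options (pooling, tails, corrections) are not described here, and the input numbers are made up.

```python
import math
from statistics import mean, variance


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Made-up example: 45 of 80 in group A vs 30 of 80 in group B chose an option.
z, p = two_proportion_z_test(45, 80, 30, 80)
print(f"z = {z:.2f}, p = {p:.3f}")

# Made-up numeric ratings from two groups.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.2f}, df = {df:.2f}")
```

Turning Welch's t into a p-value requires the t distribution, which the standard library does not provide; scipy.stats.ttest_ind with equal_var=False performs the full test.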

Multiple File Formats

Read from CSV, Excel, SPSS (.sav), and JSON. Export to all the same formats. The compact parameter controls how multiselect columns are written out.
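The two output layouts can be pictured with a short sketch. The helpers below are hypothetical and do not call survy. They assume the compact layout joins selections with ";" and the wide layout uses one numbered column per selection slot, matching the hobby_1, hobby_2 naming shown earlier.

```python
def to_compact(responses, question, sep=";"):
    """One column per question; selections joined with a separator."""
    return [{question: sep.join(selected)} for selected in responses]


def to_wide(responses, question):
    """One column per selection slot: question_1, question_2, ..."""
    width = max((len(selected) for selected in responses), default=0)
    return [
        {
            f"{question}_{i + 1}": (selected[i] if i < len(selected) else "")
            for i in range(width)
        }
        for selected in responses
    ]


# Made-up answers from two respondents.
responses = [["Sport", "Book"], ["Movie"]]
print(to_compact(responses, "hobby"))
# [{'hobby': 'Sport;Book'}, {'hobby': 'Movie'}]
print(to_wide(responses, "hobby"))
# [{'hobby_1': 'Sport', 'hobby_2': 'Book'}, {'hobby_1': 'Movie', 'hobby_2': ''}]
```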

AI-Ready

Companion agent skills in survy-agent-skills give LLM coding assistants structured guidance to generate correct survy code reliably.

Demo

See the API in action

Clean, readable code for the most common survey analysis tasks.

    import survy

    survey = survy.read_csv("survey_data.csv")

    # List all variables
    print(survey)
    # Survey (4 variables)
    #   Variable(id=gender, label=gender, value_indices={'Female': 1, 'Male': 2}, base=3)
    #   Variable(id=yob, label=yob, value_indices={}, base=3)
    #   Variable(id=hobby, label=hobby, value_indices={'Book': 1, 'Movie': 2, 'Sport': 3}, base=3)
    #   Variable(id=animal, label=animal, value_indices={'Cat': 1, 'Dog': 2}, base=3)

    # Check variable type
    print(survey["hobby"].type)
    # "multiselect"

    # Frequency table (one line)
    print(survey["hobby"].frequencies)
    #   value      count  percent
    #   Sport         45    56.2%
    #   Reading       38    47.5%
    #   Gaming        22    27.5%
    #   Music         18    22.5%

AI Integration

survy-agent-skills

Reference documents designed for LLM-based coding assistants — Claude, Copilot, and similar tools — so they can read, understand, and write correct survy code without hallucinating parameters or inventing methods.

hoanghaoha/survy-agent-skills

survey-analysis

Use when working with survey data through the survy API — loading files, handling multiselect questions, computing frequencies and crosstabs, exporting, updating labels, and filtering respondents.

questionnaire-reading

Use when parsing a questionnaire design document (.docx, .xlsx, .pdf, or plain text). Produces a standardised questionnaire-design.md capturing every question's ID, label, options, and skip logic.

Installation

npx skills add https://github.com/hoanghaoha/survy

Philosophy

Design philosophy

survy is built on a few core beliefs about how survey analysis should work.

Format-agnostic by default

You shouldn't need to know whether your data is wide or compact to start analyzing it. survy detects the format automatically and normalizes it internally — you just load data and start working.

Analysis-ready output

Common survey operations — frequencies, cross-tabs, significance testing — are built in. No custom aggregation logic, no re-implementing the same patterns across every project.

Integrates with your stack

Outputs are standard Python structures. Export to the formats your team already uses: SPSS for analysts, Excel for stakeholders, CSV and JSON for pipelines.
