pyaota.generator.wordexport2rawyaml module

Convert a ZyBooks-exported zip into a YAML multiple-choice question bank.

Supported zip layouts

  1. QTI XML (preferred): an items/ folder containing one QTI v2.1 XML file per question.

  2. Word documents (legacy fallback): three .docx files following the ZyBooks naming convention.

Usage:

python zybooks_zip_to_yaml.py export.zip output.yaml

Requirements:

pip install python-docx pyyaml (Word path only)

pyaota.generator.wordexport2rawyaml.convert(zip_path, output_yaml_path)[source]

Convert a ZyBooks-exported zip into a YAML question bank.

Auto-detects the format: if the zip contains an items/ directory with XML files, the QTI parser is used. Otherwise, falls back to the Word document parser.

Parameters:
  • zip_path (str or Path) – Path to the exported zip file.

  • output_yaml_path (str or Path) – Path to the output YAML file.
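
The auto-detection rule described above can be sketched as follows. This is a minimal illustration rather than the module's actual code: the helper names detect_format and detect_zip_format are hypothetical, and the sketch assumes that any .xml member under a folder named items/ is enough to select the QTI path.

```python
import zipfile


def detect_format(member_names):
    """Return "qti" if any .xml member sits under an items/ folder, else "word".

    Hypothetical helper: the exact rule used by convert() may differ.
    """
    for name in member_names:
        parts = name.split("/")
        # parts[:-1] are the folders, parts[-1] is the file name.
        if "items" in parts[:-1] and parts[-1].lower().endswith(".xml"):
            return "qti"
    return "word"


def detect_zip_format(zip_path):
    """Apply detect_format() to the member list of a zip file on disk."""
    with zipfile.ZipFile(zip_path) as zf:
        return detect_format(zf.namelist())
```

Separating the rule from the zip handling keeps the decision testable on a plain list of member names.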

pyaota.generator.wordexport2rawyaml.convert_subcommand(args)[source]

CLI subcommand wrapper for convert().

pyaota.generator.wordexport2rawyaml.find_docs_in_extracted_dir(extract_dir)[source]

Return (questions_doc_path, answer_key_doc_path) from an extracted ZyBooks zip.

pyaota.generator.wordexport2rawyaml.is_choice_line(text)[source]
Return type:

bool

pyaota.generator.wordexport2rawyaml.is_question_start(text)[source]
Return type:

bool

pyaota.generator.wordexport2rawyaml.main()[source]
pyaota.generator.wordexport2rawyaml.paragraph_text(paragraph)[source]

Extract text from a paragraph, inserting spaces between runs at font boundaries.

Word splits text into “runs” at every formatting change. paragraph.text simply concatenates the run texts, which drops whitespace when a font change falls between two words (e.g. “Hello” in bold + “world” in normal becomes “Helloworld”). This helper inserts a single space between consecutive runs when neither side already has whitespace at the boundary.
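
The boundary rule can be illustrated on plain strings. The sketch below is an assumption about the approach, not the module's implementation: join_runs is a hypothetical name, and it takes a list of run texts instead of a python-docx paragraph so that it runs without any third-party package.

```python
def join_runs(run_texts):
    """Join run texts, adding one space where neither side has whitespace.

    Hypothetical stand-in for paragraph_text(); empty runs are skipped.
    """
    pieces = []
    for text in run_texts:
        if not text:
            continue
        # Insert a space only if the previous run does not end with
        # whitespace and the current run does not start with whitespace.
        if pieces and not pieces[-1][-1].isspace() and not text[0].isspace():
            pieces.append(" ")
        pieces.append(text)
    return "".join(pieces)
```

With python-docx, the input list would typically be built as [run.text for run in paragraph.runs]. Note that this rule also inserts a space when a formatting change falls in the middle of a word.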

pyaota.generator.wordexport2rawyaml.parse_answer_key(answer_doc_path)[source]

Parse an answer-key .docx where each answer is on a line like:

1) b
2) a

Returns a dict mapping question number to answer letter ('a' through 'd').
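
A minimal sketch of the line format, assuming the paragraph texts have already been read from the .docx. The name parse_answer_lines and the regular expression are illustrative assumptions; the real parser may accept other variations.

```python
import re

# Matches lines such as "1) b": a question number, ")", then one letter a-d.
ANSWER_LINE = re.compile(r"^\s*(\d+)\)\s*([a-d])\s*$", re.IGNORECASE)


def parse_answer_lines(lines):
    """Map question number (int) to lowercase answer letter.

    Hypothetical helper; lines that do not match the pattern are ignored.
    """
    answers = {}
    for line in lines:
        match = ANSWER_LINE.match(line)
        if match:
            answers[int(match.group(1))] = match.group(2).lower()
    return answers
```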

pyaota.generator.wordexport2rawyaml.parse_qti_item(xml_path)[source]

Parse a single QTI v2.1 assessmentItem XML file into a question dict.

Returns None if the file is not a valid assessmentItem (e.g. manifest).
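
For orientation, here is a hedged sketch of reading a QTI v2.1 multiple-choice item with the standard library. It is not the module's parser: parse_item_xml is a hypothetical name, it takes XML text instead of a path, its output keys do not follow the module's YAML schema, and it assumes the stem is stored in the prompt element of a choiceInteraction, which may not hold for every ZyBooks export.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the QTI v2.1 specification.
QTI_NS = {"qti": "http://www.imsglobal.org/xsd/imsqti_v2p1"}

SAMPLE = """<assessmentItem xmlns="http://www.imsglobal.org/xsd/imsqti_v2p1"
    identifier="item1" title="t" adaptive="false" timeDependent="false">
  <responseDeclaration identifier="RESPONSE" cardinality="single" baseType="identifier">
    <correctResponse><value>B</value></correctResponse>
  </responseDeclaration>
  <itemBody>
    <choiceInteraction responseIdentifier="RESPONSE" shuffle="false" maxChoices="1">
      <prompt>What is 2 + 2?</prompt>
      <simpleChoice identifier="A">3</simpleChoice>
      <simpleChoice identifier="B">4</simpleChoice>
    </choiceInteraction>
  </itemBody>
</assessmentItem>"""


def parse_item_xml(xml_text):
    """Return a dict for one item, or None if the root is not an assessmentItem."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}assessmentItem" % QTI_NS["qti"]:
        return None  # e.g. a manifest file
    interaction = root.find(".//qti:choiceInteraction", QTI_NS)
    if interaction is None:
        return None  # not a multiple-choice item
    prompt = interaction.find("qti:prompt", QTI_NS)
    stem = "".join(prompt.itertext()).strip() if prompt is not None else ""
    choices = [
        {"id": choice.get("identifier"), "text": "".join(choice.itertext()).strip()}
        for choice in interaction.findall("qti:simpleChoice", QTI_NS)
    ]
    correct = root.findtext(
        "qti:responseDeclaration/qti:correctResponse/qti:value",
        default=None,
        namespaces=QTI_NS,
    )
    if correct is not None:
        correct = correct.strip()
    return {"id": root.get("identifier"), "stem": stem,
            "choices": choices, "correct": correct}
```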

pyaota.generator.wordexport2rawyaml.parse_qti_items_dir(items_dir)[source]

Parse all QTI XML files in a directory, returning a numerically sorted list.
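
The source does not say which value drives the numeric ordering. The sketch below assumes it is the first integer in each file name; numeric_key and sort_item_names are hypothetical names. Numeric ordering matters because a plain string sort would place item10.xml before item2.xml.

```python
import re


def numeric_key(filename):
    """Sort key: first integer in the name, with number-less names last."""
    match = re.search(r"\d+", filename)
    number = int(match.group()) if match else float("inf")
    # The name itself breaks ties so the ordering is deterministic.
    return (number, filename)


def sort_item_names(filenames):
    """Return file names ordered numerically rather than lexically."""
    return sorted(filenames, key=numeric_key)
```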

pyaota.generator.wordexport2rawyaml.parse_questions(questions_doc_path, answers)[source]

Parse the questions .docx into a list of dicts matching the YAML schema:

  - id: Q1
    points: 1
    type: mcq
    stem: [ {type: text|code, text: "…", …}, … ]
    choices: [ {key: 'a', text: '…'}, … ]
    correct: 'b'