Description of the pyXLMS file format

Reading files with `parser.read(*, engine="Custom")` requires the following data format for crosslinks and crosslink-spectrum-matches. The column names can be adjusted via the `column_mapping` parameter, but the format of each column must stay the same for parsing to succeed. Any column that is not required can safely be omitted. `transform.to_dataframe()` also outputs data in this format.

For an extended description including additional examples, please refer here.

Data required for parsing crosslink-spectrum-matches:

| Column Name | Required | Data Type | Example | Description |
| --- | --- | --- | --- | --- |
| Alpha Peptide | | str | PEPKTIDE | Unmodified amino acid sequence of the alpha peptide in uppercase letters |
| Alpha Peptide Modifications | | str | (4:[DSS\|138.06808]) | Modifications of the alpha peptide, see ➡️ Modification Encoding |
| Alpha Peptide Crosslink Position | | int | 4 | Position of the crosslinker in the alpha peptide (1-based) |
| Alpha Proteins | | str | G3ECR1 | Accession(s) of the protein(s) associated with the alpha peptide; multiple accessions should be delimited by a semicolon |
| Alpha Proteins Crosslink Positions | | int, str | 13 | Position of the crosslinker in the associated alpha protein(s) (1-based); positions in multiple proteins should be delimited by a semicolon |
| Alpha Proteins Peptide Positions | | int, str | 10 | Position of the alpha peptide in the associated alpha protein(s) (1-based); positions in multiple proteins should be delimited by a semicolon |
| Alpha Score | | float | 0.837 | Score of the alpha peptide |
| Alpha Decoy | | bool, str | False | Whether the alpha peptide is from the target (False) or decoy (True) database |
| Beta Peptide | | str | PEPKTIDE | Unmodified amino acid sequence of the beta peptide in uppercase letters |
| Beta Peptide Modifications | | str | (4:[DSS\|138.06808]) | Modifications of the beta peptide, see ➡️ Modification Encoding |
| Beta Peptide Crosslink Position | | int | 4 | Position of the crosslinker in the beta peptide (1-based) |
| Beta Proteins | | str | G3ECR1 | Accession(s) of the protein(s) associated with the beta peptide; multiple accessions should be delimited by a semicolon |
| Beta Proteins Crosslink Positions | | int, str | 13 | Position of the crosslinker in the associated beta protein(s) (1-based); positions in multiple proteins should be delimited by a semicolon |
| Beta Proteins Peptide Positions | | int, str | 10 | Position of the beta peptide in the associated beta protein(s) (1-based); positions in multiple proteins should be delimited by a semicolon |
| Beta Score | | float | 0.837 | Score of the beta peptide |
| Beta Decoy | | bool, str | False | Whether the beta peptide is from the target (False) or decoy (True) database |
| CSM Score | | float | 0.99513 | Score of the crosslink-spectrum-match |
| Spectrum File | | str | 2025_03_17_EXP1_RUN3_R1.raw | File name of the spectrum file |
| Scan Nr | | int | 1703 | Scan number of the spectrum in which the match was identified |
| Precursor Charge | | int | 3 | Precursor charge of the crosslink spectrum |
| Retention Time | | float | 530.17 | Retention time of the crosslink spectrum in seconds |
| Ion Mobility | | float | 170.41 | Ion mobility, CCS, or compensation voltage of the crosslink spectrum |
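Because all positions in this table are 1-based, the protein crosslink position can be derived from the other two position columns: protein crosslink position = protein peptide position + peptide crosslink position - 1. The example values agree (10 + 4 - 1 = 13). The Python sketch below applies this to a possibly semicolon-delimited position cell; the helper `protein_crosslink_positions` is hypothetical and not part of pyXLMS, and the second protein position (52) is a made-up illustration value.

```python
from typing import Union


def protein_crosslink_positions(
    peptide_positions: Union[int, str], peptide_crosslink_position: int
) -> str:
    """Hypothetical helper (not part of pyXLMS).

    Derives a "Proteins Crosslink Positions" cell from a "Proteins Peptide
    Positions" cell and a "Peptide Crosslink Position" cell. All positions are
    1-based, so the crosslinker sits (peptide_crosslink_position - 1) residues
    after the peptide's first residue in each protein.
    """
    # An int means a single protein; a str may hold semicolon-delimited positions
    starts = [int(p) for p in str(peptide_positions).split(";")]
    return ";".join(str(start + peptide_crosslink_position - 1) for start in starts)


# Example values from the table: peptide at protein position 10, crosslinker at peptide position 4
print(protein_crosslink_positions(10, 4))  # 13
# Hypothetical second protein in which the same peptide starts at position 52
print(protein_crosslink_positions("10;52", 4))  # 13;55
```

The result is returned as a semicolon-delimited string so that it can be written back into the same cell format the table describes.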

Additional resources:

Modification Encoding

Modifications are encoded with the following values:

  • position: The 1-based position of the modification in the peptide sequence
    • should be parsable as an int
  • name: The name of the modification, for example Oxidation
    • should be parsable as a str
  • mass: The monoisotopic delta mass of the modification, for example 15.994915
    • should be parsable as a float

Each modification is then encoded as (position:[name|mass]), and multiple modifications should be delimited by a semicolon (;). In the rare case that there is more than one modification at the same position, their names should be delimited by a comma (,). See the examples below:

  • (4:[DSS|138.06808])
  • (1:[DSS|138.06808]);(5:[Oxidation|15.994915])
  • (5:[Substitution, Oxidation|13.541798])
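The encoding rules above are regular enough to round-trip with a small regular expression. The following Python sketch encodes a `{position: (name, mass)}` dictionary into this string format and parses it back; the function names are hypothetical illustrations and not pyXLMS functions. Several modifications on the same position are kept as one comma-delimited name string, exactly as they appear in the encoding.

```python
import re
from typing import Dict, Tuple

# Matches one encoded modification: (position:[name|mass])
MOD_PATTERN = re.compile(r"\((\d+):\[([^|\]]+)\|([^\]]+)\]\)")


def encode_modifications(mods: Dict[int, Tuple[str, float]]) -> str:
    """Encode {position: (name, mass)} as '(pos:[name|mass]);...', sorted by position."""
    return ";".join(
        f"({pos}:[{name}|{mass}])" for pos, (name, mass) in sorted(mods.items())
    )


def decode_modifications(encoded: str) -> Dict[int, Tuple[str, float]]:
    """Parse '(pos:[name|mass]);...' back into {position: (name, mass)}."""
    mods: Dict[int, Tuple[str, float]] = {}
    for token in encoded.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate an empty cell or a trailing semicolon
        match = MOD_PATTERN.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid modification encoding: {token!r}")
        pos, name, mass = match.groups()
        mods[int(pos)] = (name, float(mass))
    return mods


mods = decode_modifications("(1:[DSS|138.06808]);(5:[Oxidation|15.994915])")
print(mods)  # {1: ('DSS', 138.06808), 5: ('Oxidation', 15.994915)}
print(encode_modifications(mods))  # (1:[DSS|138.06808]);(5:[Oxidation|15.994915])
```

Python's shortest float representation is used for the mass, so the example masses are written back unchanged.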

Data required for parsing crosslinks:

| Column Name | Required | Data Type | Example | Description |
| --- | --- | --- | --- | --- |
| Alpha Peptide | | str | PEPKTIDE | Unmodified amino acid sequence of the alpha peptide in uppercase letters |
| Alpha Peptide Crosslink Position | | int | 4 | Position of the crosslinker in the alpha peptide (1-based) |
| Alpha Proteins | | str | G3ECR1 | Accession(s) of the protein(s) associated with the alpha peptide; multiple accessions should be delimited by a semicolon |
| Alpha Proteins Crosslink Positions | | int, str | 13 | Position of the crosslinker in the associated alpha protein(s) (1-based); positions in multiple proteins should be delimited by a semicolon |
| Alpha Decoy | | bool, str | False | Whether the alpha peptide is from the target (False) or decoy (True) database |
| Beta Peptide | | str | PEPKTIDE | Unmodified amino acid sequence of the beta peptide in uppercase letters |
| Beta Peptide Crosslink Position | | int | 4 | Position of the crosslinker in the beta peptide (1-based) |
| Beta Proteins | | str | G3ECR1 | Accession(s) of the protein(s) associated with the beta peptide; multiple accessions should be delimited by a semicolon |
| Beta Proteins Crosslink Positions | | int, str | 13 | Position of the crosslinker in the associated beta protein(s) (1-based); positions in multiple proteins should be delimited by a semicolon |
| Beta Decoy | | bool, str | False | Whether the beta peptide is from the target (False) or decoy (True) database |
| Crosslink Score | | float | 0.99513 | Score of the crosslink |
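Several columns pack more than one value into a single cell: protein accessions and protein crosslink positions are semicolon-delimited, and decoy flags may be given as bool or str. The Python sketch below shows one way to turn such cells into Python values with the standard library's `csv` module. It assumes a comma-separated file with a header row and decoy strings spelled True/False, and it shows only the alpha columns for brevity. The helper names and the accession `PROT2` are hypothetical placeholders; this is not how pyXLMS itself parses the file.

```python
import csv
import io
from typing import Dict, List, Union


def split_ints(value: Union[int, str]) -> List[int]:
    """Parse an 'int, str' position cell: 13 or '13' -> [13], '13;27' -> [13, 27]."""
    return [int(v) for v in str(value).split(";")]


def parse_decoy(value: Union[bool, str]) -> bool:
    """Parse a 'bool, str' decoy cell (assumes the strings 'True'/'False')."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text not in ("true", "false"):
        raise ValueError(f"cannot interpret decoy value: {value!r}")
    return text == "true"


def parse_alpha(row: Dict[str, str]) -> dict:
    """Convert the alpha-peptide columns of one crosslink row into Python values."""
    return {
        "peptide": row["Alpha Peptide"],
        "crosslink_position": int(row["Alpha Peptide Crosslink Position"]),
        "proteins": row["Alpha Proteins"].split(";"),
        "protein_crosslink_positions": split_ints(row["Alpha Proteins Crosslink Positions"]),
        "decoy": parse_decoy(row["Alpha Decoy"]),
    }


# Hypothetical comma-separated input; "PROT2" and position 27 are placeholders
data = (
    "Alpha Peptide,Alpha Peptide Crosslink Position,Alpha Proteins,"
    "Alpha Proteins Crosslink Positions,Alpha Decoy\n"
    "PEPKTIDE,4,G3ECR1;PROT2,13;27,False\n"
)
rows = [parse_alpha(row) for row in csv.DictReader(io.StringIO(data))]
for parsed in rows:
    print(parsed)
```

The same helpers apply unchanged to the beta columns and to the corresponding columns of the crosslink-spectrum-match table.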
