Baishiya Karst Cave
Baishiya Karst Cave (BKC; 35.45° N, 102.57° E, 3,280 metres above sea level) is located in the Ganjia Basin, northeastern Tibetan Plateau4 (Extended Data Fig. 1 and Supplementary Information section 2). It is a karstic cave and lies around 20 m above the riverbed of the Jianglagou river in front of the cave. The Xiahe mandible (Xiahe 1) was found in this cave in 1980 and has been dated to at least 160 ka by U-series dating of carbonate crust on the mandible3. The Xiahe 1 individual has been identified as a Denisovan by palaeoproteomic analysis3. Two connected units (T2 and T3, 1 m × 2 m, respectively) were excavated in this cave in 2018 and 2019, revealing 11 layers. The chronological framework for layers 2–10 built by optically stimulated luminescence (OSL) and radiocarbon dating methods indicates that prehistoric hominins occupied the cave from around 190 ka to around 30 ka (ref. 4). Denisovan mtDNA extracted from sediments of layers 4 and 7 indicates that the site was occupied by Denisovans at around 100 ka, around 60 ka and possibly as late as 45 ka (ref. 4), providing further unequivocal evidence of Denisovan occupation at BKC. In addition, Denisovan mtDNA was also recovered from layers 2 and 3, and the age of this is still under detailed calculation and evaluation.
Chronological framework
A previous study4 used OSL and radiocarbon dating to establish a chronological framework for layers 2–10 of T2. In this study, we apply this framework to both T2 and T3, but use the maximum age range to represent the age of each layer (Supplementary Table 2.2). See Supplementary Information section 2 for details.
Sample selection
A total of 3,642 bone specimens were systematically collected from T2 and T3 during excavations in 2018 and 2019, 3,582 of which were recorded with three-dimensional coordinates, all stored in the Key Laboratory of Western China’s Environmental Systems (Ministry of Education) at Lanzhou University. Among these, we selected almost all bones longer than 20 mm (n = 2,407), as well as some smaller bone fragments (shorter than 20 mm) with morphological characteristics (n = 160) suitable for morphological taxonomic analysis, for a total of 2,567 specimens (Supplementary Table 2.1). Among these specimens, 2,407 specimens were uncovered from layers 1 to 11, 138 specimens derived from two historical-era pits (H1 and H3) and 22 specimens lacked specific layer information. Because layers 1–10 are seriously destroyed by the two historical pits (H1 and H3), only parts of these layers are preserved4, resulting in the small number of bones (n = 764) collected from layers 1–9. Therefore, about 64% of the analysed bone assemblage derives from layers 10 and 11 (n = 1,643). In this assemblage, we selected 1,857 specimens for ZooMS analysis, including 53 specimens with taxonomic identifications based on morphological analysis for proteomic confirmation of their taxonomic assignments (Supplementary Table 2.1).
ZooMS reference samples
Current ZooMS reference databases are dominated by fauna from western Eurasia, supplemented with family-specific reference datasets. In addition, a variety of predicted peptide mass marker series from COL1 sequences exist based on available genomic resources. Nevertheless, a substantial number of mammalian taxa that are potentially present at BKC are not represented in any of these resources. We therefore collected 39 specimens with known taxonomic information, representing 39 species and 29 genera, that are known to currently inhabit the wider Himalayan region, or are known to have been present in this region in the Pleistocene or Holocene (Supplementary Data 3). Among these reference specimens, ten are of Pleistocene or Holocene age, whereas others are modern specimens. The 39 reference specimens also included 6 specimens from species for which genomic sequences are available but for which no reference matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) collagen type I peptide mass fingerprints have been reported before, such as the giant panda (Ailuropoda melanoleuca) and the Tibetan antelope (Pantholops hodgsonii). Reference specimens were collected from Lanzhou University, the Institute of Zoology, Chinese Academy of Sciences (CAS) and the Northwest Institute of Plateau Biology, CAS (Supplementary Data 3).
After obtaining peptide mass fingerprints for each reference specimen (see extraction method below), 14 species, representing 14 genera, were selected for LC–MS/MS analysis to validate (novel) peptide marker masses.
Zooarchaeological analysis
Diagnostic skeletal elements were identified using modern and ancient comparative vertebrate specimens from the Key Laboratory of Western China’s Environmental Systems (Ministry of Education) at Lanzhou University and comparative osteological atlases31,32,33,34,35,36,37. The identified taxa were classified into six body-size classes (class 0–class V) based on live weight, following previously established criteria38: class I (Marmota himalayana, Lepus oiostolus, Hystrix sp., Mustelidae, Vulpes ferrilata); class II (Procapra sp., Moschus sp., Panthera sp., Canis lupus); class III (Pseudois nayaur, Ovis ammon, Cervus elaphus, Crocuta crocuta); class IV (Equus sp., Bos mutus); and class V (Coelodonta sp.). For carnivores, large animals correspond to class II or class III, and small animals correspond to class I. For herbivores, mega, large, medium and small animals correspond to classes V, IV, III or II and I, respectively. In addition, birds are classified as class 0.
Each specimen was observed under a portable low-magnification hand lens (20×) using an oblique light source, and was observed and photographed using a Keyence VHX-7000N digital microscope at different magnifications when necessary. Bone surface modifications, including anthropogenic modifications (cut and chop marks, percussion marks and notches and impact bone flakes), animal gnawing traces (pits, punctures, scores, furrowing and so on), trampling and modern mechanical modifications produced by excavation tools, were recorded according to the literature39,40,41,42,43. The weathering stages of the bone surface were classified into four levels (I–IV) according to the classical criteria39. We also recorded burnt bone specimens on the basis of macroscopic colour changes during exposure to fire44. The fracture types of bone fragments longer than 20 mm were recorded and classified according to previously described criteria45. The five anatomical regions in this paper are head (including horn and antler); axial column (including vertebrae, ribs and pelvis); front and hind limbs (including scapulae, humeri, radii and ulnae, metacarpi, femurs, tibiae and metatarsals); carpals and tarsals; and feet, on the basis of the structure defined in a previous report46.
Bone retouchers were identified according to previously described criteria47. Expedient bone tools were identified on the basis of quantitative criteria described previously48,49. Specifically, bone fragment specimens with more than four to six flake scars resulting from knapping and/or retouching, arranged with a high frequency of continuity and/or interspersion, can be interpreted as having been formed by purposeful percussion and are identified as expedient bone tools.
For our study, measures of taxonomic abundance are based on the number of identified specimens (NISP) rather than the minimum number of individuals (MNI). On one hand, the NISP can be compared quantitatively with the ZooMS data, which are essentially a NISP count. On the other hand, the MNI of most taxa in each layer is 1 or 2 (Supplementary Data 5A), which is not conductive to estimates of taxonomic abundance. The NISP in our study is calculated as the number of specimens identified to species (for example, Aquila chrysaetos) and genus (for example, Coelodonta sp.), and occasionally to (sub)family (for example, Caprinae)50.
ZooMS collagen extraction and digestion
We sampled approximately 10–30 mg of each specimen. Two protocols were applied to the BKC bone samples as well as the reference samples: the non-destructive ammonium bicarbonate buffer (AmBic) extraction method51 and the acid-insoluble (acid) extraction method19. Details for both protocols have been provided previously12,19. The acid protocol was applied to all reference samples and all BKC specimens. In addition, the AmBic protocol was applied to all reference samples and 192 BKC specimens. In brief, before extraction, all of the bone samples were stored in 100 µl AmBic solution overnight, to remove any soluble contamination. After removing this solution, for the AmBic protocol, 100 µl ammonium bicarbonate (50 mM) was added to the bone samples and followed by incubation for one hour at 65 °C. Then, 50 µl of the supernatant containing the soluble protein was transferred to a new 96-well plate or Eppendorf tube and digested with trypsin (Promega, V115A) at 37 °C overnight. Digestion was terminated with 1 µl of 5% trifluoroacetic acid (TFA). Peptide clean-up and purification was done on C18 ZipTips (Thermo Fisher Scientific) or C18 plates (Thermo Fisher Scientific), with elution in 50 µl or 100 µl, respectively, with 0.1% TFA washing solution and 0.1% TFA in 50% acetonitrile as conditioning and elution solution. For the acid protocol, bone samples were demineralized in 0.6 M hydrochloric acid (HCl) solution for one to two days. The HCl was discarded, and the pellet was washed three times with 50 mM AmBic until the pH was around 8. Subsequent steps were identical to the AmBic protocol. All reference samples were processed in individual Eppendorf tubes, and the BKC bone samples were processed using 96-well plates.
Extraction of the Xiahe 2 bone proteome
Four proteomic digests were generated and used for LC–MS/MS analysis of the Xiahe 2 rib specimen, deriving from a total of two bone samples taken from this specimen. Both samples were first stored overnight in 50 mM AmBic to remove any soluble protein contamination potentially present on the bone surfaces. Subsequently, the first protein extraction of the first sample concerned an AmBic ZooMS extraction in which the sample was incubated in 50 mM AmBic for one hour at 65 °C. Digestion of the solubilized proteins within the resulting supernatant was subsequently performed in solution and overnight, at 37 °C (trypsin, Promega, V115A). The remaining pellet as well as the second bone sample were subsequently demineralized in 0.6 M HCl for one or two days, until demineralization was observed. The extracts were centrifuged, and the acidic solution containing acid-soluble proteins was removed, dried down in a speedvac and resuspended in 50 mM AmBic. Subsequently, protein digestion was performed overnight using trypsin at 37 °C. Finally, the fourth extract concerned the resuspension of the remaining, demineralized protein pellet of the first sample in 50 mM AmBic. Again, protein digestion was conducted using trypsin at 37 °C. In each case, peptides were acidified to terminate the digestion process using 1% TFA, briefly centrifuged to pellet any remaining mineral or protein residues and purified using C18 ZipTips as described above for ZooMS. LC–MS/MS analysis was performed on 10 µl of each of the four peptide eluates.
MALDI-TOF MS
For MALDI-TOF MS analyses, 1 µl of the eluted peptides was mixed with 1 µl of matrix solution (1% α-cyano-4-hydroxycinnamic acid in the conditioning solution) and spotted onto a 384-well MALDI target plate. Each sample was spotted in triplicate. All of the MS spectra data were obtained at the Fraunhofer IZI, Leipzig, using the Autoflex Speed LRF MALDI-TOF (Bruker). Spectral replicate merging was performed in R using the MALDIquant v.1.22.1 and MALDIquantForeign v.0.14 packages52,53,54. The peptide marker masses were observed in mMass55, and identified in comparison with the updated ZooMS database (Supplementary Data 4).
LC–MS/MS analysis
Ten microlitres of the eluted peptides of 14 species (Supplementary Data 3) was processed using LC–MS/MS analysis at the University of Copenhagen.
For liquid chromatography, an easy NanoLC from Thermo Fisher Scientific, with the gradient specified in Supplementary Table 3.1, was used at 250 nl per min. The loading was made at 500 nl per min. The mobile phases are A: 5% acetonitrile, 0.1% formic acid; B: 95% acetonitrile, 0.1% formic acid. The emitter consisted of a Polymicro flexible fused silica capillary tubing of 75 µm inner diameter and 20 cm length home pulled and packed with C18 bounded silica particles of 1.9 µm diameter (ReproSil-Pur, C18-AQ, Dr. Maisch). The column was mounted on an electrospray source with a column oven set at 40 °C.
For mass spectrometry, the source voltage was +2,000 V with an ion transfer tube set at 275 °C. An Exploris 480 from Thermo Fisher Scientific was operating in data-dependent mode consisting of a first MS1 scan at a resolution of 120,000 between m/z values of 350 and 1,400. The top ten monoisotopic precursors were selected if above an intensity of 2 × 104 with a charge state between 2 and 6, and were then dynamically excluded after one appearance with their isotopes (±20 ppm) for 20 s. The selected peptides were acquired for MS2 at an Orbitrap resolving power of 60,000, with a normalized collision energy (HCD) set at 30%, a quadrupole isolation width of 1.2 m/z and a first m/z of 100.
COL1 peptide marker validation
Peptide sequences were acquired from the LC–MS/MS raw data using PEAKS v.7.0 (ref. 56), with closely related species with available collagen sequences forming reference databases. Deamidation (NQ), hydroxylation (P) and oxidation (M) were set as variable modifications, and no fixed modifications were included. Parent mass error tolerance was set to 10 ppm, fragment ion tolerance to 0.07 Da and trypsin as the protease. Peptides were filtered for a false discovery rate (FDR) of 0.5%, on the basis of previous conservative recommendations made for PEAKS analysis of skeletal palaeoproteomes57, and sequence reconstruction was performed in R52 using the packages Tidyverse v.2.0.0 (ref. 58), Janitor v.2.2.0 (ref. 59), Biostrings v.2.68.1 (ref. 60) and msa61. Polymorphisms between the references and the reconstructed sequence within the peptide marker were manually validated in PEAKS.
LC–MS/MS analysis of Xiahe 2
Peptide sequences were identified from LC–MS/MS data using PEAKS v.7.0 (ref. 56) with a database consisting of the human reference proteome (UP000005640 with one protein sequence per gene, downloaded 22-02-2022) with added archaic variation from Neanderthals62 and a Denisovan63. Parent mass error tolerance was set to 10.0 ppm and fragment mass error tolerance to 0.07 Da. Deamidation (NQ), hydroxylation (P), oxidation (M) and pyro-Glu (E and Q) were set as variable modifications. Peptides were filtered for 0.5% FDR and exported for further processing.
Construction of the Xiahe 2 protein sequences and phylogenetic analysis
For proteomic data from the Xiahe 2 individual, protein sequences were reconstructed for all proteins with five or more peptides (Supplementary Table 4.1 and Supplementary Data 6) in R52 using the packages Janitor v.2.2.0 (ref. 59), Biostrings v.2.68.1 (ref. 60) and Tidyverse v.2.0.0 (ref. 58). The proteins.csv file exported from PEAKS was used to get an overview of the proteins present in the dataset and their abundance, whereas the protein-peptides.csv file was used for sequence reconstruction. For each amino acid position, a majority consensus was called on the basis of peptide counts. The reconstructed sequences were thereafter aligned using Geneious Prime v.2023.2.1 with a list of reference proteomes, and all polymorphisms were manually validated in PEAKS. The reference proteomes used were the human reference proteome UP000005640 (downloaded from Uniprot 17-01-2022), translations of three Neanderthal genomes62 and a translation of a Denisovan genome63.
A phylogeny was constructed using the reference proteomes outlined above, as well as corresponding proteins from Gorilla gorilla, Pongo abelii and Pan troglodytes. Sequences were aligned in Geneious Prime to identify and correct for isoform variations between references. In brief, positions 1264–1270 (based on P29400) of COL4A5 were removed for the Neanderthals and P. troglodytes; positions 261–313 (based on P12107) of COL11A1 were removed for all individuals; 235 unknowns were added after position 219 (based on P39060) of COL18A1 for G. gorilla, owing to missing portions of the reference sequence; and positions 3–21 of COL27A1 (based on Q8IZC6) for G. gorilla were replaced with unknowns owing to major sequence differences. Protein sequences were concatenated by the individual, with a concatenated protein length of 31,781 amino acid positions.
Phylogenetic analyses were performed using the Dayhoff substitution model and partitioning by protein. The P. abelii sequence was used as an outgroup. A Bayesian tree was generated using MrBayes v.3.2.7 (ref. 64), with settings following a previous report3, except for the analysis only being run for 100,000 generations, because the standard deviation of split frequencies was already zero by this point. RAxML v.4.0 (ref. 65) analysis was run through Geneious Prime, with 1,000 bootstraps (rapid bootstrapping with search for best-scoring ML tree). The phylogeny was plotted in R using Ape v.5.7.1 (ref. 66).
Faunal community analysis
To better compare datasets obtained by morphological observations and ZooMS, we united all results under ZooMS taxonomic group levels, because these are generally less specific than the species designations obtained through morphological analysis. For example, we assign the taxa Bos cf. mutus and Bos grunniens from the morphological taxonomic groups into Bos sp., because these species cannot be separated using ZooMS. However, to better display the morphological data, we kept the species information of some specimens by using the species names, such as Bos grunniens or Bos cf. mutus, in some places when necessary. Taxon names used in the text are according to the specific datasets (morphology, ZooMS and both combined) and the source dataset information is indicated wherever necessary. The combined dataset, a hybrid of morphological and ZooMS taxonomic identifications, was used to analyse the composition of the faunal community at BKC and to calculate diversity indices. Within this dataset, we excluded groups such as birds, rodents and taxonomic assignments to inter-order or sub-order levels.
The Shannon–Weaver and Simpson indices, including the confidence intervals (97.5–2.5%), were calculated according to a previously described method67 to assess the community ecology of each stratigraphic unit. These were calculated in R52 using the vegan v.2.6-4 package68. The values of Shannon–Weaver and Simpson indices are positively proportional to diversity, suggesting that higher index values indicate higher diversity.
Deamidation
The deamidation of glutamine (Gln; Q) to glutamic acid (Glu; E) in bone specimens is one of the common post-translational modifications in archaeological contexts. According to previous studies69,70, an effective method has been proposed to calculate glutamine deamidation for selected collagen type I peptides. Here we calculate glutamine deamidation for two peptides, COL1α1 508–519 (GVQGPPGPAGPR; P1105) and COL1α1 435–453 (DGEAGAQGPPGPAGPAGER; P1706), that have relatively slow deamidation rates69. They are also commonly present in BKC MALDI-TOF MS spectra, and their peptide sequences are identical for most terrestrial mammals19. Deamidation is expressed on a scale of 0 to 1, with 1 indicating no deamidation and 0 indicating complete deamidation.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.