Molecular characterization of accripin11, a soluble shell protein with an acidic C‐terminus, identified in the prismatic layer of the Mediterranean fan mussel Pinna nobilis (Bivalvia, Pteriomorphia)

We have identified a novel shell protein, accripin11, as a major soluble component of the calcitic prisms of the fan mussel Pinna nobilis. Initially retrieved from a cDNA library, its full sequence is confirmed here by transcriptomic and proteomic approaches. The mature protein is 103 residues long, has a theoretical molecular weight of 11 kDa and is moderately acidic (pI 6.74), except for its C‐terminus, which is highly enriched in aspartic acid. The protein exhibits a peculiar cysteine pattern in its central domain. The full sequence shares similarity with six other uncharacterized molluscan shell proteins from the orders Ostreida, Pteriida and Mytilida, all of which are pteriomorphids and produce a phylogenetically restricted pattern of nacro‐prismatic shell microstructures. This suggests that accripin11 is a member of a family of clade‐specific shell proteins. A 3D model of accripin11 was predicted with AlphaFold2, indicating that it possesses three short alpha helices and a disordered C‐terminus. Recombinant accripin11 was tested in vitro for its ability to influence the crystallization of CaCO3, while a polyclonal antibody detected accripin11 in prismatic extracts, particularly in the acetic acid‐soluble matrix. The putative functions of accripin11 are further discussed in relation to shell biomineralization.

Mollusk shells are natural composite materials that display exceptional mechanical properties, despite being synthesized at ambient temperatures, pressures and biological pH values [1]. These properties emerge from the complex, hierarchical arrangement of crystalline units in well-defined microstructures such as nacre, prisms and crossed lamellae. Their construction is controlled by extracellular organic macromolecules, many of which become occluded in the mature mineral phase. Although many mollusk shells consist mostly of two or three layers of calcium carbonate (calcite, aragonite or both polymorphs), they also contain a minute amount (usually < 1%) of proteins, glycoproteins, polysaccharides, pigments and lipids [2] that drastically modify the properties of the mature biomineral.
Of the array of occluded macromolecules, proteins have long been the most studied: 141 years separate the first "modern" chemical analysis of a shell, performed by Frémy (which led to the term "conchiolin" [3]), from the publication of the first full sequence of a shell protein, nacrein, which is associated with the nacreous layer of the Japanese pearl oyster [4]. In between, proteins extracted in bulk were the focus of numerous biochemical characterizations (summarized in [5][6][7]). Most of these studies focused on the nacre of diverse species and disregarded other common types of shell microstructures, such as the prismatic layer [8]. Owing to its distinctive microstructure, in which flat aragonitic tablets are arranged so as to hinder crack propagation, nacre has a fracture toughness at least a thousand times higher than that required to fragment geological aragonite [9]. Consequently, knowledge of a limited number of nacre proteins generated overly optimistic expectations of mimicking these properties in vitro [10][11][12]. Today, thanks to high-throughput screening techniques, that is, transcriptomics combined with proteomics, it is clear that the deposition of nacre, and likely of all types of shell microstructures, requires a large set of proteins with extraordinarily diverse functions [8]. Some of these functions are calcium binding, enzymatic activities, modification of 3D frameworks (e.g., chitin modification), inhibition of protease activity, antibacterial activity and signaling. Note that relatively few functions of shell-forming proteins have been verified by in vitro experiments.
Many proteins are occluded in the shell during its formation and constitute the shell matrix (the "shellome"), while others, more elusive, are thought to remain at the interface between the calcifying epithelium and the mineralization front before being degraded or recycled. Both occluded and non-occluded proteins are likely to play important roles in mineral deposition, but this aspect, first evidenced 15 years ago [13], remains largely unexplored [14].
The growing list of proteins identified as true shell components comprises several members with functional domains similar to those of proteins found in non-mineralizing systems. For example, nacrein, mentioned above, contains a carbonic anhydrase (CA) domain and has been shown experimentally to possess CA activity [15]. Tyrosinases have been identified and may function in cross-linking, pigmentation and defense mechanisms [16,17]. Proteins with a typical extracellular matrix signature, such as the von Willebrand factor type A domain, also belong to this category [18], as do proteins with protease inhibitor domains [19,20].
Conversely, the majority of shell-associated proteins share no sequence similarity with proteins from well-studied non-calcifying systems. This is typically the case for proteins whose primary structures are dominated by low-complexity domains (LCDs) or repetitive low-complexity domains (RLCDs), also called "compositionally biased regions." Almost none of the LCD/RLCD-containing proteins have been assigned unequivocal molecular functions in biomineralization, and their abundance and diversity in calcified tissues remain enigmatic [8]. Among the exceptions, proteins containing aspartic acid- or glutamic acid-rich domains were hypothesized to interact with calcium ions or with calcium carbonate surfaces long before their primary structures were actually determined [21,22]. Among the shell proteins with no homologs in non-calcifying systems, one also finds members that do not exhibit any LCDs/RLCDs despite having a biased amino acid (AA) composition. Upsalin [23], found in the nacreous layer of the freshwater mussel Unio pictorum, is an example of this type.
Here we describe accripin11, which belongs to this second category but exhibits a composite primary structure: while 90% of its sequence has no homology with known proteins of non-mineralizing systems and is only slightly compositionally biased, its C-terminus contains a short LCD with an aspartic acid-rich domain. Accripin11 was identified in the shell of the fan mussel Pinna nobilis, a species endemic to the Mediterranean Sea and the largest bivalve of that sea. This species is on the verge of extinction due to a pandemic caused by a protozoan parasite [24,25]. As in many pteriomorphid bivalves, the shell of P. nobilis has two calcified layers: an internal aragonitic nacreous layer and an external calcitic layer made of long prisms. These prisms are probably the longest monocrystal-like biominerals in the mollusk world, and a fascinating model per se for exploring the mechanisms of shell formation.

Results
Overall strategy to identify and characterize accripin11
Here, we describe a novel shell protein from the Mediterranean fan mussel, P. nobilis (Fig. 1A). This protein was identified in the external reddish-brown layer, which is made of long prismatic calcite crystals with a polygonal cross-section. The prisms are held together by a thin, honeycomb-like organic matrix (Fig. 1B). Their elongation axis is orthogonal to the outer shell surface, and they grow inwards, toward the epithelium that secretes them. The prisms can be entirely dissociated (Fig. 1C) by degrading the peri-prismatic sheath with dilute sodium hypochlorite (bleach).
The identification of accripin11 was carried out in two stages separated by a dozen years. Initially, its sequence was obtained by antibody screening of a cDNA library; it was never published and was only mentioned as CSP3 in a review paper on P. nobilis [26]. The sequence was obtained from an isolated Lambda-Zap clone whose insert contains a 363 bp ORF.
The second stage corresponds to the work presented here: the sequence was retrieved and confirmed by combining a transcriptome (SRA database, BioProject accession number PRJNA887567 at NCBI) of the mantle brim, which is thought to secrete only the prismatic outer layer, with proteomics performed on shell extracts of the isolated prismatic layer. In the mantle brim transcriptome of P. nobilis, we identified a transcript (>Pnobilis_R20673250; see Fig. S1) whose longest open reading frame (121 AA) encodes the protein identified in the clone some years earlier. This protein was retrieved by proteomics of the prism matrix, and its sequence was covered at 93% by 40 peptides (Fig. S2), with several overlaps. We also obtained, for the first time, a putative 3D structure of this protein using AlphaFold2. The protein was overexpressed in a bacterial strain, and the recombinant protein was purified in two successive steps, by affinity chromatography and by preparative electrophoresis, before further in vitro testing. We named this protein accripin11, after the acronym Acidic C-terminus, Cysteine-rich protein of Pinna nobilis.
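Locating "the longest open reading frame" in a transcript can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1), scans all six reading frames and returns the longest ATG-initiated, stop-terminated ORF as a protein string. The function names and toy sequences are hypothetical, and this is not the pipeline actually used on the P. nobilis transcriptome.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(transcript: str) -> str:
    """Protein encoded by the longest ATG-initiated, stop-terminated ORF.

    All six reading frames are scanned; the stop codon is not included in the result.
    """
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = "".join(
                CODON_TABLE.get(strand[i:i + 3], "X")  # "X" for ambiguous codons
                for i in range(frame, len(strand) - 2, 3)
            )
            # Every piece before a "*" ends at a stop codon; the last piece does not.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")  # first Met = most upstream start codon
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


if __name__ == "__main__":
    # Hypothetical toy transcript with two ORFs in frame 0 ("MK" and "MPG").
    print(longest_orf("ATGAAATAGATGCCCGGGTAA"))
```

On this toy input the function returns `"MPG"`, the longer of the two frame-0 ORFs. Real transcript mining would typically also apply a minimum ORF length and check for a predicted signal peptide.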

Characterization of the primary structure of accripin11
The mature accripin11 is 103 residues long, with a molecular weight of 11.6 kDa and a theoretical pI of 6.7 (Fig. 2A), whereas full-length accripin11 is 121 residues long. The protein is predicted to be secreted, since a signal peptide with an unambiguous cleavage site between Ala18 and Lys19 could be recognized (Fig. S1). Figure 2B shows the overall AA composition of accripin11 after signal peptide cleavage. This composition is slightly biased in comparison to other shell proteins: the most abundant amino acid is Asp (10.7%), followed by Arg, Ala and Thr (8.7% each), then by Cys and Pro (7.8% each). The primary structure of mature accripin11 is enriched in small and charged residues as well as proline (8), but depleted in aliphatic residues (only two for Val, Ile, Leu and Met), suggesting that it may be an intrinsically disordered protein.
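The composition percentages above can be cross-checked with simple arithmetic. The following Python sketch derives the mature length from the 121-residue precursor and the 18-residue signal peptide. It then expresses residue counts as percentages of the 103-residue mature protein: 11 residues give 10.7%, 9 give 8.7% and 8 give 7.8%. The Asp, Ala and Thr counts are back-calculated from the reported percentages rather than read from the sequence, and the helper names are illustrative only.

```python
from collections import Counter

FULL_LENGTH = 121     # full-length accripin11 (from the text)
SIGNAL_PEPTIDE = 18   # cleavage between Ala18 and Lys19 (from the text)
MATURE_LENGTH = FULL_LENGTH - SIGNAL_PEPTIDE  # 103 residues


def percent_from_count(count: int, length: int = MATURE_LENGTH) -> float:
    """Express a residue count as a percentage of the mature protein, to 0.1%."""
    return round(100 * count / length, 1)


def composition_percent(sequence: str) -> dict[str, float]:
    """Percentage of each residue type in a sequence (e.g. a mature protein)."""
    counts = Counter(sequence.upper())
    return {aa: percent_from_count(n, len(sequence)) for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    # Counts back-calculated from the reported percentages (assumption).
    for label, count in [("Asp", 11), ("Arg/Ala/Thr", 9), ("Cys/Pro", 8)]:
        print(label, count, percent_from_count(count))
```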
On the other hand, the protein contains Phe (5), Tyr (3) and Cys (8), the latter allowing up to four disulfide bridges. Although accripin11 is slightly acidic overall, its charge distribution is distinctly bipartite: the first 90 residues are enriched in basic residues, including Arg (9) and Lys (7), whereas the C-terminus is exceptionally acidic, with seven Asp and two Glu in a 13-residue stretch.
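The link between composition, charge and theoretical pI can be sketched with the Henderson-Hasselbalch equation. The net charge at a given pH is the fractional positive charge of Lys, Arg, His and the N-terminus minus the fractional negative charge of Asp, Glu, Cys, Tyr and the C-terminus. The pI is the pH at which this sum is zero, found here by bisection. The pKa values below are one commonly used set (the EMBOSS defaults) and are an assumption: other tools use different values, so this sketch need not reproduce the 6.74 reported above. Cysteines are treated as free thiols, which is questionable if all eight form disulfides. Applying `net_charge` separately to the basic N-terminal region and to the acidic C-terminal stretch is one way to make a bipartite charge distribution explicit.

```python
# One commonly used pKa set (EMBOSS defaults); other tools use other values (assumption).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch, free Cys assumed)."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Because the net charge decreases monotonically with pH, bisection converges to a single pI.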
A standard BLAST search indicates that accripin11 shares sequence similarity only with six low-molecular-weight unnamed proteins of molluscan origin (Fig. 3), all of which are presumably associated with the shell: three proteins from pearl oysters of the genus Pinctada (Polynesian, i.e., P. margaritifera; Australian, i.e., P. maxima; Japanese, i.e., P. fucata), one from the comb pen shell Atrina pectinata, one from the edible eastern oyster Crassostrea virginica and one from the edible Mediterranean mussel Mytilus galloprovincialis. Five of them are uncharacterized shell proteins, while the sixth (M. galloprovincialis) is a hypothetical protein predicted by genome assembly and annotation. The percent identity shared between accripin11 and the other six members is moderate, ranging from 32% to 41% with the signal peptide and from 36% to 48% without it. While the N-termini align only partly (at positions 20, 21, 31, 32 and 33), all seven proteins align perfectly (with a single gap) in their central domain, in particular at the eight cysteine residues, which occupy invariant positions, at two arginine residues (one flanking the first Cys on its downstream side, at position 40, and the other between the 6th and 7th Cys residues) and at one proline at position 53. Six other positions give an almost perfect match in the central region: Arg55, Leu56, Gln61, Tyr64, Ala68 and Asp74.
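Percent identity depends on how it is normalized. The minimal Python sketch below assumes one common convention: identical positions divided by the number of alignment columns, with gapped columns counted and gap-gap columns ignored. Computing it only over the region after the signal peptide changes both the numerator and the denominator, which is why identity values differ with and without the signal peptide. The sequences are toy strings, not the actual accripin family alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity (%) between two pre-aligned sequences of equal length.

    Convention assumed here: identical residues / alignment columns, where columns
    with a gap in one sequence count in the denominator and gap-gap columns do not.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: 3 identical columns out of 5.
    print(percent_identity("AC-DE", "ACGDF"))  # 60.0
```

Other normalizations (for example, dividing by the length of the shorter ungapped sequence) give different values for the same alignment, so the convention should be stated alongside any reported identity.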
The seven C-terminal regions, which in accripin11 correspond to the 22-residue hydrophobic domain plus the 17-residue Asp-rich tail, align poorly and differ in length, from 39 (accripin11) to 81 residues (VDI36374.1, P. maxima). Interestingly, all seven C-termini are acidic, but to different degrees: the least acidic is that of M. galloprovincialis and the most acidic are those of the three Pinctada shell proteins. Notable patterns include the poly-Asp motif of the Pinctada proteins and the poly-Ala block/Ala-rich motif flanking the acidic C-terminus in the C. virginica protein (see XP_022342781.1 in Fig. 3).
Several observations lead us to assign the seven proteins to a single protein family: their perfect alignment on the cysteine pattern; the identical motif organization of their primary structures (a short N-terminus and a Cys-rich motif followed by a hydrophobic domain ending in a highly acidic tail); their similar molecular size; their common molluscan origin; and their presumed association with the shell. The alignment (Fig. 3) indicates that the shell proteins from the three Pinctada species share the highest percentage of identity. These three members cluster with the protein from C. virginica, then with that of A. pectinata, then with accripin11. The hypothetical protein from the mussel M. galloprovincialis is less similar to the other six members.
We used the Pattinprot tool with the following cysteine pattern: -C to identify other proteins exhibiting such a motif (see Fig. S2). We identified four additional proteins. Two of them (from mouse and human) are of high molecular weight and contain zinc finger patterns; in both cases, accripin11 aligns with one of the NF-X1-type domains located in the C-terminal part of these proteins. The other two are a small cysteine- and glycine-repeat-containing protein and a keratin-associated protein, both thought to be involved in cross-linking with cysteine residues of keratins. However, the overall similarity of accripin11 with these proteins is low, suggesting that, despite sharing the same Cys pattern, they belong to completely different protein families.
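Pattern searches of this kind typically use PROSITE-style syntax: `x` for any residue, `x(n)` or `x(n,m)` for repeats, `[...]` for allowed residues, `{...}` for forbidden residues and `<`/`>` for the termini. The Python sketch below translates a pattern written in a subset of that syntax into a regular expression and reports 1-based, possibly overlapping match positions. The example pattern and sequences are hypothetical and are not the cysteine pattern used in this study, and the sketch does not describe how Pattinprot works internally.

```python
import re

# One PROSITE element: residue class, optionally followed by a repeat count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regular expression."""
    pattern = pattern.rstrip(".")
    prefix = "^" if pattern.startswith("<") else ""
    suffix = "$" if pattern.endswith(">") else ""
    pieces = []
    for element in pattern.strip("<>").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core                     # single residue or [..] class
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return prefix + "".join(pieces) + suffix


def find_pattern(pattern: str, sequences: dict[str, str]) -> dict[str, list[int]]:
    """1-based start positions of (possibly overlapping) matches, per sequence."""
    lookahead = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    hits = {}
    for name, seq in sequences.items():
        starts = [m.start() + 1 for m in lookahead.finditer(seq)]
        if starts:
            hits[name] = starts
    return hits
```

For example, the hypothetical pattern `C-x(2)-C-x(3,5)-[FY]-{P}.` becomes the regular expression `C.{2}C.{3,5}[FY][^P]`. The zero-width lookahead is used so that overlapping matches are not skipped.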

3D model of accripin11 via AlphaFold2
Analysis of the putative three-dimensional structure of accripin11 predicted by AlphaFold2 (Fig. 4A) indicates that the protein consists of three consecutive alpha helices and a disordered C-terminus (~35 residues). Further analysis of the sequence alignments used for the predictions shows that the N-terminus (first 20 residues) and the C-terminus (last 35 residues) have a highly unusual amino acid composition. This prevents the reliable identification of homologous sequences (Fig. 4B) and therefore results in low prediction confidence for these two regions (Fig. 4C). Only helices 2 and 3 (in red in Fig. 4A) have confidence scores high enough to allow interpretation of the predicted fold.
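AlphaFold2 reports per-residue confidence as pLDDT (0 to 100) and writes it into the B-factor column of its PDB output. The minimal Python sketch below assumes that convention and the fixed-column PDB layout. It reads pLDDT from the Cα atoms and returns contiguous runs of residues above a cutoff, which is one way to delimit confidently predicted regions such as helices 2 and 3. The cutoff of 70 is a commonly used threshold, not necessarily the one applied here, and the function names are illustrative.

```python
def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Read per-residue pLDDT from the B-factor column of C-alpha atoms.

    Assumes the AlphaFold2 convention (pLDDT stored as B-factor) and fixed PDB
    columns: atom name 13-16, residue number 23-26, B-factor 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confident_segments(scores: dict[int, float], threshold: float = 70.0) -> list[tuple[int, int]]:
    """Contiguous runs of residues with pLDDT >= threshold, as (first, last) pairs."""
    runs: list[list[int]] = []
    for resnum in sorted(scores):
        if scores[resnum] < threshold:
            continue
        if runs and runs[-1][1] == resnum - 1:
            runs[-1][1] = resnum           # extend the current run
        else:
            runs.append([resnum, resnum])  # start a new run
    return [(first, last) for first, last in runs]
```

For instance, the scores `{1: 40, 2: 80, 3: 85, 4: 50, 5: 90, 6: 91}` yield the segments `[(2, 3), (5, 6)]`.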
Interestingly, these two helices are arranged antiparallel and held together by four disulfide bridges that apparently stabilize the helix bundle (Fig. 4D). AlphaFold2 was also used to predict the 3D structures of the six putative molluscan shell proteins that share homology, including the cysteine pattern, with accripin11. As shown in Fig. S3, the best-predicted models (rank 1) show that all six sequences possess the same structural motif, namely the pair of antiparallel alpha helices linked by four disulfide bonds. All six C-termini are disordered.
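In a predicted model, disulfide bridges are usually inferred from geometry: two cysteine Sγ atoms about 2.05 Å apart are consistent with an S-S bond. The Python sketch below assumes the Sγ coordinates (atom name `SG` in PDB files) have already been extracted and keyed by residue number. It flags pairs closer than a 2.5 Å cutoff. The cutoff, residue numbers and coordinates in the example are illustrative assumptions.

```python
import math
from itertools import combinations


def disulfide_candidates(
    sg_coords: dict[int, tuple[float, float, float]], cutoff: float = 2.5
) -> list[tuple[int, int, float]]:
    """Pairs of cysteines whose SG atoms lie within `cutoff` angstroms.

    An S-S bond is about 2.05 angstroms long; the 2.5 angstrom cutoff is an
    assumption that tolerates small inaccuracies in a predicted model.
    """
    pairs = []
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(sorted(sg_coords.items()), 2):
        distance = math.dist(xyz_i, xyz_j)
        if distance <= cutoff:
            pairs.append((res_i, res_j, round(distance, 2)))
    return pairs
```

With eight cysteines there are 28 possible pairs, so a distance check like this is a quick way to confirm which four pairings a model actually proposes.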

Overexpression of recombinant accripin11
Accripin11 was overexpressed and purified by ProteoGenix by affinity chromatography (using a Strep-Tag2 tag and StrepTactin resin) under native conditions. Analysis on a silver-stained gel showed that this initial purification was incomplete (Fig. 5B, lane 4). Consequently, the extract was purified again by preparative gel electrophoresis in two batches, and pure accripin11 was detected by dot blot (Fig. 5A) with an anti-StrepTag2 antibody in tubes 23-26 (first batch) and 24-27 (second batch). Proteomic analysis of the resulting extracts fully confirmed that this preparative purification yielded accripin11.
The purity of the pooled fractions of both batches was tested on a silver-stained gel (Fig. 5B, lanes 2 and 3) and by western blot (Fig. 5C, lanes 2 and 3). The gel showed a single band at about 17 kDa, while the western blot showed predominantly one band at the same position, plus a thin band at about 34 kDa that may represent an accripin11 dimer. Notably, this protein migrates anomalously, since its predicted size is 11 kDa.
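Apparent molecular weights on SDS/PAGE are read from a calibration in which log10(MW) of the marker proteins is approximately linear in relative mobility (Rf). The Python sketch below fits that line by least squares and computes a gel-shift ratio. The marker Rf values in the test are synthetic. Only the 17 kDa apparent and 11 kDa predicted sizes come from the text, giving a ratio of about 1.55.

```python
import math


def fit_log_mw(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Rf + intercept from (Rf, MW in kDa) markers."""
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def apparent_mw(rf: float, slope: float, intercept: float) -> float:
    """Apparent molecular weight (kDa) of a band with relative mobility rf."""
    return 10 ** (slope * rf + intercept)


def gel_shift_ratio(apparent_kda: float, predicted_kda: float) -> float:
    """How much larger a protein appears on the gel than its sequence predicts."""
    return apparent_kda / predicted_kda


if __name__ == "__main__":
    print(round(gel_shift_ratio(17.0, 11.0), 2))  # 1.55, sizes taken from the text
```

The log-linear relation holds only over part of a gel's separation range, so apparent sizes near the ends of the marker ladder are less reliable.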

Layer-specificity of accripin11
We investigated whether accripin11 is spatially restricted to the calcitic prisms of the P. nobilis shell or is also found in the nacreous layer. To this end, we extracted the ASM and AIM of the nacreous layer, as for the prismatic layer. All six fractions, namely the four prism extracts (ASMp1, ASMp2, AIMp1, AIMp2) and the two nacre extracts (ASMn, AIMn), were tested by proteomics for the presence of accripin11. The results are summarized in Table 1, and the peptides identified in each of the six extracts are listed in Table S1. In brief, all six extracts contain accripin11, but in varying abundance. ASMp1 and ASMp2 contain 40 and 36 peptides, respectively, covering 93% of the sequence of the mature protein (without its signal peptide), whereas ASMn contains a single short peptide covering only 6% of the sequence. AIMp1 and AIMp2 also contain accripin11 with relatively good coverage (58% and 63%), but with a limited number of peptides (10 and 7, respectively). Finally, AIMn contains only three accripin11 peptides, covering 40% of the sequence. Although not strictly quantitative, these data suggest that accripin11 is concentrated in the two soluble prism matrices, less concentrated in the insoluble prism matrices, and scarce in the two nacre matrices.
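Sequence coverage, as used above, is the fraction of residues of the mature protein covered by at least one identified peptide, so overlapping peptides are counted only once. A minimal Python sketch, using exact string matching and a toy sequence with hypothetical peptides, is shown below. For the 103-residue mature protein, 93% coverage corresponds to about 96 residues.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of residues covered by at least one peptide (exact matches only)."""
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100 * sum(covered) / len(protein)
```

Counting peptides and counting covered residues answer different questions. This is why ASMn (one peptide, 6%) and AIMn (three peptides, 40%) can differ so much in coverage despite similar peptide numbers.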
We also performed a western blot (Fig. 6B) under the same conditions as for the gel (Fig. 6A). The blot, probed with the anti-accripin11 antibody, revealed one band in ASMp at approximately 17 kDa (Fig. 6B, lane 4), corresponding to a blurred, negatively stained band on the silver-stained gel (Fig. 6A, lane 4). No band was detected in LS-AIMp (Fig. 6B, lane 5), ASMn (Fig. 6B, lane 2) or LS-AIMn (Fig. 6B, lane 3). This suggests that accripin11 is particularly concentrated in the acetic acid-soluble matrix of the prisms and poorly concentrated in, or absent from, the other extracts. We cannot exclude that accripin11 is present in a polymerized or cross-linked form in the insoluble matrices of both the prisms and the nacre, since it was identified by proteomics in these two fractions.
As the solubility of accripin11 limited our options, ELISA tests were performed only on the two acetic acid-soluble matrices, with recombinant accripin11 as a positive control (Fig. 6C). The anti-accripin11 antibody reacted with ASMp but not with ASMn tested under the same conditions and at the same amount (200 ng per well), suggesting that accripin11 is concentrated in the prism matrix but absent (or very rare) in the nacre matrix. Accripin11 in the prism soluble matrix was quantified by ELISA using a calibration curve obtained with the target protein at known serial concentrations (Fig. S4). The ELISA indicates that accripin11 may represent around 12% of the prism ASM, making it a major soluble protein of the prismatic matrix.
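The ELISA estimate reduces to a short calculation. A standard curve relating signal to known amounts of recombinant accripin11 is inverted to estimate the amount of accripin11 in a well, and that amount is expressed as a percentage of the 200 ng of prism ASM loaded per well. The Python sketch below assumes a linear standard curve, which is valid only within the assay's linear range (a four-parameter logistic fit is often used instead), and the test uses synthetic absorbance values. By this arithmetic, about 24 ng of accripin11 in a 200 ng well corresponds to the reported ~12%.

```python
def fit_standard_curve(amounts_ng: list[float], signals: list[float]) -> tuple[float, float]:
    """Least-squares line signal = slope * amount + intercept (linear range assumed)."""
    n = len(amounts_ng)
    mean_x = sum(amounts_ng) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts_ng)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts_ng, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the amount (ng) of target in a well."""
    return (signal - intercept) / slope


def percent_of_loaded(target_ng: float, loaded_ng: float = 200.0) -> float:
    """Target as a percentage of total matrix protein per well (200 ng in the text)."""
    return 100 * target_ng / loaded_ng
```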

In vitro crystallization in the presence of recombinant accripin11
The effects of purified accripin11 and of the bulk shell matrix (ASM1) on the precipitation of calcium carbonate crystals grown in vitro were investigated by scanning electron microscopy (SEM) (Fig. 7). In the blanks (no protein added), typical rhombohedral calcite crystals with smooth faces were obtained in both series (Fig. 7A,F).
When accripin11 was tested, crystal morphology changed in a concentration-dependent manner (Fig. 7B-E). However, this change was gradual: even at high concentration (Fig. 7E), the crystals still exhibited sharp edges and only a few rounded angles. In contrast, when ASM1 was tested for comparison (Fig. 7G-J), crystal morphology was already altered at 0.5 µg·mL⁻¹, with polycrystalline aggregates showing rounded edges (Fig. 7G). This alteration was accentuated at 1 µg·mL⁻¹ (Fig. 7H). From 2 µg·mL⁻¹ onwards (Fig. 7I), the effects were drastic, and fully rounded polycrystalline aggregates were observed. The two extracts also differed in their effect on crystal size: accripin11 did not modify crystal size in the tested concentration range, whereas ASM1 increased crystal size up to 2 µg·mL⁻¹ (Fig. 7I) and then decreased it (Fig. 7J). We interpret this size decrease as an inhibitory effect of ASM1.

Discussion
In this paper, we describe a novel protein from the shell matrix of the Mediterranean fan mussel, P. nobilis, which we have named accripin11. It is a small, secreted protein of 103 amino acid residues with a theoretical isoelectric point of 6.7. This makes accripin11 not particularly acidic, in contrast to a large cohort of proteins extracted from mollusk shells and, more generally, from metazoan calcium carbonate skeletons [7,26]. However, the protein contains an exceptionally acidic C-terminus, with 10 acidic residues out of 17. Although accripin11 might be ubiquitous in the shell, both proteomics and the anti-accripin11 antibody show that this protein is most abundant in the prismatic layer, particularly in the acetic acid-soluble fraction, which agrees with the overall hydrophilicity of accripin11. Western blot failed to detect accripin11 in the Laemmli-soluble, acetic acid-insoluble fraction of both the prismatic and nacreous layers, but proteomics identified accripin11 in the acetic acid-insoluble fraction of the prisms. The latter comprises the Laemmli-soluble, acetic acid-insoluble fraction and the most insoluble fraction, which was not analyzed further. The apparent discrepancy between western blot and proteomics may be explained as follows: besides being associated with the acetic acid-soluble fraction, accripin11 is also present in the most insoluble matrix, that is, the fraction that cannot be solubilized by Laemmli sample buffer and therefore cannot be analyzed by techniques such as western blot or ELISA. If so, accripin11 may polymerize into large insoluble units, or it may extensively cross-link with other components to form a completely insoluble matrix. The sequence of accripin11 is characterized by a short basic N-terminus, a cysteine-rich domain, a hydrophobic alanine-rich domain and a very acidic C-terminus.
This primary structure is rather unusual: while 80% of the sequence is ordered and consists of three successive alpha-helices in the cysteine-rich domain, the terminal "tail" is disordered and highly acidic. The presence of such a terminal domain may explain why accripin11 exhibits an anomalous migration in SDS/PAGE gels: both the recombinant protein (which contains the Strep2 tag) and the protein from the soluble prism matrix show delayed migration (often referred to as "gel shifting") under denaturing conditions (from 11 to 17 kDa). According to Tiwari and coworkers, proteins with acidic domains always exhibit an overestimated apparent molecular weight. These authors explain this phenomenon by the fact that highly acidic domains electrostatically repel SDS, resulting in insufficient SDS binding and consequently lowered electrophoretic mobility [27]. We cannot exclude that incomplete unfolding of accripin11 may be another cause for anomalous migration in a denaturing gel, but because of the concentration of mercaptoethanol used, this explanation is less likely.
Because accripin11 only shares sequence similarity with putative shell proteins that have not been functionally tested, its function(s) in biomineralization remain unknown. Consequently, it belongs to the growing collection of shell proteins of "unknown function," like MRNP34 [17] or upsalin [23]. However, when tested in vitro, recombinant accripin11 exerted a relatively strong effect on the precipitation of calcium carbonate, despite its almost neutral isoelectric point. Whether this effect is simply an artifact due to the sequence peculiarities of accripin11 or an accurate reflection of its in vivo function remains unknown; a definitive answer will require gene knockdown or CRISPR-Cas9 approaches. Already perceptible at a low protein concentration (0.5 µg·mL⁻¹), the interference effect increases with protein concentration. We noticed, however, that accripin11 alone interacts less effectively with growing crystals than the bulk soluble matrix. The latter is a pool of numerous proteins, including highly acidic ones such as caspartin, which was found to be effective in both interference and inhibition tests [28]. This suggests that synergistic or additive effects occur during the in vitro growth of calcium carbonate crystals when a bulk soluble matrix is used instead of a purified protein.
In a schematized view (Fig. 4A), the Asp-rich C-terminus of accripin11 may bind the positively charged surface of growing calcium carbonate nuclei via electrostatic interactions. Meanwhile, the more hydrophobic domain may be repelled and protrude from the surface, interfering with normal crystallite growth by disrupting the movement of lattice ions to the crystals, as has been suggested for synthetic polyaspartic acid peptides containing a polyalanine domain [29]. Earlier simulations found that a short polyalanine (i.e., hydrophobic) domain of about eight residues, roughly 3 nm long, would be large enough to interfere with the zone of attraction between lattice ions and surface charges [29,30]. In this view, the hydrophobic domain would control the access of lattice ions to crystal surfaces. Accripin11 exhibits such short hydrophobic motifs in the first half of its sequence, for example, between residue 12 (second methionine) and residue 18 (leucine), and between the second cysteine (position 25) and phenylalanine (position 39). However, we cannot exclude other functions of the N-terminus and/or the central domain, which may bind other macromolecular partners of the matrix via the three alpha helices. If so, accripin11 may act as a true "linker" between the mineral surface and the organic framework.
In our representation, the unstructured C-terminus interacts mainly with calcium carbonate minerals (crystalline or amorphous), but its folding could also be induced by a partner macromolecule. Interestingly, accripin11 is not the only example of a relatively short shell-associated protein with an acidic tail. This property is shared by all other putative members of the family revealed by our BLAST searches, and also by the two members of the prismin "family" (prismin1 and prismin2 [31]). The latter are two polypeptides of about 4 kDa identified in the prismatic layer matrix of the Japanese pearl oyster Pinctada fucata. Like accripin11, prismin1 and prismin2 exhibit apparent molecular weights on Tris-tricine gels higher than their sequences predict. These two polypeptides belong to a protein family that is apparently unrelated to accripin11. Nevertheless, their discoverers suggest that their acidic C-termini can interact with the surface of calcium carbonate crystals.
The sequence similarity between accripin11 and the six other putatively related proteins suggests that these proteins share a common ancestry that appears to be restricted to the Pteriomorphia. All of these sequences are defined by their patterns of cysteine, proline and arginine residues, their overall high sequence similarity, the distribution of hydrophobic and hydrophilic motifs along their sequence and their acidic C-termini. Furthermore, AlphaFold2 predictions indicate that all seven contain the pair of antiparallel alpha helices, a conserved structure that most likely has an important function in biomineralization. Despite its considerable morphological diversity, the Pteriomorphia, a major bivalve subclass considered to be monophyletic [32,33], comprises six orders: Arcida, Pectinida, Limida, Ostreida, Mytilida and Pteriida [34]. The seven proteins of the accripin family all belong to representatives of the last three orders, namely Ostreida (C. virginica), Mytilida (M. galloprovincialis) and Pteriida. Within Pteriida, the three Pinctada species belong to the superfamily Pterioidea, and A. pectinata and P. nobilis belong to the sister superfamily Pinnoidea. Interestingly, a relatively recent large-scale phylogeny of bivalves [35] groups these three orders together (Fig. 8), as the sister group of the Arcida-Limida-Pectinida clade. In this phylogeny, Ostreida and Pteriida are sister groups and, together, form the sister group of Mytilida.
Although there is no consensus on the branching pattern of the six pteriomorphian orders, both palaeontological [36] and more recent molecular clock data [37] indicate that these orders diverged in the Paleozoic era, between the Silurian (> 430 million years ago) and the Devonian (> 385 million years ago). Therefore, the most likely scenario is that the seven accripin11 orthologs derive from a single "Lower Paleozoic" ancestor that existed before the Mytilida/Ostreida/Pteriida split. Under this scenario, the accripin family is a clear example of a lineage-restricted shell-forming protein family. As more bivalve genomes become available, it will be fascinating to check whether members of this family are also present in the three other pteriomorphian orders, namely Arcida, Limida and Pectinida (BLAST searches with accripin11 against these lineages currently return no hits). In contrast to Mytilus, Crassostrea, Atrina, Pinna and Pinctada, representatives of these three orders do not possess an outer shell layer made of calcitic prisms. Instead, their outer layer consists of either foliated calcite (Limida, Pectinida) or crossed-lamellar aragonite (Arcida) [38]. This intriguing observation suggests that accripin-related proteins may be functionally involved in the deposition of prismatic shell microstructures.

Earlier work leading to the identification of accripin11
In 1996, mantle tissue (0.5 g) was collected from actively calcifying juvenile P. nobilis grown in aquaria (B. De Gaulejac), as previously described [39]. The amplified cDNA expression library (LambdaZap) constructed from this tissue was screened with antibodies. Screens with antibodies elicited against the soluble nacre matrix of P. nobilis led to the identification of mucoperlin [39,40]. A similar screen performed some years later with antibodies elicited against the soluble prism matrix yielded 15 positive clones, which were isolated, rescreened to purity and sequenced (Eurofins Genomics, Ebersberg, Germany). The insert of one of these clones contained a 363 bp ORF, a 441 bp 3′ UTR and a poly(A) tail. This ORF encoded an undescribed putative shell protein of 121 amino acids, which we initially named CSP3 [26] and have here renamed accripin11 and fully described.
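The ORF length, the precursor length and the size of the mature protein given in the abstract are mutually consistent. The short worked check below is a sketch that assumes the 363 bp count excludes the stop codon and uses ~110 Da as a rule-of-thumb average residue mass. That figure is an approximation, not the ProtParam calculation.

```python
ORF_BP = 363            # ORF length of the cDNA insert (bp); stop codon assumed excluded
SIGNAL_PEPTIDE_AA = 18  # signal peptide length (Fig. S1)
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (approximation)


def codons_to_residues(orf_bp: int) -> int:
    """Number of amino acids encoded by an ORF of orf_bp nucleotides (no stop codon)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


precursor = codons_to_residues(ORF_BP)          # residues in the precursor
mature = precursor - SIGNAL_PEPTIDE_AA          # residues after signal-peptide removal
approx_kda = mature * AVG_RESIDUE_DA / 1000     # rough molecular weight in kDa
print(precursor, mature, round(approx_kda, 1))  # 121 103 11.3
```

The rough estimate (~11.3 kDa) agrees with the theoretical 11 kDa of the 103-residue mature protein.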

Sample collection: fresh tissues and shell materials
A second series of mantle tissue collections was performed in the spring of 2017 under the authorization of the DDTM (Direction Départementale du Territoire et de la Mer of the Alpes-Maritimes department, Arrêté Préfectoral n°2017-459). Ten juvenile P. nobilis individuals (15 to 30 months old) were collected by SCUBA diving at depths between 5 and 8 m in the Baie de Villefranche-sur-Mer in May 2017. Shells were carefully collected together with their byssus and the substrate attached to them. All sampling and operations were performed according to the appropriate ethics rules and regulations. Sampling mission information and metadata are stored in the dat@UBFC portal of the Observatoire des Sciences de l'Univers (OSU) Theta, Besançon, France, at: https://search-data.ubfc.fr/search.php?s=Pinna+nobilis.
Individual mussels were placed in a seawater-filled tank on the boat, then transferred to and acclimated in a large basin at the biological station. They were fed twice a day with Seachem Reef Phytoplankton™, which contains a mixture of algae (Thalassiosira weissflogii, Isochrysis sp., Nannochloropsis sp.), protein hydrolysate with carotenoids, citric acid, carboxylic acid, methyl paraben and sodium propionate. After 5 days, some animals were sacrificed. Valves were carefully opened by cutting the adductor muscle with a scalpel. From the mantle, only the outer border, which is characterized by several folds and is thought to contribute exclusively to the secretion of the prismatic layer (including the spines), was sampled, together with the byssal glands, foot and gills. All tissues were immediately frozen in liquid nitrogen. Unused living specimens were transferred back to the Bay of Villefranche-sur-Mer and returned to their original habitat.
The shells of sacrificed animals were carefully cleaned to remove epibionts and kept for further matrix extraction.

Transcriptomics
Total RNA was extracted from the mantle of two individuals using QIAzol (#79306; Qiagen GmbH, Hilden, Germany) following the manufacturer's instructions. The RNA extracts were quantified with a NanoDrop and their quality was checked by agarose gel electrophoresis before being sent to the NGS-Service for Integrative Genomics (Göttingen, Germany) for library preparation and sequencing. Paired-end libraries were constructed and sequenced (250 bases from both ends) on the Illumina HiSeq2500 platform. Raw Illumina reads were processed and assembled as previously described [41]. We focused in particular on the 1417 bp-long transcript R20673250, which contains the 363 bp-long ORF encoding accripin11.
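Locating an ORF such as the 363 bp accripin11 ORF within an assembled transcript amounts to scanning reading frames for start and stop codons. The sketch below illustrates this step under stated assumptions. It scans only the three forward frames (Fig. S1 reports the ORF in a forward frame), uses the standard-code stop codons, and reports lengths without the stop codon. The function name `longest_forward_orf` is hypothetical and is not the assembly pipeline of [41].

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_forward_orf(seq: str):
    """Return (frame, start, stop_index, length_bp) of the longest ATG-initiated ORF
    on the forward strand, or None if there is none.

    frame is 1, 2 or 3 (offset 0, 1 or 2); start and stop_index are 0-based,
    stop_index being the first base of the stop codon; length_bp excludes the stop.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for offset in range(3):
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                length = i - start  # coding length without the stop codon
                if best is None or length > best[3]:
                    best = (offset + 1, start, i, length)
                start = None  # look for the next ORF in this frame
    return best


print(longest_forward_orf("CCATGAAAAAAAAATAAGG"))  # (3, 2, 14, 12)
```

Applied to the 1417 bp transcript, such a scan would be expected to return a 363 bp ORF (121 codons). A real search would also consider the reverse strand.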

Extraction of prism matrices and one-dimensional gel analysis
A cleaned shell valve of a 28- to 30-month-old specimen was used for the extraction of the prism matrix. In brief, the upper two-thirds of the shell (which contain exclusively the prismatic layer) were cut into fragments with a diamond saw. The fragments were sonicated in diluted NaOCl (1% active chlorine) for 5 min, rinsed with water and 70% alcohol, air-dried and roughly crushed. The cleaned fragments were divided into two batches: one for extracting the whole organic matrix and the other for extracting only the intraprismatic organic matrix. The first batch was powdered with a mortar and pestle to a particle size below 200 µm (sieving). The second batch was bleached in NaOCl for 2 days under constant rotation (25 r.p.m.) to isolate the prisms by dissolving the periprismatic organic sheaths. The prisms were then collected by centrifugation, the NaOCl was discarded, and the prisms were rinsed several times with Milli-Q water and dried [28]. Both the powdered (Batch 1 = 15.8 g) and the bleached (Batch 2 = 12.99 g) samples were decalcified overnight by titration with cold diluted acetic acid (10% v/v) to obtain the full prismatic and the intraprismatic organic matrices, respectively. The resulting clear solution from each batch was then centrifuged (3900 g, 30 min) to separate the acid-soluble matrix of the prisms (ASMp) from the acid-insoluble matrix of the prisms (AIMp). The AIMs of the two batches (AIMp1 and AIMp2) were rinsed with Milli-Q water and freeze-dried. ASMp1 and ASMp2 were further ultrafiltered using a 3 kDa cut-off membrane (volume reduction to approx. 10 mL), and the concentrated solutions were desalted by dialysis (Spectra/Por 6 pre-wetted RC tubing, molecular weight cut-off 1 kDa) against 1 L of Milli-Q water, with at least five water changes. These salt-free solutions were then freeze-dried. Aliquots were denatured with Laemmli sample buffer.
The AIMs did not fully dissolve in Laemmli buffer; the solubilized fractions were named LS-AIM (Laemmli-soluble acid-insoluble matrix). ASMs and LS-AIMs were run on hand-cast 15% acrylamide one-dimensional mini-gels (Bio-Rad Laboratories, Hercules, CA, USA) according to the manufacturer's instructions. The gels were stained with silver nitrate [42].

Proteomics and subsequent in silico analysis
MS/MS analyses were conducted on the four unfractionated bulk matrices (ASMp1, AIMp1, ASMp2 and AIMp2) after a short migration in an acrylamide gel, following a published procedure [20]. Database searches were carried out with MASCOT versions 2.4 and 2.5 (Matrix Science, London, UK) against the P. nobilis transcriptome.
The accripin11 sequence, deduced both from the 1417 bp-long transcript R20673250 and from proteomics, was analyzed for its physico-chemical parameters (isoelectric point, molecular weight and amino acid composition) using ProtParam [43]. Its signal peptide was identified with SignalP 6.0 [44]. We performed standard BLASTP searches against GenBank (https://blast.ncbi.nlm.nih.gov) and used the PattInProt search tool from PRABI (Pôle Rhône-Alpes de Bioinformatique, Lyon, France) to identify cysteine patterns similar to that of accripin11.
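An isoelectric point such as the one reported by ProtParam is the pH at which the net charge of the sequence is zero. The sketch below shows the principle by bisection on a Henderson-Hasselbalch net charge. It assumes a single EMBOSS-style pKa set. ProtParam uses the Bjellqvist scale, so this sketch will not reproduce the reported pI of 6.74 exactly. The function names are hypothetical.

```python
# EMBOSS-style pKa values (assumption; not the Bjellqvist scale used by ProtParam)
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    seq = seq.upper()
    pos = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    pos += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    neg = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    neg += sum(seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH where the net charge crosses zero; the charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("GG"), 2))  # 6.1
```

Asp-rich stretches, such as the C-terminus of accripin11, pull the estimate toward low pH.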

AlphaFold2 simulations
3D models of accripin11 were predicted first with a local installation of AlphaFold2 [45] and later with the online version of ColabFold [46], from which alignment statistics and prediction scores were extracted. In addition, 3D models were predicted for the six putative molluscan shell proteins that contain the same cysteine pattern and exhibit sequence similarity with accripin11. Figures of the best-predicted model were prepared with the PyMOL Molecular Graphics System version 2.4.0 (Schrödinger, Mannheim, Germany).
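AlphaFold2 and ColabFold store the per-residue confidence score (pLDDT) in the B-factor column of the PDB files they write. Reading that column is one way to delimit low-confidence stretches, such as the disordered C-terminus. The sketch below reads it from the fixed PDB columns of CA atoms. It assumes pLDDT < 50 as a proxy for disorder, a common convention that is not a threshold stated in this work. The function names and the file name in the usage comment are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def plddt_per_residue(pdb_lines: Iterable[str]) -> Dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor of each CA atom."""
    scores: Dict[int, float] = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resi = int(line[22:26])            # PDB columns 23-26: residue number
            scores[resi] = float(line[60:66])  # PDB columns 61-66: B-factor (pLDDT)
    return scores


def low_confidence_segments(scores: Dict[int, float],
                            cutoff: float = 50.0) -> List[Tuple[int, int]]:
    """Contiguous runs of residues with pLDDT below cutoff, as (first, last)."""
    segments: List[Tuple[int, int]] = []
    for resi in sorted(scores):
        if scores[resi] < cutoff:
            if segments and segments[-1][1] == resi - 1:
                segments[-1] = (segments[-1][0], resi)  # extend the current run
            else:
                segments.append((resi, resi))           # start a new run
    return segments


# Usage (hypothetical file name):
# with open("accripin11_model.pdb") as fh:
#     print(low_confidence_segments(plddt_per_residue(fh)))
```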

Overexpression of recombinant accripin11 and purification
Overexpression work was performed by the company ProteoGenix (Schiltigheim, France). In brief, the cDNA coding for mature accripin11 (without the signal peptide) was chemically synthesized with codon optimization for Escherichia coli expression. It was subsequently cloned into a pT7 expression vector containing a StrepTag2 tag at its C-terminus, according to the manufacturer's instructions. The subcloned DNA insert encodes a recombinant protein that is slightly longer than the natural one (Fig. S5), as it includes an MG dipeptide at the N-terminus (from the pT7 vector) and the 10 amino acid-long StrepTag2 tag (SAWSHPQFEK) at the C-terminus. The culture growth conditions were as follows: growth at 37 °C until OD600 nm > 0.5, followed by induction of overexpression with IPTG (1 mM final concentration, induction time 1 to 4 h). Cells were harvested by centrifugation, disrupted by sonication in native buffer and analyzed on a standard SDS/PAGE gel (not shown), which revealed relatively low expression. The protein was purified from the supernatant by affinity chromatography: the supernatant was allowed to bind to a StrepTactin resin, which was then washed with TBS, and the bound protein was eluted with desthiobiotin. As silver-stained gels showed that this purification was far from optimal, we purified the protein further by preparative electrophoresis (Bio-Rad; model 491 Prep Cell) on a 12% acrylamide gel, according to a protocol developed by one of us [39,47]. Two successive purifications were performed from two equal batches. The protein was detected in the 80 eluted fractions by dot blot using an anti-StrepTag2 antibody diluted 1/1000 (StrepMAB-Classic, ref. 2-1507-001; IBA Lifesciences, Göttingen, Germany).
The tubes containing the eluted recombinant protein (tubes 23-26 for batch 1, tubes 24-27 for batch 2) were pooled, and the pooled fractions were dialyzed against Milli-Q water (Spectra/Por 6 pre-wetted RC tubing, molecular weight cut-off 1 kDa) before being freeze-dried. The extract was quantified by weighing the lyophilizate: around 300 µg of pure protein was obtained. An aliquot (about 10 µg) was sent to the 3P5 platform for proteomic analysis to confirm its identity as accripin11.
The freeze-dried pellet was redissolved in Milli-Q water, and an aliquot was analyzed on an SDS/PAGE gel as described above (with silver staining) and by western blot. For the western blot, we followed a procedure routinely used in our laboratory [28].

Polyclonal antibodies against synthetic peptides of accripin11
Since the quantity of pure recombinant accripin11 was too low both to generate antibodies and to run in vitro assays in parallel, we chose to raise a polyclonal antibody against two synthetic immunogenic peptides corresponding to the central part (KDCAQQCTRDRETCFG, residues 65-80) and the C-terminal part (TAAPKPAKEPSSADD, residues 97-111) of the accripin11 sequence. The whole procedure (peptide synthesis + antibody production) was performed by Eurogentec (Seraing, Belgium). The synthesized peptides were coupled to a carrier (KLH for the central peptide, OVA for the C-terminal one) and injected into two white rabbits (SY3579, SY3580) according to a speedy 28-day immunization protocol (contract FR10162), with injections on days 0, 7, 10 and 18 and bleeds on days 0 (pre-immune serum), 21 (medium) and 28 (final). The titers of the second and third bleed antisera were determined by ELISA, with the pre-immune serum used as a negative control, as previously described [28,39].
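The central immunogenic peptide spans part of the cysteine-rich region. Searches such as those run with PattInProt accept PROSITE-style patterns. The sketch below converts a simple PROSITE-style pattern to a Python regular expression and reports match positions. It supports only a subset of the syntax (x, x(n), x(n,m), [..], {..} and single residues). The pattern C-x(3)-C-x(6)-C is a hypothetical example read off this 16-residue peptide, not the full accripin11 cysteine pattern. The function names are also hypothetical.

```python
import re

ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. C-x(3)-C) to a Python regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        core, n, k = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if n is not None:
            core += "{" + n + ("," + k if k else "") + "}"  # repetition count
        parts.append(core)
    return "".join(parts)


def find_pattern(seq: str, pattern: str) -> list:
    """1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(seq)]


print(find_pattern("KDCAQQCTRDRETCFG", "C-x(3)-C-x(6)-C"))  # [3]
```

Position 3 of the peptide corresponds to residue 67 of the accripin11 sequence (the peptide starts at residue 65).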

Localization of accripin11 in the nacreous layer
To investigate whether accripin11 is exclusively associated with the prismatic layer of the P. nobilis shell or is also present in the nacreous layer, we employed two strategies: (a) searching for accripin11 in nacre extracts via qualitative proteomics and (b) testing for the presence of accripin11 in both prism and nacre extracts with the polyclonal anti-accripin11 antibody. For both strategies, portions of about 8 g of the nacreous layer of a cleaned shell (from a juvenile specimen) were mechanically separated from the prismatic layer, powdered and decalcified overnight at 4 °C with cold acetic acid (10% v/v), as for the prismatic layer (see above). These extractions yielded acetic acid-insoluble (AIMn, 67.4 mg, i.e., 0.85% of the powder weight) and acetic acid-soluble (ASMn, 4.6 mg, i.e., 0.058% of the powder weight) nacre fractions. In strategy 1, aliquots of these two extracts were analyzed by proteomics (3P5 platform) in the same way as the prism extracts, and the accripin11 sequence was searched via MASCOT. In strategy 2, the soluble fractions of both prisms (ASMp) and nacre (ASMn) were tested by ELISA using two approaches, and by western blot. In the first ELISA approach, the amount of antigen was kept constant (200 ng per well) and the anti-accripin11 antibody was 2-fold serially diluted from 1/500 to 1/64 000 (eight dilutions).
Purified accripin11 was used as a positive control at 100 ng per well. Each point was tested in triplicate. In the second ELISA approach, the two soluble fractions (ASMn, ASMp) were 2-fold serially diluted from 800 to 6.25 ng per well, and the antibody dilution was kept constant (1/1000). In parallel, a calibration curve was generated on the same plate with accripin11 at amounts ranging from 100 to 0.78 ng per well. Each point was tested in triplicate. For the western blot, Laemmli-denatured preparations of ASMp, AIMp, ASMn, AIMn and purified accripin11 were run on a 15% acrylamide gel and electro-transferred as indicated above. After blocking, the membrane was incubated overnight with the anti-accripin11 antibody diluted 1/1000. After rinsing (TBS/Tween 20), the membrane was incubated for 90 min with a goat anti-rabbit alkaline phosphatase (GAR-AP) conjugate diluted 1/30 000 (Sigma-Aldrich; A3687-1ML) and extensively rinsed. The signal was then developed either by chemiluminescence (CDP-Star) or with SIGMAFAST BCIP®/NBT tablets.
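The three 2-fold series described above (antibody 1/500 to 1/64 000, extracts 800 to 6.25 ng, standards 100 to 0.78 ng) each contain eight steps. An amount in an extract can be read off a linear calibration curve by inverting the fitted line, as in Fig. S4. The sketch below shows both steps. It assumes the calibration is linear over the fitted range and that the antibody recognizes recombinant and native accripin11 equally well. The absorbance values in the tests are invented for illustration and are not measured data. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def twofold_series(start: float, n: int) -> List[float]:
    """n successive 2-fold dilutions beginning at start."""
    return [start / 2 ** i for i in range(n)]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def amount_from_absorbance(a: float, slope: float, intercept: float) -> float:
    """Invert the calibration line: ng of accripin11 giving absorbance a."""
    return (a - intercept) / slope


antibody_dilutions = [500 * 2 ** i for i in range(8)]  # 1/500 ... 1/64 000
extract_ng = twofold_series(800, 8)                     # 800 ... 6.25 ng per well
standard_ng = twofold_series(100, 8)                    # 100 ... 0.78125 ng per well
```

Under these assumptions, an extract well loaded with 200 ng that reads like 25 ng of standard would correspond to roughly 25/200 = 12.5% immunoreactive accripin11 by weight. This is the kind of comparison given in the Fig. S4 caption.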

In vitro crystallization
Directly after purification, the effect of recombinant accripin11 on the growth of calcite was tested in vitro in the calcium carbonate crystallization assay (diffusion method), following the initial protocol of Albeck et al. [48] with some modifications (as described in ref. [49]). We did not remove the StrepTag2 tag from the recombinant accripin11, as this tag is considered biologically inert, proteolytically stable and unlikely to interfere with the folding or bioactivity of recombinant proteins [50]. The effect of recombinant accripin11 on crystallization was compared with that of the complete soluble prism matrix (ASMp1). Increasing concentrations of accripin11 (top row) and of ASMp1 (bottom row), ranging from 0.5 to 4 µg·mL−1 in filtered, sterile 10 mM CaCl2, were applied to 16-well culture slides (Lab-Tek, Nunc/Thermo Scientific, Rochester, NY, USA; 200 µL per well). The plate, with a pierced lid, was kept in a desiccator under vacuum in the presence of ammonium bicarbonate (NH4HCO3) crystals and maintained at 4 °C. Blanks contained CaCl2 solution alone. After 72 h, the solutions were carefully removed from the wells (with a blunt-end needle connected to a Millipore vacuum pump), and the plate was dried. The glass slide was detached from the removable well chamber and observed directly under a Hitachi TM1000 tabletop microscope without carbon coating. This experiment was repeated three times to check the reproducibility of the results.
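Preparing wells at defined protein concentrations is a C1V1 = C2V2 calculation. The sketch below computes a per-well recipe and the amount of protein that one row consumes. The stock concentration (100 µg/mL) and the steps 0.5, 1, 2 and 4 µg/mL are hypothetical values chosen within the stated range, not values from the protocol. The sketch also assumes the stock is made up in 10 mM CaCl2, so that diluting it leaves the calcium concentration unchanged. The function name is hypothetical.

```python
WELL_VOLUME_UL = 200.0  # volume per well (µL), as in the protocol


def well_recipe(target_ug_per_ml: float, stock_ug_per_ml: float,
                well_ul: float = WELL_VOLUME_UL) -> tuple:
    """Return (µL of protein stock, µL of 10 mM CaCl2) for one well (C1V1 = C2V2)."""
    if target_ug_per_ml > stock_ug_per_ml:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_ug_per_ml * well_ul / stock_ug_per_ml
    return stock_ul, well_ul - stock_ul


targets = [0.5, 1.0, 2.0, 4.0]  # hypothetical steps within 0.5-4 µg/mL
recipes = {c: well_recipe(c, 100.0) for c in targets}  # hypothetical 100 µg/mL stock
protein_per_row_ug = sum(c * WELL_VOLUME_UL / 1000 for c in targets)  # µg per row
print(recipes[4.0])  # (8.0, 192.0)
```

On these assumptions, one row uses about 1.5 µg of protein, a small fraction of the ~300 µg purified.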

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Fig. S1. Nucleotide sequence of the contig (>Pnobilis_R20673250) encoding accripin11. The amino acid sequence of accripin11 is shown under the nucleotide sequence in one-letter code (in blue); the first 18 amino acids (in red) represent the signal peptide. This amino acid sequence was found in the longest open reading frame of forward frame 2 (translation 2F). Fig. S2. Alignment of the cysteine pattern of accripin11 (shaded blocks) with those of one zinc finger protein (Swiss-Prot accession number Q8R151) and one keratin-associated protein (Swiss-Prot accession number Q64507) of Mus musculus. Fig. S3. 3D structure predictions with AlphaFold2 of the six putative molluscan shell sequences found to be homologous to accripin11: (A) Atrina pectinata, (B) Crassostrea virginica, (C) Mytilus galloprovincialis, (D) Pinctada fucata, (E) Pinctada maxima and (F) Pinctada margaritifera. Each of these predictions shows the two antiparallel alpha helices. With the exception of protein C, all of them exhibit a disordered C-terminus. Fig. S4. Quantification of accripin11 by ELISA in ASMp1 and ASMn extracts. The calibration curve was obtained with recombinant accripin11, serially diluted (100 to 0.78 ng) and tested with the anti-accripin11 antibody diluted 1/1000. (A) Calibration curve; the red line represents the linear fit, and the inset table gives the linear equation and its parameters. (B) List of absorbance values obtained with ASMp1 and ASMn, both tested at amounts ranging from 800 to 6.25 ng per well. The absorbance value of ASMp1 at 200 ng per well is almost equal to that of accripin11 at 25 ng per well. Note that all absorbance values measured for ASMn correspond to blank values, indicating the absence of accripin11 in this extract. Fig. S5. Complete amino acid (AA) sequence of recombinant accripin11 with the StrepTag2 tag.
The first two AAs (in red) belong to the pT7 expression vector. AAs in blue (positions 106 to 115) represent the StrepTag2 tag. Table S1. List of peptides identified by MASCOT for accripin11 in the four matrices of the prismatic layer (ASMp1, AIMp1, ASMp2 and AIMp2) and in the two nacre matrices (ASMn, AIMn).