Comparison of grain protein profiles of Brazilian cowpea (Vigna unguiculata) cultivars based on principal component analysis
Honaiser et al. Food Production, Processing and Nutrition (2022) 4:16

This study aims to compare the grain protein profile of four Brazilian cowpea cultivars (BRS Aracê, BRS Itaim, BRS Pajeú, and BRS Xiquexique) by two-dimensional electrophoresis (2-DE) and principal component analysis (PCA). 2-DE efficiently separated cowpea proteins, and the resulting profiles showed high homogeneity among the four cultivars. In addition, the principal component analysis indicated differences in protein abundance among the cultivars. The cultivars BRS Aracê and BRS Xiquexique, both biofortified in iron and zinc, were separated from the cultivars BRS Itaim and BRS Pajeú. These results demonstrate that protein profiles can be used to discriminate cowpea varieties.


Introduction / Background
Cowpea (Vigna unguiculata) is an important component of the basic food basket in Africa and in the north and northeast regions of Brazil. This legume is rich in protein (23% to 30%), fiber (16% to 19%), and other essential nutrients, such as B vitamins (Aida et al. 2021; Baptista et al. 2017; Filho et al. 2011; Frota et al. 2008). Among the proteins, vicilin 7S is the main storage protein of cowpea and, depending on the variety, it may or may not be glycosylated, which affects its functional properties (Kimura et al. 2008). Four Brazilian cultivars developed by the Brazilian Agricultural Research Corporation (EMBRAPA) are analyzed here: BRS Aracê, BRS Itaim, BRS Pajeú, and BRS Xiquexique. They differ in tegument color and grain size, and two of them (BRS Aracê and BRS Xiquexique) are biofortified in iron and zinc (Embrapa 2009; Filho et al. 2011; Vilarinho et al. 2010a; Vilarinho et al. 2010b).
Because of its high protein content, cowpea can be promoted as a low-cost protein source compared with animal protein or other plant sources. In addition, cowpea can serve as an alternative protein source for vegetarian and/or vegan populations and can expand food options, contributing to dietary diversity. Besides their nutritional importance, proteins are directly related to the physiological state of plants, to specific processes such as photosynthesis, biosynthesis and transport, and to responses to biotic and abiotic factors. During seed development, different protein groups accumulate, such as the so-called storage proteins, which act as markers of the maturation phase (Clerens et al. 2012; D'Alessandro & Zolla 2012; Rasheed et al. 2020).
Protein analysis by two-dimensional gel electrophoresis (2-DE) separates proteins in two stages (by isoelectric point and by molecular mass) with great efficiency and robustness (Jorrin-Novo et al. 2019; Rabilloud & Lelong 2011; Zhan et al. 2019). The technique has been widely applied as a tool related to food quality and safety (Alikord et al. 2018; Lorenzini et al. 2016; Rossi et al. 2017; Valentim-Neto et al. 2016). However, 2-DE generates hundreds of spots per gel, which can make the analysis of these large datasets a time-consuming and difficult step when performed with univariate analysis tools (Lualdi & Fasano 2019), making multivariate statistical approaches such as principal component analysis (PCA) more effective. PCA is used to concentrate the information contained in several original variables into a smaller set of statistical variables (components) with minimal loss of information, thus allowing an overview of the data set and highlighting possible relationships among the data (Engkilde et al. 2007). It is one of the most widely used multivariate methods in the analysis of proteomic data (Balsamo et al. 2015; de Mello et al. 2016; Lualdi & Fasano 2019; Valentim-Neto et al. 2016). PCA is a useful tool for comparing protein profiles of plant varieties without the need for protein identification.
In this work, we carried out the first comparative protein profile study of four Brazilian cowpea cultivars (BRS Aracê, BRS Itaim, BRS Pajeú, and BRS Xiquexique) developed by EMBRAPA using 2-DE and principal component analysis.

First protein extraction protocol
The first method applied to the grains of the four cowpea cultivars was previously established for common beans (Rossi et al. 2017). Approximately 30 g of grains from each cultivar, in triplicate, were ground in an analytical mill (IKA, Staufen, Germany) with liquid nitrogen and stored at -80 °C until extraction.
The protein extracts for each cultivar were obtained from 300 mg of each ground sample suspended in 0.8 mL of extraction buffer [0.5 M Tris-HCl, pH 8; 0.7 M sucrose; 100 mM EDTA; 1 mM PMSF; 1% (w/v) CHAPS; 14 mM DTT; Roche protease inhibitor (Mannheim, Germany)], and the mixture was vortexed for 30 s. The samples were centrifuged at 20,000 × g for 20 min at 4 °C, and the supernatant was divided equally into two microtubes because of the total volume. Then, 0.8 mL of a solution containing pure acetone, 12.5% (w/v) TCA and 0.125% (w/v) DTT was added, and the samples were kept overnight at 4 °C before another centrifugation at 20,000 × g for 20 min at 4 °C. The precipitate was washed three times with 1 mL of cold methanol, twice with 1 mL of pure acetone and, finally, with acetone containing 0.1% (w/v) DTT. After centrifugation at 10,000 × g for 30 min at 4 °C, the supernatant was discarded, and the precipitates were suspended in 300 μL of rehydration buffer containing 7 M urea, 2 M thiourea, 2% (w/v) CHAPS, 0.28% (w/v) DTT and 1% (w/v) PMSF and kept at 21 °C for 2 h. After another centrifugation at 10,000 × g for 30 min at 15 °C, the supernatants from the two microtubes were combined and stored at -80 °C until quantification. The protein extracts were quantified using the 2-D Quant Kit (GE Healthcare, Uppsala, Sweden).

Modified protein extraction protocol
Based on the results obtained in the first extraction, the protocol was modified (Fig. 1) to determine the best conditions for cowpea protein extraction. The BRS Xiquexique cultivar was chosen for these tests because it yielded the lowest protein content in the first extraction. Three variables were studied. First, the entire procedure was performed in 15 mL conical tubes, avoiding sample division during the protocol. Second, the influence of the initial sample mass was evaluated by testing a 500 mg sample in addition to the 300 mg sample. Third, the methanol washing steps were excluded to observe whether this would reduce protein loss during the process.
After quantification, the condition that resulted in the greatest extraction was applied to the grains of the other cultivars. The protein extracts were then cleaned using the 2-D Clean-Up Kit (GE Healthcare, Uppsala, Sweden).

2-DE
Three protein extracts were prepared from each cultivar (with and without the 2-D Clean-Up Kit), and one 2-DE gel was run from each extract, yielding three protein profiles per cultivar. 2-DE was carried out as described by Valentim-Neto et al. (2016).
Isoelectric focusing (IEF) was performed using Immobiline Drystrip Gels (IPG strips, pH gradient 4-7, 13 cm) (GE Healthcare). According to Nogueira et al. (2007) and Vasconcelos et al. (2005), most cowpea proteins have an isoelectric point (pI) between pH 4 and 7, so strips in this range allow better separation and visualization. Approximately 250 μg of total protein was diluted in 250 μL of rehydration buffer containing 0.2 mL L⁻¹ IPG buffer pH 4-7 (GE Healthcare), with bromophenol blue as tracking dye. The strips were focused under the following conditions: step one, 50 V for 25 Vh; step two, 500 V for 500 Vh; step three, 1000 V for 750 Vh; step four, 4000 V for 2500 Vh; step five, 8000 V for 15,000 Vh; and a final step, 6000 V for 6000 Vh, up to a total of approximately 25,000 Vh, at a limit of 50 mA per strip. After focusing, strips were kept at -80 °C.
Before SDS-PAGE, the strips were incubated for 15 min with 10 g L⁻¹ DTT in 5 mL of buffer containing 50 mmol L⁻¹ Tris-HCl, pH 8.8; 6 mol L⁻¹ urea; 0.2 g L⁻¹ SDS; 3 mL L⁻¹ glycerol; and 2.5 mg L⁻¹ bromophenol blue. This reduction step was followed by alkylation for 15 min with 25 g L⁻¹ iodoacetamide in 5 mL of the same buffer. SDS-PAGE was performed on 12.5% polyacrylamide gels using the SE 600 Ruby System (GE Healthcare). The applied electrical current was 15 mA per gel for 30 min and then 30 mA per gel until the end of the run. The temperature was maintained at 10 °C using a MultiTempIII Thermostatic Circulator (GE Healthcare). Gels were stained with Coomassie Brilliant Blue G-250 Stain (Bio-Rad).
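As a quick consistency check on the focusing program, the volt-hours of the individual steps can be summed and compared with the stated total. The short Python sketch below is illustrative only and is not part of the original workflow; it reads each step as a voltage paired with a volt-hour value, and the variable names are assumptions.

```python
# Each step of the focusing program as (voltage in V, volt-hours in Vh),
# read from the protocol text above.
ief_steps = [
    (50, 25),
    (500, 500),
    (1000, 750),
    (4000, 2500),
    (8000, 15000),
    (6000, 6000),
]

# Accumulated volt-hours over the whole program.
total_vh = sum(vh for _, vh in ief_steps)
print(f"Sum of programmed steps: {total_vh} Vh")

# The text quotes a total of 25,000 Vh; report how far the steps are from it.
stated_total_vh = 25000
shortfall = stated_total_vh - total_vh
print(f"Difference from stated total: {shortfall} Vh")
```

The listed steps add up to slightly less than the quoted figure, which is consistent with the total being given as a rounded value.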

Image and data analysis
The gels were scanned on an Image Scanner System II and analyzed with ImageMaster 2-D Platinum Software Version 7.0 (both from GE Healthcare). Automatic matching was supplemented manually. Spots were detected using the following parameters: smooth ≥ 4, saliency ≥ 100, and area ≥ 11. The triplicate gels of each cultivar were compared with each other and, subsequently, across all cultivars. To identify differentially accumulated proteins, the relative volume of the spots (% Vol) was compared between cultivars by analysis of variance in the ImageMaster software. A spot was considered differentially accumulated when the mean % Vol of a cultivar's triplicate differed significantly (p < 0.05) from that of the other cultivars.
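The per-spot comparison described above can be illustrated with a one-way analysis of variance written in plain Python. This is a minimal sketch and not the ImageMaster implementation: the % Vol values are hypothetical, the function name is an assumption, and significance is judged against the tabulated critical F value for (3, 8) degrees of freedom at α = 0.05 (approximately 4.07) instead of computing an exact p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups (cultivars)
    n = len(all_values)      # total number of observations (gels)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical % Vol values for a single spot: four cultivars, three gels each.
spot_percent_vol = {
    "BRS Aracê": [0.42, 0.45, 0.40],
    "BRS Itaim": [0.21, 0.19, 0.23],
    "BRS Pajeú": [0.20, 0.22, 0.18],
    "BRS Xiquexique": [0.44, 0.41, 0.46],
}

f_stat, df_b, df_w = one_way_anova_f(list(spot_percent_vol.values()))

# Tabulated critical value of F for alpha = 0.05 with (3, 8) degrees of freedom.
F_CRIT_3_8 = 4.07
print(f"F({df_b}, {df_w}) = {f_stat:.1f}; significant at 0.05: {f_stat > F_CRIT_3_8}")
```

With four cultivars and three gels each, the test always has 3 and 8 degrees of freedom, so the same critical value applies to every spot.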

Statistics
The protein contents were expressed as means ± standard deviation of the three replicates. Significant differences (p < 0.05) between results were determined by analysis of variance (ANOVA), Tukey's test and Student's t-test. The software used was STATISTICA version 7.0.
For the principal component analysis (PCA), gels from the four cultivars were compared with each other, and the spots matched across all twelve gels were selected. The selected data were log2-transformed, and each sample was centered on its median. PCA was run in R using the 'prcomp' function of the 'stats' package.
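The preprocessing steps named above (selecting spots matched in every gel, log2 transformation, and median centering of each sample) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R script: the function name, the dictionary layout, and the toy values are hypothetical, and "centered on the median" is read here as subtracting each gel's own median log2 value.

```python
import math
from statistics import median

def preprocess(gels):
    """Keep spots present in every gel, log2-transform, and median-centre each gel.

    `gels` maps a gel name to a {spot id: % Vol} dictionary.
    Returns the sorted shared spot ids and the processed values per gel.
    """
    shared = set.intersection(*(set(spots) for spots in gels.values()))
    shared_ids = sorted(shared)
    processed = {}
    for name, spots in gels.items():
        logged = [math.log2(spots[s]) for s in shared_ids]
        centre = median(logged)  # per-sample (per-gel) median
        processed[name] = [x - centre for x in logged]
    return shared_ids, processed

# Hypothetical toy input: three gels, with spot 4 missing from gel_B.
toy_gels = {
    "gel_A": {1: 0.50, 2: 1.00, 3: 2.00, 4: 0.25},
    "gel_B": {1: 0.25, 2: 0.50, 3: 1.00},
    "gel_C": {1: 1.00, 2: 4.00, 3: 2.00, 4: 0.50},
}

shared_ids, processed = preprocess(toy_gels)
print(shared_ids)
for name, values in processed.items():
    print(name, values)
```

Median centering removes gel-wide intensity offsets: in the toy data gel_A and gel_B differ by a constant factor of two and become identical after preprocessing.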

Results and discussion
Plant samples contain different levels of secondary metabolites and nutrients, which vary even within the same species depending on the stage of maturation, the part of the plant, or environmental influences (Hussein & El-Anssary 2019). In 2-DE, these compounds interfere with gel quality as well as with the separation and identification of proteins (Vâlcu & Schlink 2006; Wu et al. 2010). An additional purification step with a clean-up kit is suggested to further remove contaminants such as salts, lipids, nucleic acids and detergents; spot separation improved after clean-up (Figs. 2 and 3). The 2-D Clean-Up Kit was effective in removing interfering substances and improved the quality of the cowpea gels, as also reported for other samples (Kumar et al. 2017).

Comparison of protein profile
Three gels of each cultivar were obtained to compare protein profiles among cultivars and samples. The gels (Fig. 4) demonstrated efficient protein extraction, with absence or low concentration of interferents such as salts, carbohydrates, and lipids (Görg et al. 2004). One representative map of each cultivar is shown in Fig. 4. BRS Aracê had the largest number of spots, 501 ± 13 (average ± standard deviation), and was used as the reference gel for correspondence analysis; BRS Xiquexique presented 496 ± 6 spots, BRS Itaim 488 ± 16, and BRS Pajeú had the lowest number, 451 ± 5. Two matching analyses were carried out (Table 1); for each matching category, the gel with the highest number of spots was used as the reference gel. The first matching, per cultivar (between the three gels of the same cultivar), showed % spots detected higher than 95%, representing similarity between replicates; likewise, Pearson's correlation coefficient (r) varied between 0.95 and 0.99, and coefficients of variation were smaller than 3.28%. In the second matching, comparing all twelve samples, the % spots detected was also higher than 95%, representing similarity among the grain protein profiles of the four cultivars, and Pearson's correlation coefficient (r) varied between 0.95 and 0.98. The % spots detected and Pearson's correlation coefficients in Table 1 show strong similarities among protein profiles, both within each cultivar and between all samples and the reference gel. These replicates therefore presented high homogeneity, and the 2-DE analysis showed acceptable repeatability, as also observed among common bean cultivars by Rossi et al. (2017).
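The repeatability of the spot counts can be expressed as a coefficient of variation (CV = standard deviation / mean × 100). The Python sketch below applies this formula to the spot counts reported above. Whether the coefficients of variation in Table 1 were derived in exactly this way is an assumption, and the variable names are illustrative.

```python
# Mean spot count and standard deviation per cultivar, as reported in the text.
spot_counts = {
    "BRS Aracê": (501, 13),
    "BRS Xiquexique": (496, 6),
    "BRS Itaim": (488, 16),
    "BRS Pajeú": (451, 5),
}

# Coefficient of variation in percent: 100 * standard deviation / mean.
cv_percent = {name: 100 * sd / avg for name, (avg, sd) in spot_counts.items()}

for name, cv in cv_percent.items():
    print(f"{name}: CV = {cv:.2f}%")

# Cultivar with the most variable spot count across its three gels.
largest = max(cv_percent, key=cv_percent.get)
print(f"Largest CV: {largest}")
```

Under this reading, all four cultivars stay at or below roughly 3.3%, in line with the upper bound quoted for the replicate matching.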
Although Table 1 shows a strong correspondence between gels (above 90% correspondence between detected spots), analysis of the volume percentage of each spot by PCA revealed differences among cultivars.
PCA was used to reduce the dimensions of almost 300 spots (original variables) with minimal loss of information. This transformation organizes the data so that the first component accounts for the highest possible variation and the second component for the second largest (Hongyu et al. 2015; Jacobsen et al. 2007). The reduction of data complexity promotes better observation of possible relationships in the data.
For PCA, only the spots present in all 12 gels were considered; thus, 297 spots were analyzed by volume percentage (% Vol). Of the 12 principal components (PC) generated, the first five account for 72.73% of the total variation of the data, and the first two components, shown in Fig. 5, represented 40.67%: 22.61% in PC1 and 18.06% in PC2. There is a clear grouping by cultivar (Fig. 5). The cultivars BRS Aracê and BRS Xiquexique were separated by PC1 from the cultivars BRS Itaim and BRS Pajeú; in addition, BRS Pajeú and BRS Itaim were clearly separated in PC2. The four cowpea cultivars were planted side by side before protein analysis, so the differences between their grain protein profiles, in terms of the variation in the volume percentage of the analyzed spots, are due to genetic factors (Pullaiahgari et al. 2019; Thiellement et al. 2002). In addition, the primary sequence of proteins also determines their position in 2-DE gels; thus, the spots in 2-DE gels can be considered genetic markers (He et al. 2015; Pullaiahgari et al. 2019).
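The decomposition described in this section can be sketched with a singular value decomposition of the column-centered data matrix, which is also how R's 'prcomp' computes its components. The example below uses Python with NumPy instead of R, and the data are synthetic: a 12 × 297 matrix with invented cultivar effects stands in for the real spot volumes, so the percentages it prints are not those of the study. The function name and the effect sizes are assumptions.

```python
import numpy as np

def pca_svd(data):
    """PCA of a samples-by-variables matrix via SVD of the column-centred data."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                                  # sample coordinates on each PC
    variances = s ** 2 / (data.shape[0] - 1)        # variance carried by each PC
    explained = 100 * variances / variances.sum()   # percent of total variance
    return scores, explained

# Synthetic stand-in for the real data: 12 gels (4 cultivars x 3) by 297 spots.
rng = np.random.default_rng(0)
cultivar = np.repeat(np.arange(4), 3)
data = rng.normal(0.0, 0.3, size=(12, 297))

# Invented effect 1: cultivars 0 and 3 differ from cultivars 1 and 2 in 60 spots.
group_a = np.isin(cultivar, [0, 3])
data[:, :60] += np.where(group_a, 1.0, -1.0)[:, None]

# Invented effect 2: cultivars 1 and 2 differ from each other in 40 other spots.
data[cultivar == 1, 60:100] += 1.0
data[cultivar == 2, 60:100] -= 1.0

scores, explained = pca_svd(data)
print("Number of components:", len(explained))
print("PC1 + PC2 share of variance (%):", round(float(explained[:2].sum()), 2))
print("PC1 scores by gel:", np.round(scores[:, 0], 2))
```

With 12 samples, at most 12 components are returned, and the components are ordered by decreasing variance, so a score plot of the first two columns of `scores` corresponds to the kind of view shown in Fig. 5.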

Conclusions
We demonstrated that 2-DE efficiently separated cowpea proteins and revealed high homogeneity among the grain protein profiles of the four Brazilian cultivars evaluated. PCA indicated differences in protein abundance among the cultivars, which allows protein profiles obtained with this approach to be used as genetic markers. In view of the significant protein content of cowpea grains and the emergence of diverse protein sources, studies like this provide relevant information for breeding programs related to protein accumulation in Vigna unguiculata, as well as for food safety.