Abstract
The nucleotide sequence of the Clostridium cellulolyticum endo-beta-1,4-glucanase (EGCCD)-encoding gene, celCCD, and its flanking regions was determined. The open reading frame encodes a protein (M(r) 66061) of 584 amino acids (aa). The N terminus shows the features of a typical signal peptide, with a cleavage site after Gly24. The protein can be divided into N-terminal and C-terminal regions by an intermediate Pro + Thr-rich sequence. Deletion analysis suggests that the C-terminal region is not necessary for EG activity. The predicted aa sequence of the mature protein was similar to those of the central catalytic region and the following C-terminal region of the C. thermocellum endoglucanase H (EGH; identity, 58.8%). The N-terminal region resembled that of the endoglucanase, EGCCA, from C. cellulolyticum (identity, 24.7%; 336 aa) and the endoglucanase, EGE, from C. thermocellum (identity, 31.4%; 373 aa). The C-terminal region ended with two conserved 21-aa stretches which had close similarity to each other. The C-terminal sequence was also highly similar to the reiterated domain of several EGs and a xylanase from C. thermocellum, and of an EG from C. cellulolyticum.