Natural Language Engineering 20 (1): 131–149. doi:10.1017/S1351324912000204
© Cambridge University Press 2012. Published online: 08 August 2012.
Link to this article: http://journals.cambridge.org/abstract_S1351324912000204

BASRAH: an automatic system to identify the meter of Arabic poetry

MAYTHAM ALABBAS¹, ZAINAB A. KHALAF¹,² and KHASHAN M. KHASHAN³

¹ Department of Computer Science, College of Science, Basrah University, Basrah, Iraq
e-mail: maytham.alabbas@gmail.com
² School of Computer Science, University Science Malaysia (USM), 11800 Penang, Malaysia
e-mail: zainab ali2004@yahoo.com
³ P.O. BOX 42821, Riyadh 11551, Saudi Arabia
e-mail: khashan kh@yahoo.com

(Received 25 November 2010; revised 14 May 2012; accepted 31 May 2012; first published online 8 August 2012)

Abstract

Arabic prosody is the science that studies the music of Arabic poetry, which is mainly meter and rhyme. The identification of meters for Arabic verses or poems is a complicated task. This task requires a certain level of expertise to identify the meter to which a verse belongs.
In this paper, we present BASRAH,¹ a system that automatically identifies the meter of Arabic poetry by using the numerical prosody method. The numerical prosody method depends on verse coding, which is derived from the general concept of Al-Khalil’s feet by using two primary units (cord = 2) and (peg = 3). On testing both old and modern Arabic verses and poems, BASRAH has proved to be an efficient tool to help inexperienced users to determine the meter of Arabic verses and poems.

¹ The system is named after the native city of Al-Khalil, the father of Arabic prosody.
² The transcription of Arabic examples in this paper follows the Habash–Soudi–Buckwalter (HSB) transliteration scheme (Habash et al. 2007) for transcribing Arabic symbols. This scheme extends Buckwalter’s scheme to increase its readability while maintaining the one-to-one correspondence with Arabic orthography as represented in standard encodings of Arabic, such as Unicode. The following is the HSB transliteration map with different Buckwalter scheme values indicated in parentheses: Ā (|), Â (>), ŵ (&), Ǎ (<), ŷ ({), h̄ (p), θ (v), D (∗), š ($), Ď (Z), ς (E), γ (g), ý (Y), ã (F), ũ (N), ı̃ (K), . (o).

Table 1. The Arabic meters (columns: Meter; English Name; Standard patterns for both hemistiches)

1 Introduction

Arabic poetry (Alšςr Alςrby)² is a form of metrical speech with a rhyme. The rhyme in Arabic poetry is achieved by every line of the poem ending upon a specific tone. Arabic poetry is categorized into two main types: rhymed (or measured), and prose, with the former greatly outnumbering the latter. The rhymed poetry falls within fifteen different meters (wzn) collected and explained by the famous lexicographist, grammarian and prosodist Al-Khalil Bin Ahmad Al-Farahidi (791 A.D.) in what is known as the science of prosody (ςlm AlςrwD) (Al-Katib 1971; Kushek 2004). His elaborate circle system remains directly influential in theories of meters to this day.
Al-Akhfash later added one more meter to make sixteen meters, as listed in Table 1 (Al-‘edany 2001; Mustajeer 2005). Arabic poetry must follow one of these to be correct (Uthman 2004; Mustafa 2005). The meter of the rhythmical poetry is known as sea (bHr). The measuring unit of the meter is known as a foot (tfςylh̄), with every meter containing a certain number of feet that the poet has to observe in every verse (byt) of the poem. A line of a verse is divided into two halves (šTr) or hemistiches (mSrς). The measuring procedure of a poem is very rigorous. Sometimes adding or removing a consonant or a vowel can shift the verse from one meter to another. Some of these meters can be identified, depending on the number of units (syllables) in each hemistich, with other sub-types such as completed (tAm) meters, which contain four units, or brachycatalectic (mjzw’) meters, which are verses where the last foot in both hemistiches is omitted (the last foot is called (ςrwD) ‘in a narrow sense’ in the first hemistich, whereas it is called (Drb) in the second hemistich). For instance, AlTwyl/The Long meter is always completed, whereas Alhzj/The Trilling meter is always brachycatalectic. On the other hand, Alrjz/The Trembling, AlbsyT/The Outspread and Alrml/The Running meters could be either completed or brachycatalectic. Furthermore, each one of these sixteen meters has many forms due to different forms of foot for each meter. So there are about 17,292 different forms representing sixteen meters for Arabic prosody. Also, in rhymed poetry, every verse has to end with the same rhyme (qAfyh̄) throughout the poem.

The rest of the paper is organized as follows. Section 2 gives an overview of Arabic prosody and its basic elements. A brief review of related work is given in Section 3. Section 4 explains the algorithm for converting the dictation form to the prosodic form.
The numerical prosody method is described in Section 5. Section 6 explains the BASRAH system and Section 7 reports the experiments performed using BASRAH. Finally, Section 8 presents the conclusions.

2 Arabic prosody and its basic elements

Arabic prosody, which was established by Al-Khalil, can be defined as a science by which the right meter is recognized as opposed to a wrong one (Ahmad 2006; Etmeesh 2006). In his analysis, Al-Khalil observed that every verse consists of an identical sequence of vowelless consonants (swAkn), which are the letters provided with Sukun or the prolongation letters (i.e., Alif, Waw and Ya), and vowelizes (mtHrkAt), which are the letters provided with diacritical marks, such as Damma, Fatha and Kasra (Maling 1973; Abdullateef 2007; Wajeeh 2007). These are governed by determinate collocations of easily distinguishable rhythmic elements of a fixed length, which he called pegs (ÂwtAd), with elements of variable length, which he called cords (ÂsbAb). A cord consists of two letters, while a peg has three letters. A cord which is composed of a vowelize followed by a vowelless consonant is called a light cord (sbb xfyf), whereas it is called a heavy cord (sbb θqyl) if it consists of two vowelizes (Abdullateef 2007; Wajeeh 2007). A peg which is composed of two vowelizes followed by a vowelless consonant is called a joined peg (wtd mjmwς), while it is called a separated peg (wtd mfrwq) if it consists of two vowelizes separated by a vowelless consonant (Abdullateef 2007; Wajeeh 2007). Some prosodists refer to another rhythmic element called interspace (fASlh̄), which is a combination of three or four vowelizes followed by a vowelless consonant. An interspace is called a small interspace (fASlh̄ Sγrý) when it consists of three vowelizes followed by a vowelless consonant, and called a large interspace (fASlh̄ kbrý) when four vowelizes are followed by a vowelless consonant (Harkat 2007). From these three elements, Al-Khalil determined larger entities called feet, which can be combined in different ways to generate the traditional meters of Arabic prosody. A foot must contain double cords or a peg and a cord, and must not contain two pegs or three consecutive cords.

In order to grasp the above analysis, it has been compressed into a sentence: lam. Âra ςalaý Ďah.r jabalı̃ samakatã (lit.: I did not see a fish on top of a mountain). Here, lam is light cord, Âra is heavy cord, ςalaý is joined peg, Ďah.r is separated peg, jabalı̃ is small interspace and samakatã is large interspace. These names, like most of the metrical terminology, have been borrowed from Bedouin life, especially from the tent. Prosodists claim that there are ten feet, known as primary feet (tfAςyl ÂsAsyh̄), that were formed from cords, pegs and interspaces. These feet are (Al-Taweel 2006; Abdullateef 2007; Wajeeh 2007; Khalil 2009) as follows: faςuwlan, mafaAςiyln, mufaAςaltun., fAςi lAtun., faAςiln, fAςilAatun., mus.tafςilun., mutafaAςiln, maf.ςuwlAatu and mus.taf.ς lun.

Minor changes may arise in some parts of the primary feet, which result in new types of feet called alternative feet (tfAςyl bdylh̄) (Mukhtar 1985; Al-‘Ali 1998; Isa 2010). The first alteration is called minor relaxations (zHAfAt), which affect the cords of a verse. These relaxations are divided into twelve types (Shittu 2006), such as Alxbn, Alkf, AlTy, etc. The other, called major defects or diseases (ςll), only affects the end of the last foot of a hemistich (Mukhtar 1985). It arises through addition or omission on that basis. These defects also divide into twelve types (Shittu 2006), such as AlqTf, AlqTς, Albtr, etc.

3 Related work

Some attempts have been made to automate the process of determining the meter of Arabic verses. Among these attempts is Al-‘edany’s system (Al-‘edany 2001), which is based on Al-Hanafy’s method (Al-Hanafi 1991). It takes as input a fully diacritized verse and outputs different information, such as the prosodic form, the verse’s codes in terms of the numbers (1, 2 and 3), feet and places of segmentation. This system does not specify the types of relaxations and defects. It recognizes only thirteen meters.

Al-Hussain’s system (Al-Hussian, n.d.) is another attempt in this respect. It takes as input the code of one of the verse’s hemistiches and outputs the hemistich’s meter, feet, relaxations and defects. This system does not specify the places of segmentation for the verse. It recognizes all sixteen meters.

Khalaf, Shahed and Ali’s (2009) system, which is based on Al-Katib’s (1971) method, recognizes all meters. It takes as input partially diacritized verses (which contain diacritics like Sukun, Tanwiyn, assimilation and the end of each hemistich) and outputs the verse’s meter, the prosodic form, the verse’s code in terms of the numbers (1, 2, 4, 8 and 16), feet, place of segmentation and the types of relaxations and defects.

The current system, which we call BASRAH, is a step forward in automating the process of Arabic prosody. It uses the numerical prosody method (Khashan 2003, 2004, 2005, 2006), which depends on numerical patterns, and not on feet like the previous methods, to specify the verse’s or poem’s meter(s). BASRAH not only recognizes all Arabic meters but can also recognize Arabic poetry rhythms. BASRAH also identifies the types of relaxations and defects by using alternative codes that are derived from the codes of sixteen primary meters.

4 Prosodic form

Diacritization in Arabic is done by adding special symbols called diacritical marks (HrkAt) to help in spoken language.
Some of these special symbols are put above normal Arabic characters, such as the short vowels, known as Damma (u), Fatha (a), and a zero vowel, known as Sukun (.), while others are put under them, such as the short vowel, known as Kasra (i). For example, katab.tu ‘I wrote’ and katab.ta ‘you (masculine) wrote’.

Arabic prosody is considered a phonetic science. It depends on pronounced, not on written, letters. The prosodic form is based on the following principle rule:

Only the pronounced sounds are written down, even if they have no corresponding letters in dictation form. Also, what is not pronounced is left unprinted, even if it has a corresponding letter in dictation form. (Khalaf et al. 2009)

Accordingly, some letters are either inserted or deleted in the prosodic form. The prosodic form is an essential step for any correct start to identify the meter of the verse because it represents the verbal components for any verse in order to facilitate the next steps of processing (Al-‘edany 2001). This task needs a human knowledge level in the rules of Arabic and words diacritization, as well as the correct pronunciation of the Arabic lexical items. The following algorithm (Al-Katib 1971; Al-‘edany 2001; Khalaf et al. 2009) shows the steps applied to convert the Arabic verse from the dictation form to the prosodic form:

(1) Duplicate the geminated letters (having Shadda (∼) over it) by making the first one a vowelless consonant (zero vowel) and the other a vowelize. For instance, md∼a becomes md.da.
(2) Duplicate the prolongated Alif (Ā) by making the first one a vowelize and the other a vowelless consonant.
(3) Replace any Tanwiyn³ (or Nunation), known as Tanwiyn Damm (ũ), Tanwiyn Fath (ã or Aã) and Tanwiyn Kaser (ı̃), by noon with Sukun (n.). For instance, (jnwbũ) becomes (jnwbun.).
(4) Delete the conjunctive Hamza within a word from the dictation form. For instance, wAktŷAb becomes wktŷAb.
(5) Write down each pronounced letter (e.g., dagger Alif). Like: hðA becomes hAðA. Also, leave unprinted each letter that is not pronounced. For instance, ktbwA becomes ktbw.
(6) Delete the assimilated Lam from the definite article known as Al, which exists before any one of the Sun letters or Solar letters (AlHrwf Alšmsyh̄: t, θ, d, ð, r, z, s, š, S, D, T, Ď, l and n) and duplicate the Sun letter (making the first one a vowelless consonant and the other a vowelize). For instance, Alš∼ams becomes Aš.šams.
(7) Replace any of the short vowels Damma, Fatha and Kasra, which appears over the letter Ha (h) at the end of a word or over the end of a hemistich, by its corresponding letters known as Waw (w), Alif (A) and Ya (y), respectively.
(8) Delete any other special symbols (!, (, ), . . .) which exist in the verse. For instance, mn hðA? becomes mn hAðA.
(9) Finally, the first vowelless consonant letter from a pair of consecutive vowelless consonant letters is deleted except where this pair appears at the end of each hemistich.

³ Nunation is an indefinite morpheme consisting of a short vowel followed by the phoneme /n/. Nunation is represented using a unique diacritic that has the shape of two of the diacritics of the short vowel (Habash 2010).

The result of this process contains the prosodic form only, i.e., the Arabic letters diacritized with the short vowels Damma, Fatha and Kasra, and the zero vowel Sukun. Figure 1 shows an example of Arabic verse in dictation form and its prosodic form.

5 Numerical prosody method

The numerical prosody method (Khashan 2003, 2004, 2005, 2006) is an approach for presenting Al-Khalil prosody by using numbers as a form instead of feet.
It aims at a comprehensive understanding that does not pay attention to terminology or ritual, and uses minimal basic necessary tools, which, in the case of prosody, are numbers (cord = 2) and (peg = 3). Thus, numerical prosody manifests itself in simple, short, almost mathematical rules representing a program. In addition, it does not burden man with the terms that loaded this science.

For instance, when we say that fAςilaAtun. is a heptameter, this means that it contains seven letters: (fA = 10 = 2)⁴ + (ςilaA = 110 = 3) + (tun. = 10 = 2) = 7 diacritical marks (vowelless consonants and vowelizes). So the Arabic prosody is numerical from its beginning.

⁴ These codes are explained below. Note that these codes are not the interpretation of the sequences of 0s and 1s as binary numbers.

Fig. 1. (Colour online) An example of Arabic verse in dictation form (first line), and its prosodic form (second line).

The following Arabic verse is taken as an example to explain the steps of the numerical prosody:

(lam. yaςud. qaw.miy. kamaA kaAnuwA wamaA ÂH.snwA γy.r Alt∼ajAfy. ςmlA)

In numerical prosody two steps are used to indicate the meter for any verse:

(1) Representing each letter provided with sukun or the prolongation letters (i.e., Alif, Waw and Ya), for example m., by code ‘0’, and everything else by code ‘1’, for example qa.⁵ The prosodic form of the first hemistich of the previous example verse is

lam. yaςud. qaw.miy. kamaA kaAnuw wamaA

So it can be written by these two codes (1 and 0) as follows:

la  m.  ya  ςu  d.  qa  w.  mi  y   ka  ma  A   ka  A   nu  w   wa  ma  A
1   0   1   1   0   1   0   1   0   1   1   0   1   0   1   0   1   1   0

⁵ Here the letters without diacritics in the prosodic form are considered as vowelizes because we work with partially diacritized verses.

(2) Grouping the binary codes into segments. Each segment must start with (1) and end with (0).
Certain patterns of these segments are significant, in particular 110 (joined peg) and 10 (light cord). We segment the verse by matching the longest prefix that matches one of these, and we assign a code to each segment (i.e., 110 = 3 and 10 = 2). Sometimes there is still one vowelize letter (1) alone; in this case we leave it (e.g. 1110 (small interspace) = 13). Then we simplify (222 = 6 and 22 = 4). So the first hemistich of the previous example verse can be coded using cord and peg codes as follows:

Letters:     la m. | ya ςu d. | qa w. | mi y | ka ma A | ka A | nu w | wa ma A
Binary:      1  0  | 1  1  0  | 1  0  | 1  0 | 1  1  0 | 1  0 | 1  0 | 1  1  0
Unit:        cord  | peg      | cord  | cord | peg     | cord | cord | peg
Code:        2     | 3        | 2     | 2    | 3       | 2    | 2    | 3
Simplified:  2     | 3        | 4            | 3       | 4           | 3

Using the numbers as a form soon revealed mathematical properties for Arabic poetry verses’ rhythm, which could be inferred from meters and their circles. The numerical prosody codes did not exceed the codes (0, 1, 2, 3, 4 and 6). Here lies the importance of numerical prosody: its simplicity, accuracy and ability to be programmed. For this reason we used this method in our system to identify the meter of Arabic poetry.

The whole set of features for the Arabic prosody is summarized in the following specific rules:

(1) There are two main rhythms for Arabic poetry known as Amble (xbby) and Naval (bHry) rhythms. These two rhythms are overlapped according to specific rules, which are known as AltxAb.
(2) The Amble rhythm consists of the equivalent of light cord and heavy cord, where one of them is taking the place of the other, so there is no relaxation in the Amble rhythm.
(3) The Naval rhythm consists of alternations of light cord 2 and peg 3. All cords in this rhythm must be light cords only and able to do relaxation.
(4) The Amble and Naval rhythms are overlapped in interspace 22 in the two meters (AlkAml/The Perfect and AlwAfr/The Exuberant), which is called AltxAb.
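The two coding steps of this section can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not BASRAH's actual implementation: the function names (`to_binary`, `segment`, `simplify`) are hypothetical, the input is assumed to be already split into prosodic units spelled as in the worked example (a trailing `.` for Sukun, bare `A`, `w`, `y` for prolongation letters), and the handling of a stray `0` and of runs longer than three cords is our own guess, since the text does not specify it.

```python
# Hypothetical helper names; the token spelling follows the worked example above.
PROLONGATION = {"A", "w", "y"}  # bare Alif, Waw and Ya (assumed spelling)


def to_binary(units):
    """Step (1): a unit with Sukun or a bare prolongation letter -> '0', else '1'."""
    return "".join(
        "0" if u.endswith(".") or u in PROLONGATION else "1" for u in units
    )


def segment(binary):
    """Step (2): scan left to right, matching 110 (peg = 3) or 10 (cord = 2).

    A '1' that starts neither pattern is kept as code 1 (as in 1110 = 13).
    Keeping a stray '0' as code 0 is an assumption of this sketch.
    """
    codes, i = [], 0
    while i < len(binary):
        if binary.startswith("110", i):
            codes.append("3")
            i += 3
        elif binary.startswith("10", i):
            codes.append("2")
            i += 2
        else:
            codes.append(binary[i])
            i += 1
    return codes


def simplify(codes):
    """Merge runs of cords: 222 -> 6, then 22 -> 4 (order is an assumption)."""
    return "".join(codes).replace("222", "6").replace("22", "4")


units = ["la", "m.", "ya", "ςu", "d.", "qa", "w.", "mi", "y", "ka",
         "ma", "A", "ka", "A", "nu", "w", "wa", "ma", "A"]
binary = to_binary(units)
print(binary, segment(binary), simplify(segment(binary)))
```

Run on the first hemistich of the example verse, the sketch reproduces the codes in the table above (2 3 2 2 3 2 2 3, simplified to 2 3 4 3 4 3).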
The summary of these rules is that there are mathematical conditions that determine and describe the Arabic properties through which the meter is accepted. Also, there is a mathematical framework that controls these properties, and the numerical prosody serves to determine the features of these properties.

6 BASRAH system

The system described in this paper, i.e., BASRAH, is used to identify the Arabic verse’s or poem’s meter using the numerical prosody method (Khashan 2003, 2004, 2005, 2006). BASRAH takes as input a partially diacritized verse and outputs the verse’s meter, the prosodic form, the verse’s code, place of segmentation and the types of relaxations and defects. BASRAH has many advantages when compared with the previous systems that were explained in Section 3. These advantages are as follows:

(1) All the previous systems depended on the feet to determine the meter of the Arabic verse, while BASRAH depends on mathematical patterns. The feet in BASRAH can be inferred from the mathematical patterns. This makes BASRAH simpler and faster than the other systems.
(2) The previous systems find the meter for verses only, whereas BASRAH does so for both verses and poems.
(3) The previous systems identify the Amble (Alxbb) rhythm as a part of AlmtdArk/The Continuous meter, while in BASRAH it is identified as a separate notion. This brings the results of BASRAH closest to the view of most human experts in Arabic prosody.
(4) BASRAH, like Khalaf et al.’s (2009) system, works on partially diacritized verses (as input), whereas other systems work on fully diacritized verses (Al-‘edany 2001) or the numerical coding of the prosodic form (Al-Hussian, n.d.).

To make BASRAH faster and more effective, we created general standard patterns for each meter, instead of saving all the meter codes (see Section 5) as in the other systems. Sometimes, for the same meter there is more than one standard pattern.
For example, the standard pattern of AlTwyl/The Long meter is coded as follows:⁶

First hemistich (Sdr):  3 [1|2] 3 [21|3|4] 3 [1|2] 3 3⁷
Second hemistich:       3 [1|2] 3 [21|3|4] 3 [1|2] 3 [2|4|3]

In the above code, the square brackets [ ] mean that only one number between the brackets is selected. In the standard pattern of AlmqtDb/The Lopped meter, the codes of both hemistiches are equivalent, as shown in the following pattern:

First hemistich (Sdr):  [6|32|23] 3 1 3
Second hemistich:       [6|32|23] 3 1 3

⁶ Arabic is written right-to-left. In the current paper, all meter codes are left-to-right, which are equivalent to verses’ transliteration.
⁷ This number becomes 4 when the verse is the first one in the poem only, otherwise it is still 3.

BASRAH proceeds in four stages, as follows:

Stage 1: This stage reads a partially diacritized Arabic verse or poem, either typed from the keyboard or selected from the stored database system. The input is then checked against the rules of Arabic poetry writing (i.e., it must not contain numbers or special symbols). If the input verse or poem is incorrect, an error message is displayed; otherwise the processing continues.

Stage 2: This stage uses the algorithm that was explained in Section 4 to convert the verse (or each verse in the poem) from the dictation form to the prosodic form.

Stage 3: This stage uses the principles of numerical prosody, which were described in Section 5, to convert the prosodic form of the verse (or each verse in the poem) into its equivalent numerical code in terms of the codes (0,⁸ 1, 2, 3, 4 and 6) only.

Stage 4: In this stage, the numeric code of the verse (or each verse in the poem) is compared with all general patterns for the sixteen meters to identify the meter(s) of the verse (or each verse in the poem) and then display the meter(s) if found. Otherwise ‘unknown meter’ is displayed.
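As a hedged illustration of the Stage 4 comparison, the bracketed standard patterns can be turned into regular expressions and matched against a verse's code. The sketch below is not BASRAH's own matcher: `STANDARD_PATTERNS`, `pattern_to_regex` and `identify_meters` are hypothetical names, only the two patterns quoted above are included, codes are assumed to be digit strings without separators, and the first-verse variant mentioned in the footnote to the AlTwyl pattern is ignored.

```python
import re

# Only the two patterns spelled out in the text; the other fourteen meters
# would be added in the same form. The dictionary name is hypothetical.
STANDARD_PATTERNS = {
    "AlTwyl/The Long": (
        "3 [1|2] 3 [21|3|4] 3 [1|2] 3 3",
        "3 [1|2] 3 [21|3|4] 3 [1|2] 3 [2|4|3]",
    ),
    "AlmqtDb/The Lopped": (
        "[6|32|23] 3 1 3",
        "[6|32|23] 3 1 3",
    ),
}


def pattern_to_regex(pattern):
    """Translate '3 [1|2] 3' into a compiled regex such as 3(?:1|2)3."""
    parts = []
    for token in pattern.split():
        if token.startswith("[") and token.endswith("]"):
            options = token[1:-1].split("|")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def identify_meters(first_code, second_code):
    """Return every meter whose two hemistich patterns both match fully."""
    matches = [
        meter
        for meter, (first_pat, second_pat) in STANDARD_PATTERNS.items()
        if pattern_to_regex(first_pat).fullmatch(first_code)
        and pattern_to_regex(second_pat).fullmatch(second_code)
    ]
    return matches or ["unknown meter"]


print(identify_meters("32343233", "32343234"))
print(identify_meters("2232", "2232"))
```

Returning a list rather than a single name reflects that a verse's code may match more than one meter; a code matching no pattern falls through to 'unknown meter'.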
Furthermore, when the input is a poem, the meter of each verse, as well as the meter of the whole poem, is displayed. Finally, if the verse or poem belongs to at least one meter, the user can save the input in a database system if it is not found in it.

7 Experimental results

To evaluate the effectiveness of BASRAH, we collected two corpora. The first corpus is a verse corpus that contains a set of 3,000 old and modern Arabic verses. The second corpus is a poem corpus that contains a set of over 500 old and modern Arabic poems (3,459 verses). Both corpora contain fully and partially diacritized verses and poems. These corpora are annotated by human experts and considered the gold standard corpora. Most of these samples are taken from the following websites: Adab: Al-Mawso‘a Al-‘lmya ll-Shi‘r Al-‘arabi,⁹ Awzan Al-Shi‘r Al-‘arabi,¹⁰ and Mawso‘at Al-Shi‘r Al-‘arabi.¹¹

BASRAH achieves an overall 98.6 per cent precision and 98.1 per cent recall when it is tested on the verse corpus, compared with 97.6 per cent precision and 96.3 per cent recall for Khalaf et al.’s (2009) system¹² on the same corpus, as shown in Table 2. For most meters, BASRAH, which uses the numerical prosody method, gives better results than Khalaf et al.’s (2009) system, which uses Al-Katib’s method. This is because the numerical prosody method codes cover a wider range of Arabic samples than those covered by Al-Katib’s method codes, which did not consider the old Arabic prosody problems (i.e., unreal relaxations and defects). As shown in Table 2, BASRAH achieves 100 per cent precision and recall for AlTwyl/The Long, AlwAfr/The Exuberant, AlmtqArb/The Tripping, Alrml/The Running, Alhzj/The Trilling, AlmqtDb/The Lopped, AlmtdArk/The Continuous and AlmDArς/The Similar meters, compared with only AlwAfr/The Exuberant meter for Khalaf et al.’s (2009) system. On the other hand, Alrjz/The Trembling meter has the lowest precision and recall for both systems (91.7 per cent precision and 91.7 per cent recall in BASRAH compared with 93.8 per cent precision and 90.2 per cent recall in Khalaf et al.’s (2009) system). This is due to the fact that this meter is overlapped with AlkAml/The Perfect meter (the same case with Alsryς/The Swift meter, which is overlapped by AlkAml/The Perfect as shown in Example 5).

⁸ We used the code (0) in the current paper instead of the code ( ) in the original numerical prosody method for simplicity.
⁹ Available at: www.adab.com.
¹⁰ Available at: http://awzan.com/index.htm.
¹¹ Mo’asasat Mohammed bin Rashid al-Maktoum, available at: www.arpoetry.com.
¹² This system is the previous system for the first two authors.

Table 2. BASRAH’s precision (P) and recall (R) compared with Khalaf et al.’s (2009) system, verse corpus

                                                   BASRAH          Khalaf et al.’s (2009) system
No.  Meter                      # Verses  %        P (%)   R (%)   P (%)   R (%)
1.   AlTwyl/The Long            350       11.6     100     100     98      98
2.   AlkAml/The Perfect         320       10.6     97.8    96.9    95.8    93.1
3.   AlbsyT/The Outspread       255       8.5      98.4    98      98.8    97.3
4.   AlwAfr/The Exuberant       49        1.6      100     100     100     100
5.   Alrjz/The Trembling        133       4.4      91.7    91.7    93.8    90.2
6.   Alsryς/The Swift           300       10       95.6    94      95.5    92.3
7.   AlmtqArb/The Tripping      255       8.5      100     100     99.2    99.2
8.   Alxfyf/The Nimble          252       8.4      100     99.2    99.2    99.6
9.   Almdyd/The Extended        156       5.2      96.7    94.9    95.4    93
10.  Alrml/The Running          320       10.6     100     100     98.4    98.4
11.  AlmnsrH/The Flowing        150       5        100     98      95.3    94.7
12.  Alhzj/The Trilling         37        1.2      100     100     97.2    94.6
13.  Almjtθ/The Amputated       25        0.8      96      96      96      96
14.  AlmqtDb/The Lopped         45        1.5      100     100     97.8    97.8
15.  AlmtdArk/The Continuous    323       10.7     100     100     99.7    99.4
16.  AlmDArς/The Similar        30        1        100     100     96.6    93.3
     Total                      3,000     –        98.6    98.1    97.6    96.3

We also obtained 100 per cent accuracy when we tested BASRAH on the poem corpus. This is because, for a poem, the system finds the meter of each verse and then finds the meter of the whole poem. It therefore gives a more accurate result than testing a single verse.
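The poem-level decision can be sketched in a few lines of Python. This is only an illustration of the 'commonest meter' rule the paper states when discussing Example 5, under the assumption that each verse yields a list of candidate meters; the function name `poem_meter` and the choice to return all tied meters are ours, not the authors'.

```python
from collections import Counter


def poem_meter(verse_meters):
    """Pick the meter(s) matched by the largest number of verses.

    verse_meters: one list of candidate meters per verse (a verse whose
    code fits several meters contributes one vote to each of them).
    Returns a sorted list so that ties stay visible; an empty list means
    no verse matched any meter.
    """
    counts = Counter()
    for candidates in verse_meters:
        counts.update(set(candidates))  # at most one vote per meter per verse
    if not counts:
        return []
    best = max(counts.values())
    return sorted(meter for meter, n in counts.items() if n == best)


# Shape of Example 5: verse 1 fits only Swift, verses 2-4 fit both meters.
example_5 = [
    ["Alsryς/The Swift"],
    ["AlkAml/The Perfect", "Alsryς/The Swift"],
    ["AlkAml/The Perfect", "Alsryς/The Swift"],
    ["AlkAml/The Perfect", "Alsryς/The Swift"],
]
print(poem_meter(example_5))
```

Because every verse votes for each of its candidate meters, a single unambiguous verse is enough to break the tie between two overlapping meters, which is one plausible reading of why poem-level results are more reliable than single-verse results.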
BASRAH, for instance, identifies verse #1977¹³ in the verse corpus as having AlbsyT/The Outspread meter, which is correct, as shown in Example 1. In each example below, lines represent the verse’s transliteration, the prosodic form, segmentation and code respectively.

¹³ We use partially diacritized examples in Section 7 (Experimental results) to show the strength of BASRAH because these samples are considered difficult for non-expert users.

Example 1

First hemistich (Sdr)
  hðA Al~ðy. t .rf Al.bT.HA' wT.Âthu
  hAð l.lðy. t .rf l.bT.HA' wT.Âthw
  hA- ðl.- lðy.- t .- rfl.- bT.- HA- 'wT.- Â- thw
  2 - 2 - 3 - 2 - 3 - 2 - 2 - 3 - 1 - 3

Second hemistich
  wAl.by.t y .rfhu wAl.Hl~ wAl.Hrmu
  wl.by.t y .rfhw wl.Hl.l wl.Hrmw
  wl.- by.- ty .- r- fhw- wl.- Hl.- lwl.- H- rmw
  2 - 2 - 3 - 1 - 3 - 2 - 2 - 3 - 1 - 3

BASRAH gives ‘unknown meter’ or gives incorrect results if there is an error in the diacritization of the input verses (e.g., if someone entered Example 1 by changing the diacritical mark of each hemistich ending from Damma to Sukun) as shown in Example 2.

Example 2

First hemistich (Sdr)
  hðA Al~ðy. t .rf Al.bT.HA' wT.Âth.
  hAð l.lðy. t .rf l.bT.HA' wT.Âth.
  hA- ðl.- lðy.- t .- rfl.- bT.- HA- 'wT.- Âth.
  2 - 2 - 3 - 2 - 3 - 2 - 2 - 3 - 3

Second hemistich
  wAl.by.t y .rfhu wAl.Hl~ wAl.Hrm.
  wl.by.t y .rfhw wl.Hl.l wl.Hrm.
  wl.- by.- ty .- r- fhw- wl.- Hl.- lwl.- Hrm.
  2 - 2 - 3 - 1 - 3 - 2 - 2 - 3 - 3

Here BASRAH identifies this verse as ‘unknown meter’ because the final code does not belong to any meter codes, which corresponds to the judgment of a human expert because the verse’s diacritization is not correct. BASRAH also gives ‘unknown meter’ if there is an error in the dictation form of the input verses.

BASRAH can also identify the sub-types of meters, e.g., if they are completed or brachycatalectic or other. For example, the verse in Example 3 is identified correctly as brachycatalectic AlwAfr/The Exuberant.
Example 3

As we have mentioned before, BASRAH depends on codes and not on feet as the other systems do. It therefore identifies the types of relaxations and defects (which are related to the feet method) by using alternative codes that are derived from the codes of sixteen primary meters rather than by using feet. BASRAH also works well on both fully and partially diacritized input verses. Since BASRAH does not work so well on non-diacritized verses, our plan is to use an Arabic diacritizer system (e.g., MADA; Habash, Rambow and Roth 2009) for the input stage as future work for BASRAH.

Example 4

In Example 4, in spite of the fact that the input verse is non-diacritized, BASRAH correctly identifies this verse as having AlmtqArb/The Tripping meter. This is a very rare case, especially when the verse contains the prolongation letters in the segmentation positions.

When testing poems, BASRAH assigns the commonest meter of the verses that make up a poem. For instance, BASRAH identifies the following poem, which contains four verses, as having Alsryς/The Swift meter, not AlkAml/The Perfect meter. This is because the first verse belongs to Alsryς/The Swift meter, whereas the other three verses belong to both AlkAml/The Perfect and Alsryς/The Swift meters. BASRAH, therefore, identifies the poem as having Alsryς/The Swift meter, because this is the commonest meter for the individual verses.

Example 5

Some of BASRAH’s result screens are shown in the Appendix.

8 Conclusions

Arabic prosody is the science that studies the music of Arabic poetry, which is mainly meter and rhyme. The identification of meters for Arabic poetry verses or poems is a complicated task. This task needs human expertise to identify the meter of a verse.
This paper is an attempt to simplify Arabic prosody by utilizing the computer to help inexperienced users to identify the meter of Arabic verses or poems and report the correctness of the verse (or poem). BASRAH uses the numerical prosody method to achieve its aim through coding the verse using (0, 1, 2, 3, 4 and 6) codes only. BASRAH was tested on a large set of old and modern Arabic verses and poems. We have shown that using numbers instead of feet to identify the Arabic verse, by coding each vowelize as ‘1’ and each vowelless consonant as ‘0’, helps to reduce the burden of the many terms that overload Arabic prosody and make it a thorny and complex discipline. This will provide a new way of thinking about Arabic poetry itself, and will also open the door to applications of these ideas to other Arabic fields such as music.

BASRAH achieves an overall 98.6 per cent precision and 98.1 per cent recall versus 97.6 per cent precision and 96.3 per cent recall for Khalaf et al.’s (2009) system over the same set of 3,000 old and modern Arabic verses. This is because our previous system used Al-Katib’s method, which did not take into consideration the old Arabic prosody problems (i.e., unreal relaxations and defects). So BASRAH, which uses the numerical prosody method, can identify some verses that Khalaf et al.’s (2009) system cannot identify because the codes of these verses do not match Al-Katib’s codes, whereas they do match the numerical prosody method codes. On the other hand, BASRAH achieves 100 per cent accuracy over a set of 500 Arabic poems. It is characterized by accuracy and simplicity. We therefore intend to add some other helpful information about Arabic prosody, such as general information about prosody, the definition of each meter, examples for each meter with its feet, the prosodic form conversion algorithm, spoken verses and poems etc., to enrich BASRAH’s user knowledge, because we hope that it might be useful as an educational aid.
We speculate that adding an Arabic diacritizer (e.g., MADA), which plays a vital role in these applications, at the input stage might further improve BASRAH’s results.

Acknowledgments

We would like to thank Professor Allan Ramsay (The University of Manchester, UK), Dr. Yasser Sabtan (Al-Azhar University, Egypt), Dr. Yavor Nenov (Oxford University, UK), Fatimah Furaiji (University of Szczecin, Poland), Siham Al-Rikabi (Humboldt University of Berlin, Germany) and Khamis Al-Qubaeissy (Manchester Metropolitan University, UK) for important suggestions and helpful discussions. We would also like to extend our thanks to the anonymous reviewers for their helpful comments. Zainab owes her deepest gratitude to USM and TWAS for financial support in her PhD study.

Appendix: Current System Example Screens [14]

Fig. A1. (Colour online) AlTwyl/The Long meter example.
Fig. A2. (Colour online) AlmtqArb/The Tripping meter example.
Fig. A3. (Colour online) AlkAml/The Perfect or Alrjz/The Trembling meter example.
Fig. A4. (Colour online) AlkAml/The Perfect or Alsryς/The Swift meter example.
Fig. A5. (Colour online) AlmnsrH/The Flowing meter example.
Fig. A6. (Colour online) Poem test example (identifying the meter for each verse).
Fig. A7. (Colour online) Poem test example (the meter for the whole poem, which is Alsryς/The Swift meter).

[14] In the current screens, all meter codes are right-to-left, depending on the Arabic transcription.

References

Abdullateef, M. 2007. (Al-byna’ Al-ςrwDy llqSydh̄ Al-ςrbyh̄). Cairo, Egypt: Dar Ghareeb llteba‘a wa al-Nasher.
Ahmad, S. 2006. (Al-kAfy fy ςlmy Al-ςruD wa Al-qwAfy), 1st ed. Cairo, Egypt: Mktabat al-Thaqafa al-Denya.
Al-‘Ali, F. 1998. (Al-mysr Al-kAfy fy Al-ςrwD wa Al-qwAfy). Amman, Jordan: Dar al-Thaqafah Llnasher wa al-Tawzee‘.
Al-‘edany, J. 2001. (Hwsbt ςlm Al-ςrwD Al-ςrby). Basrah, Iraq: University of Basrah.
Al-Hanafi, J. 1991. (Al-ςrwD thðybh wa ĂςAdt tdwynh). Baghdad, Iraq: Dar al-Sh’oon al-Thaqafya al-‘ama.
Al-Hussian, A. n.d. Program of Azkary Al-Hussian for Al-ςrwD Al-ςrby. http://azahou45.free.fr/arod1.php Accessed Sep 2010.
Al-Katib, M. 1971. (Al-šςr Al-ςrby bAstςmAl Al-ArqAm Al-θnA’yh̄). Basrah, Iraq: Mtba‘t Mslahat al-Moany’ al-Iraqia.
Al-Taweel, M. 2006. (fy ςrwD Al-šςr Al-ςrby “qDAyA wa mnAkšAt”). Cairo, Egypt: Dar Ghareeb Llteba‘a wa al-Nasher.
Etmeesh, M. 2006. (thwylAt Al-šjrh̄, drAsh̄ fy mwsyqh̄ Al-šςr Al-jdyd). Baghdad, Iraq: Dar al-Sh’oon al-Thaqafya al-‘ama.
Habash, N. 2010. Introduction to Arabic natural language processing. Synthesis Lectures on Human Language Technologies 3(1): 1–187.
Habash, N., Rambow, O., and Roth, R. 2009. MADA+TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), pp. 242–254. Cairo, Egypt.
Habash, N., Soudi, A., and Buckwalter, T. 2007. On Arabic transliteration. In Abdelhadi Soudi, Antal van den Bosch and Günter Neumann (eds.), Arabic Computational Morphology: Knowledge-Based and Empirical Methods, pp. 15–22. New York: Springer.
Harkat, M. 2007. (Al-lsAnyAt Al-ryADyh̄ wa Al-ςrwD). Beirut, Lebanon: Dar al-Hadatha Llteba‘a wa al-Nasher.
Isa, F. 2010. (Al-ςrwD Al-ςrby wa mHAwlAt Al-tTwyr wa Al-tjdyd fyh), 1st ed. Alexandria, Egypt: Dar al-Ma’rifah al-Jami’iah.
Khalaf, Z., Shahed, M., and Ali, S. 2009. (Hwsbh̄ mwAzyn Al-šςr Al-ςrby). University of Sharjah Journal of Pure and Applied Sciences 6(1): 41–62.
Khalil, I. 2009. (ςrwD Al-šςr Al-ςrby), 1st ed. Amman, Jordan: Dar al-Masyrah lltyba’ah wa al-Nashr.
Khashan, K. 2003. (Al-xlyl wa Al-ςrwD Al-rqmy 1). Journal of Arabic Linguistics Tradition (JALT) 1: 25–34.
Khashan, K. 2004. (Al-xlyl wa Al-ςrwD Al-rqmy 2). Journal of Arabic Linguistics Tradition (JALT) 2: 1–12.
Khashan, K. 2005. (Al-xlyl wa Al-ςrwD Al-rqmy 3). Journal of Arabic Linguistics Tradition (JALT) 3: 24–47.
Khashan, K. 2006. (Al-xlyl wa Al-ςrwD Al-rqmy 4). Journal of Arabic Linguistics Tradition (JALT) 4: 46–67.
Kushek, A. 2004. (mHAwlAt Al-tjdyd fy ĂyqAς Al-šςr). Cairo, Egypt: Dar Ghareeb Llteba‘a wa al-Nasher.
Maling, J. 1973. The Theory of Classical Arabic Metrics. Cambridge, MA: MIT.
Mukhtar, A. 1985. (dAŷrh̄ Al-wHdh̄ fy ÂwzAn Al-šςr Al-ςrby). Tunis, Tunisia: al-Monadhama al-Arabia Lltarbya wa al-Thaqafa wa al-‘llom.
Mustafa, M. 2005. (Âhdý sbyl Ălý ςlmy Al-xlyl, Al-ςrwD wa Al-qwAfy). Beirut, Lebanon: ‘alaam al-Kutoob Llteba‘a wa al-Nasher wa al-Tawzee’.
Mustajeer, A. 2005. (mdxl ryAdy Ălý ςrwD Al-šςr Al-ςrby). Cairo, Egypt: Dar al-’ayn Llnasher.
Shittu, S. 2006. Rules of metrics, alterations and addition in Arabic prosody. An Encyclopaedia of The Art 2(1): 1–5.
Uthman, M. 2004. (Al-mršd Al-wAfy fy Al-ςrwD wa Al-qwafy). Beirut, Lebanon: Dar al-Kutoob al-‘lmya.
Wajeeh, M. 2007. (Al-ςrwD wa Al-qAfyh̄ byn Al-trAθ wa Al-tjdyd). Cairo, Egypt: Mo‘ssat al-Mukhtar Llnasher wa al-Tawzee‘.