Natural Language Engineering
http://journals.cambridge.org/NLE
BASRAH: an automatic system to identify the meter of
Arabic poetry
MAYTHAM ALABBAS, ZAINAB A. KHALAF and KHASHAN M. KHASHAN
Natural Language Engineering / Volume 20 / Issue 01 / January 2014, pp 131 - 149
DOI: 10.1017/S1351324912000204, Published online: 08 August 2012
Link to this article: http://journals.cambridge.org/abstract_S1351324912000204
How to cite this article:
MAYTHAM ALABBAS, ZAINAB A. KHALAF and KHASHAN M. KHASHAN (2014). BASRAH: an
automatic system to identify the meter of Arabic poetry. Natural Language Engineering, 20, pp
131-149 doi:10.1017/S1351324912000204
Natural Language Engineering 20 (1): 131–149.
doi:10.1017/S1351324912000204
© Cambridge University Press 2012
BASRAH: an automatic system to identify the
meter of Arabic poetry
MAYTHAM ALABBAS¹, ZAINAB A. KHALAF¹,² and KHASHAN M. KHASHAN³
¹ Department of Computer Science, College of Science, Basrah University, Basrah, Iraq
e-mail: maytham.alabbas@gmail.com
² School of Computer Science, University Science Malaysia (USM), 11800 Penang, Malaysia
e-mail: zainab ali2004@yahoo.com
³ P.O. BOX 42821, Riyadh 11551, Saudi Arabia
e-mail: khashan kh@yahoo.com
(Received 25 November 2010; revised 14 May 2012; accepted 31 May 2012;
first published online 8 August 2012 )
Abstract
Arabic prosody is the science that studies the music of Arabic poetry, which is mainly meter
and rhyme. The identification of meters for Arabic verses or poems is a complicated task.
This task requires a certain level of expertise to identify the meter to which a verse belongs. In
this paper, we present BASRAH,¹ a system that automatically identifies the meter of Arabic
poetry by using the numerical prosody method. The numerical prosody method depends on
verse coding, which is derived from the general concept of Al-Khalil’s feet by using two
primary units (cord = 2) and (peg = 3). On testing both old and modern Arabic verses and
poems, BASRAH has proved to be an efficient tool to help inexperienced users to determine
the meter of Arabic verses and poems.
1 Introduction
Arabic poetry (Alšςr Alςrby)² is a form of metrical speech with a
rhyme. The rhyme in Arabic poetry is achieved by every line of the poem ending
upon a specific tone. Arabic poetry is categorized into two main types: rhymed (or
measured), and prose, with the former greatly outnumbering the latter. The rhymed
poetry falls within fifteen different meters (wzn) collected and explained by the
famous lexicographist, grammarian and prosodist Al-Khalil Bin Ahmad Al-Farahidi
¹ The system is named after the native city of Al-Khalil, the father of Arabic prosody.
² The transcription of Arabic examples in this paper follows the Habash–Soudi–Buckwalter
(HSB) transliteration scheme (Habash et al. 2007) for transcribing Arabic symbols. This
scheme extends Buckwalter’s scheme to increase its readability while maintaining the
one-to-one correspondence with Arabic orthography as represented in standard encodings of
Arabic, such as Unicode. The following is the HSB transliteration map with different
Buckwalter scheme values indicated in parentheses: Ā (|), Â (>), ŵ (&), Ǎ (<), ŷ ({),
h̄ (p), θ (v), ð (*), š ($), Ď (Z), ς (E), γ (g), ý (Y), ã (F), ũ (N), ı̃ (K), . (o).
Table 1. The Arabic meters (columns: Meter; English Name; Standard patterns for both hemistiches)
(791 A.D.) in what is known as the science of prosody (ςlm AlςrwD) (Al-Katib 1971;
Kushek 2004). His elaborate circle system remains directly influential
in theories of meters to this day. Al-Akhfash later added one more meter to make
sixteen meters, as listed in Table 1 (Al-‘edany 2001; Mustajeer 2005). Arabic poetry
must follow one of these to be correct (Uthman 2004; Mustafa 2005). The meter of
the rhythmical poetry is known as sea (bHr). The measuring unit of the meter
is known as a foot (tfςylh̄), with every meter containing a certain number
of feet that the poet has to observe in every verse (byt) of the poem. A line
of a verse is divided into two halves (šTr) or hemistiches (mSrς). The
measuring procedure of a poem is very rigorous. Sometimes adding or removing
a consonant or a vowel can shift the verse from one meter to another. Some of
these meters can be identified, depending on the number of units (syllables) in each
hemistich, with other sub-types such as completed (tAm) meters, which contain
four units, or brachycatalectic (mjzw’) meters, which are verses where the
last foot in both hemistiches is omitted (the last foot is called ςrwD ‘in a
narrow sense’ in the first hemistich, whereas it is called Drb in the second
hemistich). For instance, AlTwyl/The Long meter is always completed, whereas
Alhzj/The Trilling meter is always brachycatalectic. On the other hand, Alrjz/The
Trembling, AlbsyT/The Outspread and Alrml/The Running meters could be either
completed or brachycatalectic. Furthermore, each one of these sixteen meters has
many forms due to different forms of foot for each meter. So there are about 17,292
different forms representing sixteen meters for Arabic prosody. Also, in rhymed
poetry, every verse has to end with the same rhyme (qAfyh̄) throughout the
poem.
The rest of the paper is organized as follows. Section 2 gives an overview
of Arabic prosody and its basic elements. A brief review of related work
is given in Section 3. Section 4 explains the algorithm for converting the dictation
form to the prosodic form. The numerical prosody method description will be given
in Section 5. Section 6 explains the BASRAH system and Section 7 reports the
experiments performed using BASRAH. Finally, Section 8 presents the conclusions.
2 Arabic prosody and its basic elements
Arabic prosody, which was established by Al-Khalil, can be defined as a science
by which the right meter is recognized as opposed to a wrong one (Ahmad 2006;
Etmeesh 2006). In his analysis, Al-Khalil observed that every verse consists of an
identical sequence of vowelless consonants (swAkn), which are the letters
provided with Sukun or the prolongation letters (i.e., Alif, Waw and Ya), and
vowelizes (mtHrkAt), which are the letters provided with diacritical marks, such as
Damma, Fatha and Kasra (Maling 1973; Abdullateef 2007; Wajeeh 2007). These are
governed by determinate collocations of easily distinguishable rhythmic elements of a
fixed length, which he called pegs (ÂwtAd), with elements of variable length, which he
called cords (ÂsbAb). A cord consists of two letters, while a peg has three letters. A cord
which is composed of a vowelize followed by a vowelless consonant is called a light
cord (sbb xfyf), whereas it is called a heavy cord (sbb θqyl)
if it consists of two vowelizes (Abdullateef 2007; Wajeeh 2007). A peg which is
composed of two vowelizes followed by a vowelless consonant is called a joined peg
(wtd mjmwς), while it is called a separated peg (wtd mfrwq)
if it consists of two vowelizes separated by a vowelless consonant (Abdullateef
2007; Wajeeh 2007). Some prosodists refer to another rhythmic element called
interspace (fASlh̄), which is a combination of three or four vowelizes followed
by a vowelless consonant. An interspace is called a small interspace (fASlh̄ Sγrý)
when it consists of three vowelizes followed by a vowelless consonant,
and called a large interspace (fASlh̄ kbrý) when four vowelizes are
followed by a vowelless consonant (Harkat 2007). From these three elements,
Al-Khalil determined larger entities called feet, which can be combined in different ways
to generate the traditional meters of Arabic prosody. A foot must contain double
cords or a peg and a cord, and must not contain two pegs or three consecutive
cords. In order to grasp the above analysis, it has been compressed into a sentence:
lam. Âra ςalaý Ďah.r jabalı̃ samakatã
(lit.: I did not see a fish on top of a mountain).
Here, lam. is light cord, Âra is heavy cord, ςalaý is joined peg, Ďah.r is
separated peg, jabalı̃ is small interspace and samakatã is large interspace.
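The definitions above can be restated compactly as patterns over two symbols. The sketch below is only an illustration: it borrows the 1/0 notation that the paper introduces later in Section 5 (1 for a vowelize, 0 for a vowelless consonant), and the names ELEMENTS, classify and MNEMONIC are hypothetical, not part of BASRAH.

```python
# Rhythmic elements as patterns of 1 (vowelize) and 0 (vowelless consonant).
# The patterns are read directly off the definitions in the text above.
ELEMENTS = {
    "10": "light cord",         # vowelize + vowelless consonant
    "11": "heavy cord",         # two vowelizes
    "110": "joined peg",        # two vowelizes + vowelless consonant
    "101": "separated peg",     # two vowelizes separated by a vowelless consonant
    "1110": "small interspace",   # three vowelizes + vowelless consonant
    "11110": "large interspace",  # four vowelizes + vowelless consonant
}


def classify(pattern):
    """Return the name of the rhythmic element for a 1/0 pattern."""
    return ELEMENTS.get(pattern, "unknown")


# The mnemonic sentence, word by word. Tanwiyn at the end of the last two
# words is counted as a final vowelless noon, hence the trailing 0.
MNEMONIC = [
    ("lam.", "10"),
    ("Âra", "11"),
    ("ςalaý", "110"),
    ("Ďah.r", "101"),
    ("jabalı̃", "1110"),
    ("samakatã", "11110"),
]

if __name__ == "__main__":
    for word, pattern in MNEMONIC:
        print(word, pattern, classify(pattern))
```

Run as a script, this prints one line per word of the mnemonic together with the element it illustrates.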
These names, like most of the metrical terminology, have been borrowed from
Bedouin life, especially from the tent. Prosodists claim that there are ten feet
known as primary feet (tfAςyl ÂsAsyh̄) that were formed from cords,
pegs and interspaces. These feet are (Al-Taweel 2006; Abdullateef 2007; Wajeeh
2007; Khalil 2009) as follows: faςuwlan, mafaAςiyln, mufaAςaltun., fAςi lAtun.,
faAςiln, fAςilAatun., mus.tafςilun., mutafaAςiln, maf.ςuwlAatu and mus.taf.ς lun.
Minor changes may arise in some parts of the primary feet, which result in new
types of feet called alternative feet (tfAςyl bdylh̄) (Mukhtar 1985; Al-‘Ali
1998; Isa 2010). The first alteration is called minor relaxations (zHAfAt),
which affect the cords of a verse. These relaxations are divided into twelve types
(Shittu 2006), such as Alxbn, Alkf, AlTy, etc. The other, called
major defects or diseases (ςll), only affects the end of the last foot of a hemistich
(Mukhtar 1985). It arises through addition or omission on that basis. These defects
also divide into twelve types (Shittu 2006), such as AlqTf, AlqTς, Albtr etc.
3 Related work
Some attempts have been made to automate the process of determining the meter of
Arabic verses. Among these attempts is Al-‘edany’s system (Al-‘edany 2001), which is
based on Al-Hanafy’s method (Al-Hanafi 1991). It takes as input a fully diacritized
verse and outputs different information, such as the prosodic form, verse’s codes in
terms of the numbers (1, 2 and 3), feet and places of segmentation. This system does
not specify the types of relaxations and defects. It recognizes only thirteen meters.
Al-Hussain’s system (Al-Hussian, n.d.) is another attempt in this respect. It takes
as input the code of one of the verse’s hemistiches and outputs the hemistich’s meter,
feet, relaxations and defects. This system does not specify the places of segmentation
for the verse. It recognizes all sixteen meters.
Khalaf, Shahed and Ali’s (2009) system, which is based on Al-Katib’s (1971)
method, recognizes all meters. It takes as input partially diacritized verses (which
contain diacritics such as Sukun, Tanwiyn, assimilation and the end of each hemistich)
and outputs the verse’s meter, the prosodic form, the verse’s code in terms of the
numbers (1, 2, 4, 8 and 16), feet, place of segmentation and the types of relaxations
and defects.
The current system, which we call BASRAH, is a step forward in automating the
process of Arabic prosody. It uses the numerical prosody method (Khashan 2003,
2004, 2005, 2006), which depends on numerical patterns, and not on feet like the
previous methods, to specify the verse’s or poem’s meter(s). BASRAH recognizes
not only all Arabic meters but can also recognize Arabic poetry rhythms. BASRAH
also identifies the types of relaxations and defects by using alternative codes that
are derived from the codes of sixteen primary meters.
4 Prosodic form
Diacritization in Arabic is done by adding special symbols called diacritical marks
(HrkAt) to help in spoken language. Some of these special symbols are put
above normal Arabic characters, such as the short vowels, known as Damma (u) and
Fatha (a), and a zero vowel, known as Sukun (.), while others are
put under them, such as the short vowel, known as Kasra (i). For example,
katab.tu ‘I wrote’ and katab.ta ‘you (masculine) wrote’.
Arabic prosody is considered a phonetic science. It depends on pronounced, not
on written, letters. The prosodic form is based on the following principal rule:
Only the pronounced sounds are written down, even if they have no corresponding letters in
dictation form. Also, what is not pronounced is left unprinted, even if it has a corresponding
letter in dictation form. (Khalaf et al. 2009)
Accordingly, some letters are either inserted or deleted in the prosodic form.
The prosodic form is an essential step for any correct start to identify the meter
of the verse because it represents the verbal components for any verse in order to
facilitate the next steps of processing (Al-‘edany 2001). This task needs a human
knowledge level in the rules of Arabic and words diacritization, as well as the correct
pronunciation of the Arabic lexical items.
The following algorithm (Al-Katib 1971; Al-‘edany 2001; Khalaf et al. 2009)
shows the steps applied to convert the Arabic verse from the dictation form to the
prosodic form:
(1) Duplicate the geminated letters (having Shadda (∼) over it) by making
the first one a vowelless consonant (zero vowel) and the other a vowelize. For
instance, md∼a becomes md.da.
(2) Duplicate the prolongated Alif (Ā) by making the first one a vowelize and
the other a vowelless consonant.
(3) Replace any Tanwiyn³ (or Nunation), known as Tanwiyn Damm (ũ), Tanwiyn Fath
(ã or Aã) and Tanwiyn Kaser (ı̃), by noon with Sukun (n.). For instance, jnwbũ
becomes jnwbun.
(4) Delete the conjunctive Hamza within a word from the dictation
form. For instance, wAktŷAb becomes wktŷAb.
(5) Write down each pronounced letter (e.g., dagger Alif). Like: hðA becomes
hAðA. Also, leave unprinted each letter that is not pronounced. For
instance, ktbwA becomes ktbw.
(6) Delete the assimilated Lam from the definite article known as
Al, which exists before any one of the Sun letters or Solar letters (AlHrwf
Alšmsyh̄: t, θ, d, ð, r, z, s, š, S, D, T, Ď, l and n) and duplicate the Sun
letter (making the first one a vowelless consonant and the other a vowelize).
For instance, Alš∼ams becomes Aš.šams.
(7) Replace any of the short vowels Damma, Fatha and Kasra, which appears
over the letter Ha (h) at the end of a word or over the end of a hemistich,
by its corresponding letters known as Waw (w), Alif (A) and Ya (y),
respectively.
(8) Delete any other special symbols (?, !, (, ), . . .) which exist in the verse. For
instance, mn hðA? becomes mn hAðA.
(9) Finally, the first vowelless consonant letter from a pair of consecutive vowelless
consonant letters is deleted except where this pair appears at the end of each
hemistich.
The result of this process contains the prosodic form only, i.e., the Arabic letters
diacritized with the short vowels Damma, Fatha and Kasra, and the zero vowel
Sukun.
³ Nunation is an indefinite morpheme consisting of a short vowel followed by the phoneme
/n/. Nunation is represented using a unique diacritic that has the shape of two of the
diacritics of the short vowel (Habash 2010).
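As a rough illustration of how some of these steps might be mechanized, the sketch below applies steps (1), (3), (6) and (8) to HSB-transliterated text. It is not the authors' implementation: the function name to_prosodic_partial is hypothetical, the vowel letters inserted for Tanwiyn Fath and Tanwiyn Kaser are an assumption extrapolated from the single example jnwbũ → jnwbun., step (6) is only applied to a word-initial Al whose Sun letter carries a written Shadda, and steps (2), (4), (5), (7) and (9) are left out because they need lexical or positional knowledge.

```python
import re
import unicodedata

SHADDA = "\u223c~"  # the "∼" sign used in the paper, plus plain ASCII "~"
# Sun letters in HSB transliteration: t θ d ð r z s š S D T Ď l n
SUN_LETTERS = "t\u03b8d\u00f0rzs\u0161SDT\u010eln"
# Tanwiyn marks ũ, ã, ı̃ mapped to a short vowel plus noon with Sukun.
# Only the first mapping is attested in the text; the other two are assumed.
TANWIYN = {"\u0169": "un.", "\u00e3": "an.", "\u0131\u0303": "in."}


def to_prosodic_partial(text):
    """Apply steps (8), (6), (1) and (3) of the conversion, in that order."""
    text = unicodedata.normalize("NFC", text)
    # Step (8): delete punctuation, which is never pronounced.
    text = re.sub(r"[?!(),;:\u061f]", "", text)
    # Step (6): drop the assimilated Lam of the article before a geminated
    # Sun letter; the gemination itself is expanded by step (1) below.
    text = re.sub(rf"\bAl(?=[{SUN_LETTERS}][{SHADDA}])", "A", text)
    # Step (1): a letter carrying Shadda becomes "letter + Sukun + letter".
    text = re.sub(rf"(\S)[{SHADDA}]", r"\1.\1", text)
    # Step (3): replace Tanwiyn by a vowel followed by noon with Sukun.
    for mark, replacement in TANWIYN.items():
        text = text.replace(mark, replacement)
    return text


if __name__ == "__main__":
    for sample in ["md\u223ca", "jnwb\u0169", "Al\u0161\u223cams"]:
        print(sample, "->", to_prosodic_partial(sample))
```

The three samples are the ones given in steps (1), (3) and (6). A full converter would still need the remaining steps, which is why the paper stresses that this stage depends on human-level knowledge of Arabic diacritization.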
Figure 1 shows an example of Arabic verse in dictation form and its prosodic
form.
5 Numerical prosody method
The numerical prosody method (Khashan 2003, 2004, 2005, 2006) is an approach for
presenting Al-Khalil prosody by using numbers as a form instead of feet. It aims at
a comprehensive understanding that does not pay attention to terminology or ritual,
and uses minimal basic necessary tools, which, in the case of prosody, are numbers
(cord = 2) and (peg = 3). Thus, numerical prosody manifests itself in simple, short,
almost mathematical rules representing a program. In addition, it does not burden
the learner with the many terms that weigh down this science.
For instance, when we say that fAςilaAtun. is a heptameter, this means
that it contains seven letters:
(fA = 10 = 2)⁴ + (ςilaA = 110 = 3) + (tun. = 10 = 2) = 7 diacritical marks
(vowelless consonants and vowelizes).
So the Arabic prosody is numerical from its beginning.
⁴ These codes are explained below. Note that these codes are not the interpretation of the
sequences of 0s and 1s as binary numbers.
Fig. 1. (Colour online) An example of Arabic verse in dictation form (first line), and its
prosodic form (second line).
The following Arabic verse is taken as an example to explain the steps of the
numerical prosody:
(lam. yaςud. qaw.miy. kamaA kaAnuwA wamaA ÂH.snwA γy.r Alt∼ajAfy. ςmlA)
In numerical prosody two steps are used to indicate the meter for any verse:
(1) Representing each letter provided with sukun or the prolongation letters (i.e.,
Alif, Waw and Ya) by code ‘0’, for example, m., and everything
else by code ‘1’, for example, qa.⁵
The prosodic form of the first hemistich of the previous example verse is
lam. yaςud. qaw.miy. kamaA kaAnuw wamaA
So it can be written by these two codes (1 and 0) as follows:
la m. ya ςu d. qa w. mi y. ka ma A  ka A  nu w  wa ma A
1  0  1  1  0  1  0  1  0  1  1  0  1  0  1  0  1  1  0
⁵ Here the letters without diacritics in the prosodic form are considered as vowelizes because
we work with partially diacritized verses.
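Step (1) can be sketched as a one-line mapping once the prosodic form has been split into letters. The sketch assumes the input is already tokenized as in the table above, that a trailing "." marks Sukun, and that a bare A, w or y token is a prolongation letter; the names PROLONGATION, binary_code and FIRST_HEMISTICH are illustrative only.

```python
# Bare Alif, Waw and Ya tokens are treated as prolongation letters (code 0).
PROLONGATION = {"A", "w", "y"}


def binary_code(tokens):
    """Map each letter token to '0' (vowelless) or '1' (vowelize)."""
    return "".join(
        "0" if token.endswith(".") or token in PROLONGATION else "1"
        for token in tokens
    )


# First hemistich of the example verse, one token per letter.
FIRST_HEMISTICH = [
    "la", "m.", "ya", "ςu", "d.", "qa", "w.", "mi", "y.",
    "ka", "ma", "A", "ka", "A", "nu", "w", "wa", "ma", "A",
]

if __name__ == "__main__":
    print(binary_code(FIRST_HEMISTICH))
```

Undiacritized letters fall through to code 1, which mirrors the convention stated in the footnote for partially diacritized input.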
(2) Grouping the binary codes into segments. Each segment must start with (1) and
end with (0). Certain patterns of these segments are significant, in particular
110 (joined peg) and 10 (light cord). We segment the verse by matching the
longest prefix that matches one of these, and we assign a code to each segment
(i.e., 110 = 3 and 10 = 2). Sometimes there is still one vowelize letter (1) alone;
in this case we leave it (e.g. 1110 (small interspace) = 13). Then we simplify
(222 = 6 and 22 = 4). So the first hemistich of the previous example verse can
be coded using cord and peg codes as follows:
Segment     Binary   Code   Unit
la m.       10       2      cord
ya ςu d.    110      3      peg
qa w.       10       2      cord
mi y.       10       2      cord
ka ma A     110      3      peg
ka A        10       2      cord
nu w        10       2      cord
wa ma A     110      3      peg
Simplified code (each pair of adjacent cords 2 2 becomes 4): 2 3 4 3 4 3
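The segmentation and simplification in step (2) can be sketched as two small functions. This is a minimal reading of the rule as stated (match 110 before 10, leave a lone 1 as code 1, then merge runs of cords), not BASRAH's actual code; how to treat a stray 0 and runs of more than three cords is not specified in the text, so those branches are assumptions, and the names segment and simplify are hypothetical.

```python
def segment(bits):
    """Split a 1/0 string into codes: 110 -> 3 (peg), 10 -> 2 (cord)."""
    codes = []
    i = 0
    while i < len(bits):
        if bits.startswith("110", i):
            codes.append(3)
            i += 3
        elif bits.startswith("10", i):
            codes.append(2)
            i += 2
        else:
            # A lone vowelize is kept as 1; a stray 0 is kept as 0 (assumed).
            codes.append(int(bits[i]))
            i += 1
    return codes


def simplify(codes):
    """Merge adjacent cords: 2 2 2 -> 6 and 2 2 -> 4."""
    out = []
    run = 0  # number of consecutive cords seen so far
    for code in list(codes) + [None]:  # None flushes the final run
        if code == 2:
            run += 1
            continue
        while run >= 3:  # assumed: longer runs are consumed three at a time
            out.append(6)
            run -= 3
        if run == 2:
            out.append(4)
        elif run == 1:
            out.append(2)
        run = 0
        if code is not None:
            out.append(code)
    return out


if __name__ == "__main__":
    bits = "1011010101101010110"  # first hemistich of the example verse
    print(segment(bits))
    print(simplify(segment(bits)))
```

For the example hemistich this reproduces the codes 2 3 2 2 3 2 2 3 and the simplified form 2 3 4 3 4 3 given above, and the small interspace 1110 comes out as 1 3.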
Using numbers as a form soon revealed mathematical properties of the rhythm of
Arabic verses, which could be inferred from the meters and their circles. The
numerical prosody codes do not go beyond the set (0, 1, 2, 3, 4 and 6). Herein lies the
importance of numerical prosody: it is simple, accurate and easy to
program. For this reason we used this method in our system to identify the
meter of Arabic poetry.
The whole set of features for the Arabic prosody is summarized in the following
specific rules:
(1) There are two main rhythms for Arabic poetry known as Amble (xbby)
and Naval (bHry) rhythms. These two rhythms are overlapped according
to specific rules, which are known as AltxAb.
(2) The Amble rhythm consists of the equivalent of light cord and heavy cord,
where one of them is taking the place of the other, so there is no relaxation
in the Amble rhythm.
(3) The Naval rhythm consists of alternations of light cord 2 and peg 3. All cords
in this rhythm must be light cords only and able to do relaxation.
(4) The Amble and Naval rhythms are overlapped in interspace 22 in the two
meters (AlkAml/The Perfect and AlwAfr/The Exuberant), which is called
AltxAb.
In summary, there are mathematical conditions that determine and describe the
properties of Arabic poetry through which a meter is accepted.
There is also a mathematical framework that controls these properties, and
numerical prosody serves to determine the features of these properties.
6 BASRAH system
The system described in this paper, i.e., BASRAH, is used to identify the Arabic
verse’s or poem’s meter using the numerical prosody method (Khashan 2003, 2004,
2005, 2006). BASRAH takes as input a partially diacritized verse and outputs the
verse’s meter, the prosodic form, the verse’s code, place of segmentation and the
types of relaxations and defects. BASRAH has many advantages when compared
with the previous systems that were explained in Section 3. These advantages are as
follows:
(1) All the previous systems depended on the feet to determine the meter of the
Arabic verse, while BASRAH depends on mathematical patterns. The feet
in BASRAH can be inferred from the mathematical patterns. This makes
BASRAH simpler and faster than the other systems.
(2) The previous systems find the meter for verses only, whereas BASRAH does
so for both verses and poems.
(3) The previous systems identify the Amble (Alxbb) rhythm as a part of
AlmtdArk/The Continuous meter, while in BASRAH it is identified as a
separate notion. This brings the results of BASRAH closer to the view
of most human experts in Arabic prosody.
(4) BASRAH like Khalaf et al.’s (2009) system works on partially diacritized verses
(as input), whereas other systems work on fully diacritized verses (Al-‘edany
2001) or the numerical coding of the prosodic form (Al-Hussian, n.d.).
To make BASRAH faster and more effective, we created general standard patterns for
each meter, instead of saving all the meter codes (see Section 5) as in the other
systems. Sometimes, for the same meter there is more than one standard pattern.
For example, the standard pattern of AlTwyl/The Long meter is coded as follows⁶:
First hemistich (Sdr):  3 [1|2] 3 [21|3|4] 3 [1|2] 3 3⁷
Second hemistich:       3 [1|2] 3 [21|3|4] 3 [1|2] 3 [2|4|3]
In the above code, the square brackets [ ] mean that only one number between the
brackets is selected.
In the standard pattern of AlmqtDb/The Lopped meter, the codes of both
hemistiches are equivalent, as shown in the following pattern:
First hemistich (Sdr):  [6|32|23] 3 1 3
Second hemistich:       [6|32|23] 3 1 3
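One plausible way to match a numeric code against such a standard pattern is to translate the bracket notation into a regular expression. The sketch below does exactly that, using Python's standard re module; the helper names pattern_to_regex and matches are hypothetical, the paper does not say how BASRAH performs the comparison, and the two patterns are copied from the examples above.

```python
import re


def pattern_to_regex(pattern):
    """Turn '3 [1|2] 3' into the compiled regex '3(?:1|2)3'."""
    body = pattern.replace(" ", "").replace("[", "(?:").replace("]", ")")
    return re.compile(body)


def matches(pattern, codes):
    """True if the whole code sequence fits the standard pattern."""
    code_string = "".join(str(code) for code in codes)
    return pattern_to_regex(pattern).fullmatch(code_string) is not None


# Standard patterns quoted in the text (first hemistich of each meter).
LONG_FIRST = "3 [1|2] 3 [21|3|4] 3 [1|2] 3 3"
LOPPED = "[6|32|23] 3 1 3"

if __name__ == "__main__":
    print(matches(LONG_FIRST, [3, 2, 3, 4, 3, 2, 3, 3]))
    print(matches(LOPPED, [6, 3, 1, 3]))
    print(matches(LOPPED, [4, 3, 1, 3]))
```

Because codes are concatenated into a digit string, an alternative such as 21 is matched as the two consecutive codes 2 and 1, which is how the bracketed alternatives appear to be meant.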
BASRAH proceeds in four stages, as follows:
Stage 1: This stage is responsible for inputting partial diacritization of Arabic verse
or poem from the keyboard or selecting from the stored database system.
Then the input is checked, whether it is acceptable according to the rules of
Arabic poetry writing (i.e., does not contain numbers or special symbols) or
not. If the input verse or poem is incorrect, an error message is displayed,
otherwise the processing continues.
⁶ Arabic is written right-to-left. In the current paper, all meter codes are left-to-right, which
are equivalent to verses’ transliteration.
⁷ This number becomes 4 when the verse is the first one in the poem only, otherwise it is
still 3.
Stage 2: This stage uses the algorithm that was explained in Section 4 to convert the
verse (or each verse in the poem) from the dictation form to the prosodic
form.
Stage 3: This stage uses the principles of numerical prosody, which were described
in Section 5, to convert the prosodic form of the verse (or each verse in
the poem) into its equivalent numerical code in terms of the codes (0,⁸ 1,
2, 3, 4 and 6) only.
Stage 4: In this stage, the numeric code of the verse (or each verse in the poem) is
compared with all general patterns for the sixteen meters to identify the
meter(s) of the verse (or each verse in the poem) and then display the
meter(s) if it is found. Otherwise ‘unknown meter’ is displayed.
Furthermore, when the input is a poem, the meter of each verse, as well as the
meter of the whole poem, is displayed. Finally, if the verse or poem belongs to at
least one meter, the user can save the input in a database system if it is not found
in it.
7 Experimental results
To evaluate the effectiveness of BASRAH, we collected two corpora. The first
corpus is a verse corpus that contains a set of 3,000 old and modern Arabic
verses. The second corpus is a poem corpus that contains a set of over 500 old
and modern Arabic poems (3,459 verses). Both corpora contain fully and partially
diacritized verses and poems. These corpora are annotated by human experts and
considered the gold standard corpora. Most of these samples are taken from the
following websites: Adab: Al-Mawso‘a Al-‘lmya ll-Shi‘r Al-‘arabi,⁹ Awzan Al-Shi‘r
Al-‘arabi,¹⁰ and Mawso‘at Al-Shi‘r Al-‘arabi.¹¹
BASRAH achieves an overall 98.6 per cent precision and 98.1 per cent recall
when it is tested on the verse corpus compared with 97.6 per cent precision and 96.3
per cent recall for Khalaf et al.’s (2009) system¹² on the same corpus as shown in
Table 2. For all types of verses, BASRAH, which uses the numerical prosody method,
gives better results than Khalaf et al.’s (2009) system, which uses Al-Katib’s method.
This is because the numerical prosody method codes cover a wider range of Arabic
samples than those covered by Al-Katib’s method codes, which did not consider
the old Arabic prosody problems (i.e., unreal relaxations and defects). As shown in
Table 2, BASRAH achieves 100 per cent precision and recall for AlTwyl/The Long,
AlwAfr/The Exuberant, AlmtqArb/The Tripping, Alrml/The Running, Alhzj/The
Trilling, AlmqtDb/The Lopped, AlmtdArk/The Continuous and AlmDArς/The
Similar meters compared with AlwAfr/The Exuberant meter for Khalaf et al.’s
⁸ We used the code (0) in the current paper instead of the code ( ) in the original numerical
prosody method for simplicity.
⁹ Available at: www.adab.com.
¹⁰ Available at: http://awzan.com/index.htm.
¹¹ Mo’asasat Mohammed bin Rashid al-Maktoum, available at: www.arpoetry.com.
¹² This system is the previous system for the first two authors.
Table 2. BASRAH’s precision (P) and recall (R) compared with Khalaf et al.’s
(2009) system, verse corpus

                                                   BASRAH           Khalaf et al.’s (2009) system
     Meter                     # Verses    %       P (%)   R (%)    P (%)   R (%)
 1.  AlTwyl/The Long           350         11.6    100     100      98      98
 2.  AlkAml/The Perfect        320         10.6    97.8    96.9     95.8    93.1
 3.  AlbsyT/The Outspread      255         8.5     98.4    98       98.8    97.3
 4.  AlwAfr/The Exuberant      49          1.6     100     100      100     100
 5.  Alrjz/The Trembling       133         4.4     91.7    91.7     93.8    90.2
 6.  Alsryς/The Swift          300         10      95.6    94       95.5    92.3
 7.  AlmtqArb/The Tripping     255         8.5     100     100      99.2    99.2
 8.  Alxfyf/The Nimble         252         8.4     100     99.2     99.2    99.6
 9.  Almdyd/The Extended       156         5.2     96.7    94.9     95.4    93
10.  Alrml/The Running         320         10.6    100     100      98.4    98.4
11.  AlmnsrH/The Flowing       150         5       100     98       95.3    94.7
12.  Alhzj/The Trilling        37          1.2     100     100      97.2    94.6
13.  Almjtθ/The Amputated      25          0.8     96      96       96      96
14.  AlmqtDb/The Lopped        45          1.5     100     100      97.8    97.8
15.  AlmtdArk/The Continuous   323         10.7    100     100      99.7    99.4
16.  AlmDArς/The Similar       30          1       100     100      96.6    93.3
     Total                     3,000       –       98.6    98.1     97.6    96.3
(2009) system. On the other hand, Alrjz/The Trembling meter is the lowest precision
and recall for both systems (91.7 per cent precision and 91.7 per cent recall in
BASRAH compared with 93.8 per cent precision and 90.2 per cent recall in Khalaf
et al.’s (2009) system). This is due to the fact that this meter is overlapped with
AlkAml/The Perfect meter (the same case with Alsryς/The Swift meter, which is
overlapped by AlkAml/The Perfect as shown in Example 5). We also obtained 100 per
cent accuracy when we tested BASRAH on the poem corpus. This is because, for a
poem, the system finds the meter of each verse in the poem and
then derives the meter of the whole poem, which gives a more accurate result than
testing a single verse.
BASRAH, for instance, identifies verse #1977¹³ in the verse corpus as having
AlbsyT/The Outspread meter, which is correct, as shown in Example 1. In each
example below, lines represent the verse’s transliteration, the prosodic form, segmentation and code respectively.
¹³ We use partially diacritized examples in Section 7 (Experimental results) to show the
strength of BASRAH because these samples are considered difficult for non-expert users.
First hemistich (Sdr)
hðA Al~ðy. tς.rf Al.bT.HA' wT.Âthu
hAð l.lðy. tς.rf l.bT.HA' wT.Âthw
hA- ðl.- lðy.- tς.- rfl.- bT.- HA- 'wT.- Â- thw
2 - 2 - 3 - 2 - 3 - 2 - 2 - 3 - 1 - 3
Second hemistich
wAl.by.t yς.rfhu wAl.Hl~ wAl.Hrmu
wl.by.t yς.rfhw wl.Hl.l wl.Hrmw
wl.- by.- tyς.- r- fhw- wl.- Hl.- lwl.- H- rmw
2 - 2 - 3 - 1 - 3 - 2 - 2 - 3 - 1 - 3
Example 1
BASRAH gives ‘unknown meter’ or gives incorrect results if there is an error in
the diacritization of the input verses (e.g., if someone entered Example 1 by changing
the diacritical mark of each hemistich ending from Damma to Sukun) as shown in
Example 2.
First hemistich (Sdr)
hðA Al~ðy. tς.rf Al.bT.HA' wT.Âth.
hAð l.lðy. tς.rf l.bT.HA' wT.Âth.
hA- ðl.- lðy.- tς.- rfl.- bT.- HA- 'wT.- Âth.
2 - 2 - 3 - 2 - 3 - 2 - 2 - 3 - 3
Second hemistich
wAl.by.t yς.rfhu wAl.Hl~ wAl.Hrm.
wl.by.t yς.rfhw wl.Hl.l wl.Hrm.
wl.- by.- tyς.- r- fhw- wl.- Hl.- lwl.- Hrm.
2 - 2 - 3 - 1 - 3 - 2 - 2 - 3 - 3
Example 2
Here BASRAH identifies this verse as ‘unknown meter’ because the final code
does not belong to any meter codes, which corresponds to the judgment of a
human expert because the verse’s diacritization is not correct. BASRAH also
gives ‘unknown meter’ if there is an error in the dictation form of the input
verses.
BASRAH can also identify the sub-types of meters, e.g., whether they are completed,
brachycatalectic or other. For example, the verse in Example 3 is identified correctly
as brachycatalectic AlwAfr/The Exuberant.
Example 3
As we have mentioned before, BASRAH depends on codes and not on feet
as the other systems do. It therefore identifies the types of relaxations and defects
(which belong to the feet method) by using alternative codes
that are derived from the codes of the sixteen primary meters rather than by using
feet.
BASRAH also works on both fully and partially diacritized input verses.
Since BASRAH does not work so well on non-diacritized verses, our plan
is to use an Arabic diacritizer system (e.g., MADA; Habash, Rambow and Roth
2009) for the input stage as future work for BASRAH.
Example 4
In Example 4, in spite of the fact that the input verse is non-diacritized, BASRAH
correctly identifies this verse as having AlmtqArb/The Tripping meter. This is a
very rare case, especially when the verse contains the prolongation letters in the
segmentation positions.
When testing poems, BASRAH assigns the commonest meter of the verses that
make up a poem. For instance, BASRAH identifies the following poem, which
contains four verses, as having Alsryς/The Swift meter, not AlkAml/The Perfect
meter. This is because the first verse belongs to Alsryς/The Swift meter; whereas
the other three verses belong to both AlkAml/The Perfect and Alsryς/The Swift
meters. BASRAH, therefore, identifies the poem as having Alsryς/The Swift meter,
because this is the commonest meter for the individual verses.
Example 5
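The poem-level decision described above amounts to a vote over the candidate meters of the individual verses. A minimal sketch follows, assuming each verse has already been mapped to the list of meters it fits; poem_meter is a hypothetical name and the tie-breaking behaviour is not specified in the text.

```python
from collections import Counter


def poem_meter(verse_meters):
    """Return the meter that fits the largest number of verses."""
    counts = Counter(
        meter for candidates in verse_meters for meter in candidates
    )
    if not counts:
        return "unknown meter"
    # most_common(1) yields the meter with the highest count; ties are
    # resolved by first occurrence, which is an arbitrary choice here.
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # The four-verse poem discussed above: the first verse fits only
    # Alsryς/The Swift, the other three fit both meters.
    poem = [
        ["Alsryς"],
        ["AlkAml", "Alsryς"],
        ["AlkAml", "Alsryς"],
        ["AlkAml", "Alsryς"],
    ]
    print(poem_meter(poem))
```

In this case Alsryς/The Swift fits four verses and AlkAml/The Perfect only three, so the vote selects Alsryς/The Swift, as in the paper's example.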
Some of BASRAH’s result screens are shown in the Appendix.
8 Conclusions
Arabic prosody is the science that studies the music of Arabic poetry, which is
mainly meter and rhyme. The identification of meters for Arabic poetry verses or
poems is a complicated task. This task needs human expertise to identify the meter
of a verse.
This paper is an attempt to simplify Arabic prosody by utilizing the computer
to help inexperienced users to identify the meter of Arabic verses or poems and
report the correctness of the verse (or poem). BASRAH uses the numerical prosody
method to achieve its aim through coding the verse using (0, 1, 2, 3, 4 and 6) codes
only. BASRAH was tested on a large set of old and modern Arabic verses and
poems. We have shown that using numbers, by coding each vowelized letter as ‘1’ and each
vowelless consonant as ‘0’, instead of feet to identify the meter of an Arabic verse helps
to reduce the burden of the many terms that overload Arabic prosody and make it a thorny and
complex discipline. This provides a new way of thinking about Arabic poetry
itself, and also opens the door to applications of these ideas to other Arabic
fields such as music.
BASRAH achieves an overall 98.6 per cent precision and 98.1 per cent recall
versus 97.6 per cent precision and 96.3 per cent recall by using Khalaf et al.’s (2009)
system over the same set of 3,000 old and modern Arabic verses. This is because
our previous system used Al-Katib’s method, which did not take into consideration
the old Arabic prosody problems (i.e., unreal relaxations and defects). So BASRAH,
which uses the numerical prosody method, can identify some verses that Khalaf
et al.’s (2009) system cannot identify because the codes of these verses do not
match Al-Katib’s codes, whereas they do match the numerical prosody method
codes. In addition, BASRAH achieves 100 per cent accuracy over a set
of 500 Arabic poems. It is characterized by accuracy and simplicity. We therefore
intend to add some other helpful information about Arabic prosody, such as general
information about prosody, the definition of each meter, examples for each meter
with its feet, the prosodic form conversion algorithm, spoken verses and poems etc.,
to enrich BASRAH’s user knowledge, because we hope that it might be useful as
an educational aid.
We speculate that adding an Arabic diacritizer (e.g., MADA), which plays a
vital role in these applications, to the input stage might further
improve BASRAH’s results.
Acknowledgments
We would like to thank Professor Allan Ramsay (The University of Manchester,
UK), Dr. Yasser Sabtan (Al-Azhar University, Egypt), Dr. Yavor Nenov (Oxford
University, UK), Fatimah Furaiji (University of Szczecin, Poland), Siham Al-Rikabi
(Humboldt University of Berlin, Germany) and Khamis Al-Qubaeissy (Manchester
Metropolitan University, UK) for important suggestions and helpful discussions. We
would also like to extend our thanks to the anonymous reviewers for their helpful
comments. Zainab owes her deepest gratitude to USM and TWAS for financial
support in her PhD study.
Appendix: Current System Example Screens14
Fig. A1. (Colour online) AlTwyl/The Long meter example.
Fig. A2. (Colour online) AlmtqArb/The Tripping meter example.
14 In the current screens, all meter codes run right-to-left, following the Arabic transcription.
Fig. A3. (Colour online) AlkAml/The Perfect or Alrjz/The Trembling meter example.
Fig. A4. (Colour online) AlkAml/The Perfect or Alsryς/The Swift meter example.
Fig. A5. (Colour online) AlmnsrH/The Flowing meter example.
Fig. A6. (Colour online) Poem test example (identifying the meter for each verse).
Fig. A7. (Colour online) Poem test example (the meter for the whole poem, which is
Alsryς/The Swift meter).
References
Abdullateef, M. 2007. (Al-byna’ Al-ςrwDy llqSydh̄ Al-ςrbyh̄). Cairo, Egypt: Dar
Ghareeb llteba‘a wa al-Nasher.
Ahmad, S. 2006. (Al-kAfy fy ςlmy Al-ςruD wa Al-qwAfy), 1st ed. Cairo, Egypt:
Mktabat al-Thaqafa al-Denya.
Al-‘Ali, F. 1998. (Al-mysr Al-kAfy fy Al-ςrwD wa Al-qwAfy). Amman, Jordan: Dar
al-Thaqafah Llnasher wa al-Tawzee‘.
Al-‘edany, J. 2001. (Hwsbt ςlm Al-ςrwD Al-ςrby). Basrah, Iraq: University of Basrah.
Al-Hanafi, J. 1991. (Al-ςrwD thðybh wa ĂςAdt tdwynh). Baghdad, Iraq: Dar
al-Sh’oon al-Thaqafya al-‘ama.
Al-Hussian, A. n.d. Program of Azkary Al-Hussian for Al-ςrwD Al-ςrby.
http://azahou45.free.fr/arod1.php Accessed Sep 2010.
Al-Katib, M. 1971. (Al-šςr Al-ςrby bAstςmAl Al-ArqAm Al-θnA’yh̄). Basrah, Iraq:
Mtba‘t Mslahat al-Moany’ al-Iraqia.
Al-Taweel, M. 2006. (fy ςrwD Al-šςr Al-ςrby “qDAyA wa mnAkšAt”). Cairo, Egypt:
Dar Ghareeb Llteba‘a wa al-Nasher.
Etmeesh, M. 2006. (thwylAt Al-šjrh̄, drAsh̄ fy mwsyqh̄ Al-šςr Al-jdyd). Baghdad,
Iraq: Dar al-Sh’oon al-Thaqafya al-‘ama.
Habash, N. 2010. Introduction to Arabic natural language processing. Synthesis Lectures on
Human Language Technologies 3(1): 1–187.
Habash, N., Rambow, O., and Roth, R. 2009. MADA+TOKAN: A toolkit for Arabic tokenization,
diacritization, morphological disambiguation, POS tagging, stemming and lemmatization.
In Proceedings of the 2nd International Conference on Arabic Language
Resources and Tools (MEDAR), pp. 242–254. Cairo, Egypt.
Habash, N., Soudi, A., and Buckwalter, T. 2007. On Arabic transliteration. In Abdelhadi Soudi,
Antal van den Bosch and Günter Neumann (eds.), Arabic Computational Morphology:
Knowledge-Based and Empirical Methods, pp. 15–22. New York: Springer.
Harkat, M. 2007. (Al-lsAnyAt Al-ryADyh̄ wa Al-ςrwD). Beirut, Lebanon: Dar
al-Hadatha Llteba‘a wa al-Nasher.
Isa, F. 2010. (Al-ςrwD Al-ςrby wa mHAwlAt Al-tTwyr wa Al-tjdyd fyh), 1st ed.
Alexandria, Egypt: Dar al-Ma’rifah al-Jami’iah.
Khalaf, Z., Shahed, M., and Ali, S. 2009. (Hwsbh̄ mwAzyn Al-šςr Al-ςrby).
University of Sharjah Journal of Pure and Applied Sciences 6(1): 41–62.
Khalil, I. 2009. (ςrwD Al-šςr Al-ςrby), 1st ed. Amman, Jordan: Dar
al-Masyrah lltyba’ah wa al-Nashr.
Khashan, K. 2003. (Al-xlyl wa Al-ςrwD Al-rqmy 1). Journal of
Arabic Linguistics Tradition (JALT) 1: 25–34.
Khashan, K. 2004. (Al-xlyl wa Al-ςrwD Al-rqmy 2). Journal of
Arabic Linguistics Tradition (JALT) 2: 1–12.
Khashan, K. 2005. (Al-xlyl wa Al-ςrwD Al-rqmy 3). Journal of
Arabic Linguistics Tradition (JALT) 3: 24–47.
Khashan, K. 2006. (Al-xlyl wa Al-ςrwD Al-rqmy 4). Journal of
Arabic Linguistics Tradition (JALT) 4: 46–67.
Kushek, A. 2004. (mHAwlAt Al-tjdyd fy ĂyqAς Al-šςr). Cairo,
Egypt: Dar Ghareeb Llteba‘a wa al-Nasher.
Maling, J. 1973. The Theory of Classical Arabic Metrics. Cambridge, MA: MIT.
Mukhtar, A. 1985. (dAŷrh̄ Al-wHdh̄ fy ÂwzAn Al-šςr Al-ςrby). Tunis, Tunisia:
al-Monadhama al-Arabia Lltarbya wa al-Thaqafa wa al-‘llom.
Mustafa, M. 2005. (Âhdý sbyl Ălý ςlmy Al-xlyl, Al-ςrwD wa Al-qwAfy). Beirut,
Lebanon: ‘alaam al-Kutoob Llteba‘a wa al-Nasher wa al-Tawzee’.
Mustajeer, A. 2005. (mdxl ryAdy Ălý ςrwD Al-šςr Al-ςrby). Cairo, Egypt:
Dar al-’ayn Llnasher.
Shittu, S. 2006. Rules of metrics, alterations and addition in Arabic prosody. An Encyclopaedia
of The Art 2(1): 1–5.
Uthman, M. 2004. (Al-mršd Al-wAfy fy Al-ςrwD wa Al-qwafy). Beirut, Lebanon:
Dar al-Kutoob al-‘lmya.
Wajeeh, M. 2007. (Al-ςrwD wa Al-qAfyh̄ byn Al-trAθ wa Al-tjdyd). Cairo, Egypt:
Mo‘ssat al-Mukhtar Llnasher wa al-Tawzee‘.