TMX editor with Regex support (Windows 10)?
Thread poster: Hayley Leva
Hayley Leva
France
Local time: 21:38
French to English
Oct 17, 2018

Hello,

I have been looking into tmx editors with the aim of cleaning up a large (and overly messy) translation memory. In particular, I would like to delete:
- number-only segments
- segments containing (only) product codes and/or brand names
- segments shorter than X characters.

I imagine an editor which supports Regex might be the answer, but does such a thing exist?
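For comparison, the three deletion criteria above can also be scripted outside any editor. The Python sketch below is only an illustration, not a feature of any of the tools discussed here. It assumes a TMX 1.4 file whose `<tuv>` elements carry an `xml:lang` attribute, and it looks only at the source segment of each translation unit. The product-code pattern, the brand list and the minimum length are hypothetical placeholders that you would replace with patterns matching your own data.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical patterns: adapt them to your own data.
NUMBER_ONLY = re.compile(r"[\d\s.,:/%+\-]+")                    # "12,5", "1 000", "3/4"
PRODUCT_CODE = re.compile(r"[A-Z]{2,}[-\s]?\d{2,}[A-Z0-9-]*")  # e.g. "AB-1234X"
BRANDS = {"Acme", "Contoso"}                                    # placeholder brand names


def segment_text(tuv):
    """Text of <seg>, ignoring the content of inline elements such as <ph>."""
    seg = tuv.find("seg")
    if seg is None:
        return ""
    return "".join([seg.text or ""] + [child.tail or "" for child in seg]).strip()


def is_junk(text, min_len):
    """True if the segment is too short, number-only, or only codes/brand names."""
    if len(text) < min_len or NUMBER_ONLY.fullmatch(text):
        return True
    return all(PRODUCT_CODE.fullmatch(tok) or tok in BRANDS for tok in text.split())


def clean_tmx(tree, src_lang, min_len=4):
    """Remove <tu> elements whose source segment is junk; return how many were removed."""
    body = tree.getroot().find("body")
    removed = 0
    for tu in list(body.findall("tu")):
        for tuv in tu.findall("tuv"):
            if tuv.get(XML_LANG, "").lower() == src_lang.lower():
                if is_junk(segment_text(tuv), min_len):
                    body.remove(tu)
                    removed += 1
                break  # only the first source-language <tuv> is checked
    return removed
```

With a hypothetical file name, you would load the file with `tree = ET.parse("memory.tmx")`, call `clean_tmx(tree, "fr-FR", min_len=4)`, and save a copy with `tree.write("memory_clean.tmx", encoding="utf-8", xml_declaration=True)`. ElementTree does not keep a DOCTYPE declaration when it writes the file, so test the result on a copy first.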

I've already tried:
- Heartsome tmx editor - I can't find a way to filter as I need to
- Olifant tmx editor - I read that it supports Regex, but I get an error when trying to import my tmx file ("Item has already been added") so haven't got any further than that (I've made a separate post for that issue*)

Any ideas?
Hayley


*Post regarding Olifant error:
https://www.proz.com/forum/cat_tools_technical_help/329711-olifant_vers_20_build_5_error_when_importing_"item_has_already_been_added_key_in_dictionary".html
Michael Beijer  Identity Verified
United Kingdom
Local time: 20:38
Member (2009)
Dutch to English
CafeTran = also a very powerful TMX editor Oct 17, 2018

Hmm, CafeTran can open/edit TMXs, and you can then use CafeTran's (extensive) TMX editing features, as well as filter using regular expressions (and delete relevant TUs and re-save), if I remember correctly. Will have a look if I have a moment. Also ask here: https://cafetran.freshdesk.com/support/discussions where I am sure someone else will help you.

Filter dialogue when opening a TMX in CafeTran: [screenshot]

Various TMX editing functions available via Task > TMX memory once the TMX is open in CafeTran: [screenshots]

Michael

[Edited at 2018-10-17 13:23 GMT]

Jean Dimitriadis  Identity Verified
English to French
[Deleted] Oct 17, 2018

[Deleted]

[Edited at 2018-10-17 13:47 GMT]

Hayley Leva
France
Local time: 21:38
French to English
TOPIC STARTER
CafeTran looks like a good option (but licence required for TMs with more than 1000 TUs) Oct 17, 2018

Thank you Michael for your reply - from your screenshots it certainly looks like CafeTran might be able to do what I need.
However, I checked the CafeTran licensing page, and it looks like the free version has a TM limit of 1000 TUs. As the TMs I want to clean up each have 15,000 - 30,000 TUs, if I did want to use CafeTran, I would have to pay for a licence. I'm not ruling that out as an option, but I'll need to do some more research first!

One specific query I have relates to compatibility, as I created (and currently use) my TMs in Trados Studio (2017). Can you confirm that I can do the following:
- export a tmx of my TM from Trados Studio (2017) (I know how to do this step)
- open and clean up* that tmx in CafeTran
- convert the edited tmx back into a Studio TM


*And in terms of clean-up in CafeTran, as well as deleting unwanted segments, do you know whether it is also possible to edit the values of custom fields in my TM?

Regards,
Hayley
Rodolfo Raya  Identity Verified
Local time: 17:38
English to Spanish
Maxprograms' TMXEditor supports regular expressions Oct 17, 2018

Hi,

TMXEditor, https://www.maxprograms.com/products/tmxeditor.html supports filtering on regular expressions.

Select the segments you want and then delete whichever you see fit.

Regards,
Rodolfo

Selcuk Akyuz  Identity Verified
Türkiye
Local time: 23:38
English to Turkish
Why don't you use DVX? Oct 17, 2018

Hi Hayley,

Déjà Vu is listed on your profile page, so why don't you use it for this task? You can easily filter (and delete) such segments using SQL filters.

Jean Dimitriadis  Identity Verified
English to French
No TUs restrictions for TMX Editing in CafeTran Espresso Oct 18, 2018

Hi Hayley,

As I have been able to confirm recently, the demo version limits do NOT apply to TMX editing in CafeTran Espresso.

You can edit big TMs without owning a license.

Just drop the TMX on the dashboard and choose Edit Translation Memory.

https://www.proz.com/forum/cat_tools_technical_help/326359-in_search_for_an_app_for_quick_translations.html

Hayley Leva wrote:
*And in terms of clean-up in CafeTran, as well as deleting unwanted segments, do you know whether it is also possible to edit the values of custom fields in my TM?


Yes, from Michael's screenshot: Task > TMX memory > Set TMX property / Remove TMX property. This applies to all filtered segments. Apart from the Filter menu, you can also filter segments by searching them in the Quick Search Bar. For example:

Type the number 5 and only the 5th segment will be selected

Type 5-95 and only that range will be selected.

Just make sure that when you press Ctrl+F to bring up the advanced search preferences, "Segment numbers" is enabled on the left side of the search window.
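If you would rather do the same bulk "Set TMX property" edit outside CafeTran, here is a minimal Python sketch. It assumes that custom field values appear in the TMX as TU-level `<prop type="...">` elements. As far as I know, Trados Studio exports use types of the form `x-FieldName:FieldType`, but the `x-Client:SingleString` name used in the test is hypothetical, so check the exact type string in your own export. The filter is any function you supply that decides which translation units to change.

```python
import xml.etree.ElementTree as ET


def set_prop(tu, prop_type, value):
    """Set the text of a TU-level <prop type=prop_type>, adding the element if missing."""
    for prop in tu.findall("prop"):
        if prop.get("type") == prop_type:
            prop.text = value
            return
    new_prop = ET.Element("prop", {"type": prop_type})
    new_prop.text = value
    # In TMX, <note> and <prop> elements must come before the first <tuv>.
    first_tuv = next((i for i, child in enumerate(tu) if child.tag == "tuv"), len(tu))
    tu.insert(first_tuv, new_prop)


def set_prop_where(root, matches, prop_type, value):
    """Apply set_prop to every <tu> for which matches(tu) is true; return the count."""
    count = 0
    for tu in list(root.iter("tu")):  # copy the list before modifying the tree
        if matches(tu):
            set_prop(tu, prop_type, value)
            count += 1
    return count
```

The test passes `lambda tu: True` to change every unit. In practice you would pass a predicate that inspects the segment text or an existing property.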

Hayley Leva wrote:
One specific query I have relates to compatibility, as I created (and currently use) my TMs in Trados Studio (2017). Can you confirm that I can do the following:
- export a tmx of my TM from Trados Studio (2017) (I know how to do this step)
- open and clean up* that tmx in CafeTran
- convert the edited tmx back into a Studio TM


Yes, SDL Trados can export and import TMX files, and CafeTran produces valid TMX files which can be imported back into SDL Trados. I have done so successfully in the 2015 version.

Jean

PS: For quickly customizing the UI to your liking, you can read my quick Getting comfortable document for CafeTran. In particular, you will want to test which of the 6 available window layouts (found in the View > Window layout submenu) is the most practical for you in the TMX editing scenario. Try this with a TMX already loaded.

[Edited at 2018-10-18 02:26 GMT]

Samuel Murray  Identity Verified
Netherlands
Local time: 21:38
Member (2006)
English to Afrikaans
@Hayley Oct 18, 2018

Hayley Leva wrote:
Can you confirm that I can do the following:
- export a tmx of my TM from Trados Studio (2017) (I know how to do this step)
- open and clean up* that tmx in CafeTran
- convert the edited tmx back into a Studio TM


I have no idea what CafeTran does to a TMX file, but even if we assume that CafeTran and Trados 2017 both fully accept valid TMX, there may still be a problem on Trados' end. Trados can export TMX files in several variants, and not all of them are roundtrip-compatible. You'd have to ask this question in the Trados forum, i.e. "if I want to export to TMX and then import from TMX again, what export/import settings should I use to ensure that the imported units contain all the same information as in the original TM?".
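Whatever settings you choose, one rough way to check a round trip is to export the TM to TMX before and after (for example, by re-exporting after the import) and compare what survived. This Python sketch counts translation units and TU-level `<prop>` types in two TMX documents and reports the property types that occur less often afterwards. It does not compare segment text or attributes, and the choice of what to compare is an assumption rather than a Trados feature.

```python
from collections import Counter
import xml.etree.ElementTree as ET  # used to load the two TMX files


def tmx_summary(root):
    """Return (number of <tu> elements, Counter of TU-level <prop type=...> values)."""
    tus = list(root.iter("tu"))
    prop_types = Counter(p.get("type") for tu in tus for p in tu.findall("prop"))
    return len(tus), prop_types


def compare_tmx(before_root, after_root):
    """Report TU counts and the prop types that occur less often after the round trip."""
    n_before, props_before = tmx_summary(before_root)
    n_after, props_after = tmx_summary(after_root)
    # Counter subtraction keeps only positive differences, i.e. the losses.
    return {"tus_before": n_before, "tus_after": n_after,
            "props_lost": dict(props_before - props_after)}
```

With hypothetical file names, the call would be `compare_tmx(ET.parse("before.tmx").getroot(), ET.parse("after.tmx").getroot())`. An empty `props_lost` dictionary and equal TU counts suggest, but do not prove, that nothing was dropped.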

Theoretically, another thing can go wrong: CafeTran's XML engine may not be compatible with Trados' XML engine. I know from experience that Trados' engine is quite forgiving about non-valid XML characters in some cases (and unforgiving in other cases), and we don't know how CafeTran's engine would deal with such characters (would it refuse to open the TMX, would it open the TMX and silently replace the invalid characters with something else, would it open the TMX but then refuse to save it until you've removed the invalid characters, etc).
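To check whether a TMX actually contains characters that XML 1.0 does not allow before you open it in another tool, a short scan can help. This Python sketch flags two kinds of problem that fall outside the XML 1.0 `Char` production: literal control characters, and numeric character references such as `&#x1B;`. It works on the raw text rather than parsing the XML, so it can also run on files that a parser refuses to open. It assumes the file is UTF-8 encoded.

```python
import re

# Characters allowed by the XML 1.0 "Char" production (tab, LF, CR and the
# ranges below); anything else is flagged.
INVALID_XML_CHAR = re.compile(r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")
# Numeric character references such as &#x1B; or &#27;.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def is_xml_char(cp):
    """True if code point cp is allowed in an XML 1.0 document."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def find_invalid(text):
    """List (line number, offending character or reference) pairs."""
    problems = []
    # Split only on "\n": str.splitlines() would also split on some of the
    # very control characters we are looking for.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in INVALID_XML_CHAR.finditer(line):
            problems.append((lineno, f"U+{ord(m.group()):04X}"))
        for m in CHAR_REF.finditer(line):
            ref = m.group(1)
            cp = int(ref[1:], 16) if ref.startswith("x") else int(ref)
            if not is_xml_char(cp):
                problems.append((lineno, m.group()))
    return problems
```

For example, `find_invalid(open("memory.tmx", encoding="utf-8").read())` (hypothetical file name) returns the lines to look at. An empty list means that neither kind of invalid character was found.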


 
Mikhail Zavidin
Local time: 23:38
English to Russian
You can still use an ordinary text editor with regex support Oct 18, 2018

I use the following regular expression in Notepad++ to find translation units in which both segments (en-US and ru) are empty or contain only an integer, and then delete them from the TMX file.


<tu>\r?\n?<tuv xml:lang=\"en-US\"><seg\/?>\d*\r?\n?(?:<\/seg>)?<\/tuv>\r?\n?<tuv xml:lang=\"ru\"><seg\/?>\d*\r?\n?(?:<\/seg>)?<\/tuv>\r?\n?<\/tu>\r?\n?


You can use it as a sample. Just change every “en-US” and “ru” to your source and target languages.
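The same kind of pattern can be tested outside Notepad++ with Python's `re` module. The sketch below builds it for any language pair, writing the tag as `<tuv xml:lang=...>` with a space and using a single `\d*` quantifier. Like the Notepad++ version, it only matches translation units written as a bare `<tu>` with each `<tuv>` on its own line: no attributes on `<tu>` and no `<prop>` or `<note>` children. It removes a unit only when both segments are empty or contain nothing but digits. Exports from other tools may add attributes such as `creationdate` to `<tu>`, in which case the pattern would need adjusting.

```python
import re


def tuv_part(lang):
    """One <tuv> whose <seg> is empty (<seg/> or <seg></seg>) or holds only digits."""
    return (r'<tuv xml:lang="' + re.escape(lang) + r'">'
            r"<seg/?>\d*\r?\n?(?:</seg>)?</tuv>\r?\n?")


def empty_or_numeric_tu(src="en-US", tgt="ru"):
    """Compile the whole-<tu> pattern for a source/target language pair."""
    return re.compile(r"<tu>\r?\n?" + tuv_part(src) + tuv_part(tgt) + r"</tu>\r?\n?")


def remove_empty_or_numeric(tmx_text, src="en-US", tgt="ru"):
    """Delete matching <tu> blocks; return the new text and the number removed."""
    return empty_or_numeric_tu(src, tgt).subn("", tmx_text)
```

For a French to English memory, you would call something like `remove_empty_or_numeric(text, "fr-FR", "en-GB")`, using whatever language codes actually appear in your file.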

[Edited at 2018-10-18 11:09 GMT]